2024-09-16 17:46:33,836 INFO [train.py:1266] (1/2) Training started 2024-09-16 17:46:33,836 INFO [train.py:1276] (1/2) Device: cuda:1 2024-09-16 17:46:33,838 INFO [train.py:1307] (1/2) Using dtype=torch.float16 2024-09-16 17:46:33,839 INFO [train.py:1308] (1/2) Use AMP=True 2024-09-16 17:46:33,839 INFO [train.py:1310] (1/2) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'ignore_id': -1, 'label_smoothing': 0.1, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.4', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '9f6206b565b833d71e19b4411493d04d99f0a308', 'k2-git-date': 'Thu Mar 28 09:46:54 2024', 'lhotse-version': '1.27.0', 'torch-version': '2.2.2+cu118', 'torch-cuda-available': True, 'torch-cuda-version': '11.8', 'python-version': '3.10', 'icefall-git-branch': 'cr-ctc', 'icefall-git-sha1': '07d6b123-dirty', 'icefall-git-date': 'Wed Sep 4 19:33:41 2024', 'icefall-path': '/zw/mnt/yaozengwei/workspace/icefall_cr_ctc', 'k2-path': '/root/anaconda3/envs/python3.10/lib/python3.10/site-packages/k2/__init__.py', 'lhotse-path': '/root/anaconda3/envs/python3.10/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'NGK_zengwei'}, 'world_size': 2, 'master_port': 12343, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('zipformer/exp-large-ctc-rnnt-ctc-loss-scale-0.1-cr-loss-scale-0.02-time-mask-ratio-2.5-scaled-masked-1'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.1, 'cr_loss_scale': 0.02, 'time_mask_ratio': 2.5, 'cr_loss_masked_scale': 1.0, 'attention_decoder_loss_scale': 0.8, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'use_bf16': False, 'num_encoder_layers': '2,2,4,5,4,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1536,2048,1536,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,512,768,512,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,320,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'attention_decoder_dim': 512, 'attention_decoder_num_layers': 6, 'attention_decoder_attention_dim': 512, 'attention_decoder_num_heads': 8, 'attention_decoder_feedforward_dim': 2048, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': True, 'use_attention_decoder': False, 'use_cr_ctc': True, 'full_libri': True, 'mini_libri': False, 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1400, 'bucketing_sampler': True, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': False, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'input_strategy': 'PrecomputedFeatures', 'blank_id': 0, 'sos_id': 1, 'eos_id': 1, 'vocab_size': 500, 'dtype': torch.float16, 'use_autocast': True} 2024-09-16 17:46:33,839 INFO [train.py:1312] (1/2) About to create model 2024-09-16 17:46:34,516 INFO [train.py:1316] (1/2) Number of 
model parameters: 148824074 2024-09-16 17:46:34,517 INFO [train.py:752] (1/2) num_frame_masks: 25.0, max_frames_mask_fraction: 0.375 2024-09-16 17:46:36,044 INFO [train.py:1338] (1/2) Using DDP 2024-09-16 17:46:38,100 INFO [asr_datamodule.py:436] (1/2) About to get the shuffled train-clean-100, train-clean-360 and train-other-500 cuts 2024-09-16 17:46:38,101 INFO [asr_datamodule.py:232] (1/2) Enable MUSAN 2024-09-16 17:46:38,101 INFO [asr_datamodule.py:233] (1/2) About to get Musan cuts 2024-09-16 17:46:39,952 INFO [asr_datamodule.py:279] (1/2) Disable SpecAugment 2024-09-16 17:46:39,952 INFO [asr_datamodule.py:281] (1/2) About to create train dataset 2024-09-16 17:46:39,952 INFO [asr_datamodule.py:308] (1/2) Using DynamicBucketingSampler. 2024-09-16 17:46:40,695 INFO [asr_datamodule.py:325] (1/2) About to create train dataloader 2024-09-16 17:46:40,696 INFO [asr_datamodule.py:453] (1/2) About to get dev-clean cuts 2024-09-16 17:46:40,697 INFO [asr_datamodule.py:460] (1/2) About to get dev-other cuts 2024-09-16 17:46:40,698 INFO [asr_datamodule.py:356] (1/2) About to create dev dataset 2024-09-16 17:46:40,855 INFO [asr_datamodule.py:373] (1/2) About to create dev dataloader 2024-09-16 17:46:40,855 INFO [train.py:1545] (1/2) Sanity check -- see if any of the batches in epoch 1 would cause OOM. 2024-09-16 17:49:26,628 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 48777MB 2024-09-16 17:49:28,789 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 48777MB 2024-09-16 17:49:31,043 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 48777MB 2024-09-16 17:49:33,371 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 48777MB 2024-09-16 17:49:36,069 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 48777MB 2024-09-16 17:49:37,617 INFO [scaling.py:1024] (1/2) Whitening: name=None, num_groups=1, num_channels=512, metric=112.09 vs. limit=7.5 2024-09-16 17:49:38,712 INFO [train.py:1576] (1/2) Maximum memory allocated so far is 48777MB 2024-09-16 17:50:08,364 INFO [train.py:1198] (1/2) Epoch 1, batch 0, loss[loss=8.112, simple_loss=6.913, pruned_loss=7.129, ctc_loss=4.732, cr_loss=0.5711, over 34444.00 frames. ], tot_loss[loss=8.112, simple_loss=6.913, pruned_loss=7.129, ctc_loss=4.732, cr_loss=0.5711, over 34444.00 frames. ], batch size: 85, lr: 2.25e-02, grad_scale: 1.0 2024-09-16 17:50:08,365 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 17:50:26,823 INFO [train.py:1230] (1/2) Epoch 1, validation: loss=8.189, simple_loss=6.983, pruned_loss=7.222, ctc_loss=4.826, cr_loss=2.97e-15, over 944034.00 frames. 2024-09-16 17:50:26,824 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 48777MB 2024-09-16 17:50:29,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=0.0, ans=0.5 2024-09-16 17:50:29,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=0.0, ans=0.2 2024-09-16 17:50:31,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=28.14 vs. limit=7.5 2024-09-16 17:50:40,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.39 vs. 
limit=5.0 2024-09-16 17:50:43,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.80 vs. limit=7.5 2024-09-16 17:50:44,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=46.666666666666664, ans=0.2995333333333333 2024-09-16 17:50:51,581 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 6.023e+03 6.181e+03 6.859e+03 8.923e+03 9.381e+03, threshold=2.744e+04, percent-clipped=0.0 2024-09-16 17:50:54,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=46.666666666666664, ans=0.8983666666666666 2024-09-16 17:51:11,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=622.14 vs. limit=7.535 2024-09-16 17:51:12,153 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.124e+03 2.227e+03 6.181e+03 8.923e+03 1.431e+04, threshold=2.472e+04, percent-clipped=0.0 2024-09-16 17:51:19,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=204.08 vs. limit=7.57 2024-09-16 17:51:26,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=140.0, ans=0.8951 2024-09-16 17:51:27,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=724.52 vs. limit=7.5525 2024-09-16 17:51:45,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=241.97 vs. limit=7.5525 2024-09-16 17:51:45,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=200.34 vs. limit=5.07 2024-09-16 17:51:54,523 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.393e+02 1.537e+03 3.758e+03 6.859e+03 4.427e+04, threshold=1.503e+04, percent-clipped=2.5 2024-09-16 17:51:55,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=15.14 vs. limit=4.074666666666666 2024-09-16 17:52:05,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=186.66666666666666, ans=7.64 2024-09-16 17:52:08,978 INFO [train.py:1198] (1/2) Epoch 1, batch 50, loss[loss=1.318, simple_loss=1.056, pruned_loss=1.226, ctc_loss=1.211, cr_loss=0.2006, over 34493.00 frames. ], tot_loss[loss=3.407, simple_loss=2.89, pruned_loss=2.563, ctc_loss=2.444, cr_loss=0.4648, over 1481628.99 frames. ], batch size: 82, lr: 2.48e-02, grad_scale: 0.25 2024-09-16 17:52:10,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=124.12 vs. limit=7.5875 2024-09-16 17:52:15,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=233.33333333333334, ans=0.4890625 2024-09-16 17:52:16,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=116.96 vs. 
limit=7.5875 2024-09-16 17:52:17,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=233.33333333333334, ans=0.4708333333333333 2024-09-16 17:52:18,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=35.51 vs. limit=5.116666666666666 2024-09-16 17:52:26,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=11.64 vs. limit=5.058333333333334 2024-09-16 17:52:36,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=280.0, ans=0.486875 2024-09-16 17:52:36,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=22.84 vs. limit=5.07 2024-09-16 17:52:51,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=46.27 vs. limit=7.745 2024-09-16 17:52:57,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=72.27 vs. limit=7.745 2024-09-16 17:53:01,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=326.6666666666667, ans=0.4846875 2024-09-16 17:53:08,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=748.34 vs. limit=7.6225 2024-09-16 17:53:21,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=125.09 vs. limit=7.64 2024-09-16 17:53:34,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.34 vs. limit=7.815 2024-09-16 17:53:40,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=159.83 vs. limit=7.6575 2024-09-16 17:53:43,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=420.0, ans=0.4803125 2024-09-16 17:53:45,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=420.0, ans=0.4803125 2024-09-16 17:53:46,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=67.10 vs. limit=7.815 2024-09-16 17:53:50,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=77.75 vs. limit=7.815 2024-09-16 17:53:50,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=86.04 vs. limit=7.6575 2024-09-16 17:53:55,894 INFO [train.py:1198] (1/2) Epoch 1, batch 100, loss[loss=1.267, simple_loss=0.9932, pruned_loss=1.23, ctc_loss=1.167, cr_loss=0.1801, over 34594.00 frames. ], tot_loss[loss=2.299, simple_loss=1.91, pruned_loss=1.875, ctc_loss=1.781, cr_loss=0.3144, over 2630295.85 frames. 
], batch size: 89, lr: 2.70e-02, grad_scale: 0.5 2024-09-16 17:53:58,301 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:54:02,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 9.925e+01 1.892e+02 4.401e+02 2.755e+03 4.427e+04, threshold=8.802e+02, percent-clipped=0.0 2024-09-16 17:54:03,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=16.53 vs. limit=4.1866666666666665 2024-09-16 17:54:13,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=109.69 vs. limit=7.675 2024-09-16 17:54:18,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=214.62 vs. limit=7.6925 2024-09-16 17:54:25,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=177.59 vs. limit=7.6925 2024-09-16 17:54:30,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=7.6925 2024-09-16 17:54:30,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=24.74 vs. limit=5.128333333333333 2024-09-16 17:54:52,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=560.0, ans=0.0874 2024-09-16 17:54:54,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=113.19 vs. limit=7.71 2024-09-16 17:54:57,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=46.06 vs. limit=7.955 2024-09-16 17:55:00,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=606.6666666666666, ans=0.8787666666666667 2024-09-16 17:55:01,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=4.242666666666667 2024-09-16 17:55:11,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.52 vs. limit=7.955 2024-09-16 17:55:22,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=84.63 vs. limit=5.326666666666667 2024-09-16 17:55:22,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=111.21 vs. limit=7.745 2024-09-16 17:55:25,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=653.3333333333334, ans=0.8771333333333333 2024-09-16 17:55:31,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=653.3333333333334, ans=0.469375 2024-09-16 17:55:34,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=22.41 vs. 
limit=7.745 2024-09-16 17:55:34,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=107.93 vs. limit=7.745 2024-09-16 17:55:38,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=700.0, ans=0.4125 2024-09-16 17:55:39,823 INFO [train.py:1198] (1/2) Epoch 1, batch 150, loss[loss=1.057, simple_loss=0.8073, pruned_loss=1.01, ctc_loss=1.081, cr_loss=0.1131, over 34490.00 frames. ], tot_loss[loss=1.837, simple_loss=1.497, pruned_loss=1.574, ctc_loss=1.529, cr_loss=0.2393, over 3557903.97 frames. ], batch size: 82, lr: 2.93e-02, grad_scale: 0.5 2024-09-16 17:55:40,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.55 vs. limit=8.025 2024-09-16 17:55:40,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=55.05 vs. limit=7.7625 2024-09-16 17:55:42,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=56.48 vs. limit=7.7625 2024-09-16 17:55:51,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=16.82 vs. limit=5.175 2024-09-16 17:56:00,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=165.67 vs. limit=7.7625 2024-09-16 17:56:04,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=187.97 vs. limit=7.78 2024-09-16 17:56:04,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=12.83 vs. limit=5.1866666666666665 2024-09-16 17:56:10,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=30.25 vs. limit=7.78 2024-09-16 17:56:12,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=66.52 vs. limit=7.78 2024-09-16 17:56:12,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=18.36 vs. limit=7.78 2024-09-16 17:56:16,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=31.22 vs. limit=8.06 2024-09-16 17:56:18,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=746.6666666666666, ans=0.46499999999999997 2024-09-16 17:56:20,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=746.6666666666666, ans=0.46499999999999997 2024-09-16 17:56:41,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=214.57 vs. 
limit=7.7975 2024-09-16 17:56:47,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.18 vs. limit=5.21 2024-09-16 17:56:47,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=15.74 vs. limit=7.815 2024-09-16 17:56:51,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840.0, ans=0.29159999999999997 2024-09-16 17:56:55,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=840.0, ans=0.29159999999999997 2024-09-16 17:56:55,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=840.0, ans=0.460625 2024-09-16 17:57:02,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=840.0, ans=0.460625 2024-09-16 17:57:03,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=78.20 vs. limit=7.815 2024-09-16 17:57:15,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=44.14 vs. limit=7.8325 2024-09-16 17:57:26,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=31.53 vs. limit=7.8325 2024-09-16 17:57:29,785 INFO [train.py:1198] (1/2) Epoch 1, batch 200, loss[loss=1.118, simple_loss=0.8523, pruned_loss=0.9834, ctc_loss=1.187, cr_loss=0.14, over 31970.00 frames. ], tot_loss[loss=1.584, simple_loss=1.269, pruned_loss=1.382, ctc_loss=1.405, cr_loss=0.2047, over 4271706.49 frames. ], batch size: 145, lr: 3.15e-02, grad_scale: 1.0 2024-09-16 17:57:32,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=933.3333333333334, ans=0.24066666666666667 2024-09-16 17:57:35,970 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 8.001e+01 1.455e+02 1.837e+02 2.607e+02 5.166e+02, threshold=3.674e+02, percent-clipped=0.0 2024-09-16 17:57:47,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=933.3333333333334, ans=0.45625 2024-09-16 17:58:06,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=29.85 vs. limit=7.8675 2024-09-16 17:58:08,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.53 vs. limit=8.235 2024-09-16 17:58:33,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=37.35 vs. limit=5.536666666666667 2024-09-16 17:58:33,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=10.06 vs. 
limit=4.429333333333333 2024-09-16 17:58:53,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=1120.0, ans=0.4475 2024-09-16 17:59:06,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=1120.0, ans=0.8608 2024-09-16 17:59:14,378 INFO [train.py:1198] (1/2) Epoch 1, batch 250, loss[loss=1.139, simple_loss=0.8544, pruned_loss=0.9936, ctc_loss=1.233, cr_loss=0.1703, over 34191.00 frames. ], tot_loss[loss=1.429, simple_loss=1.128, pruned_loss=1.252, ctc_loss=1.333, cr_loss=0.1901, over 4834041.18 frames. ], batch size: 117, lr: 3.38e-02, grad_scale: 1.0 2024-09-16 17:59:17,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=38.19 vs. limit=8.375 2024-09-16 17:59:19,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.57 vs. limit=7.9375 2024-09-16 17:59:21,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=15.33 vs. limit=5.291666666666667 2024-09-16 17:59:27,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=1166.6666666666667, ans=0.15625 2024-09-16 17:59:27,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=1166.6666666666667, ans=0.4453125 2024-09-16 17:59:28,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=7.9375 2024-09-16 17:59:42,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=113.42 vs. limit=7.955 2024-09-16 17:59:45,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=107.05 vs. limit=7.955 2024-09-16 17:59:52,333 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 17:59:59,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1260.0, ans=0.4409375 2024-09-16 18:00:10,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=21.63 vs. limit=7.9725 2024-09-16 18:00:25,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=79.89 vs. limit=7.99 2024-09-16 18:00:26,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=1306.6666666666667, ans=0.43875 2024-09-16 18:00:26,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=1306.6666666666667, ans=7.99 2024-09-16 18:00:30,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=137.02 vs. 
limit=7.99 2024-09-16 18:00:49,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=45.08 vs. limit=8.0075 2024-09-16 18:00:51,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.91 vs. limit=8.515 2024-09-16 18:00:54,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.44 vs. limit=8.0075 2024-09-16 18:00:56,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=8.0075 2024-09-16 18:00:59,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=64.17 vs. limit=8.025 2024-09-16 18:01:01,258 INFO [train.py:1198] (1/2) Epoch 1, batch 300, loss[loss=1.112, simple_loss=0.8227, pruned_loss=0.9579, ctc_loss=1.226, cr_loss=0.169, over 34373.00 frames. ], tot_loss[loss=1.327, simple_loss=1.034, pruned_loss=1.161, ctc_loss=1.287, cr_loss=0.1817, over 5259117.72 frames. ], batch size: 107, lr: 3.60e-02, grad_scale: 2.0 2024-09-16 18:01:07,508 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.294e+02 2.077e+02 2.901e+02 3.882e+02 6.477e+02, threshold=5.802e+02, percent-clipped=28.0 2024-09-16 18:01:19,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.18 vs. limit=8.55 2024-09-16 18:01:22,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=8.585 2024-09-16 18:01:30,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1446.6666666666667, ans=0.14575 2024-09-16 18:01:30,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=1446.6666666666667, ans=0.4321875 2024-09-16 18:01:36,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=128.55 vs. limit=5.723333333333334 2024-09-16 18:01:37,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=1446.6666666666667, ans=0.14575 2024-09-16 18:01:39,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=143.83 vs. limit=8.0425 2024-09-16 18:01:43,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=69.80 vs. limit=8.06 2024-09-16 18:01:52,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=46.07 vs. limit=8.06 2024-09-16 18:01:52,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.26 vs. 
limit=8.620000000000001 2024-09-16 18:02:03,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=1493.3333333333333, ans=8.06 2024-09-16 18:02:06,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=21.90 vs. limit=5.77 2024-09-16 18:02:09,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.31 vs. limit=8.655 2024-09-16 18:02:14,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=1540.0, ans=0.4278125 2024-09-16 18:02:17,325 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=46.32 vs. limit=8.0775 2024-09-16 18:02:21,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=17.14 vs. limit=8.0775 2024-09-16 18:02:27,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=80.39 vs. limit=8.095 2024-09-16 18:02:31,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=47.76 vs. limit=8.095 2024-09-16 18:02:47,205 INFO [train.py:1198] (1/2) Epoch 1, batch 350, loss[loss=0.9809, simple_loss=0.7187, pruned_loss=0.8357, ctc_loss=1.065, cr_loss=0.1583, over 34267.00 frames. ], tot_loss[loss=1.26, simple_loss=0.969, pruned_loss=1.097, ctc_loss=1.258, cr_loss=0.1763, over 5594781.47 frames. ], batch size: 83, lr: 3.83e-02, grad_scale: 2.0 2024-09-16 18:02:50,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=32.74 vs. limit=8.1125 2024-09-16 18:02:50,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=55.54 vs. limit=8.1125 2024-09-16 18:02:57,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=40.31 vs. limit=5.0 2024-09-16 18:03:09,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.91 vs. limit=5.42 2024-09-16 18:03:11,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=30.04 vs. limit=8.76 2024-09-16 18:03:18,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=8.76 2024-09-16 18:03:21,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=1680.0, ans=0.08950000000000001 2024-09-16 18:03:38,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=27.23 vs. 
limit=8.1475 2024-09-16 18:03:42,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=1726.6666666666667, ans=0.4190625 2024-09-16 18:03:47,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=115.19 vs. limit=8.1475 2024-09-16 18:03:57,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=8.01 vs. limit=4.709333333333333 2024-09-16 18:03:57,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=30.68 vs. limit=8.165 2024-09-16 18:04:02,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=1773.3333333333333, ans=0.416875 2024-09-16 18:04:08,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=43.07 vs. limit=8.165 2024-09-16 18:04:18,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=123.67 vs. limit=8.1825 2024-09-16 18:04:20,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=50.95 vs. limit=8.1825 2024-09-16 18:04:24,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=4.728 2024-09-16 18:04:26,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=8.1825 2024-09-16 18:04:31,928 INFO [train.py:1198] (1/2) Epoch 1, batch 400, loss[loss=1.109, simple_loss=0.8051, pruned_loss=0.9314, ctc_loss=1.204, cr_loss=0.1464, over 34421.00 frames. ], tot_loss[loss=1.21, simple_loss=0.9191, pruned_loss=1.047, ctc_loss=1.234, cr_loss=0.17, over 5862619.47 frames. ], batch size: 95, lr: 4.05e-02, grad_scale: 4.0 2024-09-16 18:04:36,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=1866.6666666666667, ans=0.14500000000000002 2024-09-16 18:04:36,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=1866.6666666666667, ans=0.4125 2024-09-16 18:04:38,064 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.717e+02 2.628e+02 3.573e+02 4.515e+02 7.735e+02, threshold=7.147e+02, percent-clipped=6.0 2024-09-16 18:04:39,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=21.63 vs. limit=5.933333333333334 2024-09-16 18:04:47,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=41.85 vs. limit=8.2 2024-09-16 18:04:47,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=8.2 2024-09-16 18:04:55,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=115.00 vs. 
limit=8.2175 2024-09-16 18:04:56,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=22.90 vs. limit=5.956666666666667 2024-09-16 18:04:59,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=1913.3333333333333, ans=0.41031249999999997 2024-09-16 18:05:02,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.79 vs. limit=8.2175 2024-09-16 18:05:27,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=8.97 2024-09-16 18:05:28,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=84.30 vs. limit=8.235 2024-09-16 18:05:39,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=38.45 vs. limit=8.2525 2024-09-16 18:05:49,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=2006.6666666666667, ans=0.12475 2024-09-16 18:05:50,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=8.2525 2024-09-16 18:05:51,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=2006.6666666666667, ans=0.2799333333333333 2024-09-16 18:05:55,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.88 vs. limit=9.040000000000001 2024-09-16 18:05:56,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=8.27 2024-09-16 18:06:02,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=19.46 vs. limit=8.27 2024-09-16 18:06:02,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=8.27 2024-09-16 18:06:12,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=2053.3333333333335, ans=0.8281333333333334 2024-09-16 18:06:19,857 INFO [train.py:1198] (1/2) Epoch 1, batch 450, loss[loss=1.137, simple_loss=0.8162, pruned_loss=0.9474, ctc_loss=1.232, cr_loss=0.1527, over 34710.00 frames. ], tot_loss[loss=1.176, simple_loss=0.8833, pruned_loss=1.01, ctc_loss=1.218, cr_loss=0.1658, over 6053459.23 frames. ], batch size: 97, lr: 4.28e-02, grad_scale: 2.0 2024-09-16 18:06:36,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=2100.0, ans=0.4015625 2024-09-16 18:06:45,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=23.23 vs. 
limit=8.305 2024-09-16 18:06:55,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=2146.6666666666665, ans=6.073333333333333 2024-09-16 18:07:12,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=7.62 vs. limit=4.8773333333333335 2024-09-16 18:07:40,860 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=102.72 vs. limit=8.3575 2024-09-16 18:07:46,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.93 vs. limit=9.215 2024-09-16 18:08:02,388 INFO [train.py:1198] (1/2) Epoch 1, batch 500, loss[loss=1.135, simple_loss=0.8086, pruned_loss=0.9218, ctc_loss=1.237, cr_loss=0.1828, over 34433.00 frames. ], tot_loss[loss=1.147, simple_loss=0.8527, pruned_loss=0.9738, ctc_loss=1.2, cr_loss=0.1638, over 6220007.26 frames. ], batch size: 110, lr: 4.49e-02, grad_scale: 4.0 2024-09-16 18:08:02,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=2333.3333333333335, ans=6.458333333333334 2024-09-16 18:08:10,378 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.713e+02 2.611e+02 3.561e+02 4.626e+02 8.209e+02, threshold=7.123e+02, percent-clipped=1.0 2024-09-16 18:08:14,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2333.3333333333335, ans=0.390625 2024-09-16 18:08:33,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=27.05 vs. limit=9.285 2024-09-16 18:08:40,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.34 vs. limit=9.285 2024-09-16 18:08:43,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=2426.6666666666665, ans=0.11349999999999999 2024-09-16 18:08:45,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.53 vs. limit=9.32 2024-09-16 18:08:47,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=2426.6666666666665, ans=0.109 2024-09-16 18:08:47,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=72.03 vs. limit=8.41 2024-09-16 18:08:55,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2426.6666666666665, ans=0.109 2024-09-16 18:08:57,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=2426.6666666666665, ans=0.22573333333333334 2024-09-16 18:09:04,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. 
limit=8.4275 2024-09-16 18:09:05,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=2473.3333333333335, ans=0.1908333333333333 2024-09-16 18:09:14,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=2473.3333333333335, ans=0.3840625 2024-09-16 18:09:16,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=17.19 vs. limit=6.236666666666666 2024-09-16 18:09:24,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=2520.0, ans=0.2378 2024-09-16 18:09:33,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.13 vs. limit=8.445 2024-09-16 18:09:37,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.85 vs. limit=9.39 2024-09-16 18:09:44,774 INFO [train.py:1198] (1/2) Epoch 1, batch 550, loss[loss=1.101, simple_loss=0.7932, pruned_loss=0.8498, ctc_loss=1.173, cr_loss=0.2125, over 33756.00 frames. ], tot_loss[loss=1.126, simple_loss=0.831, pruned_loss=0.9414, ctc_loss=1.186, cr_loss=0.1681, over 6329786.52 frames. ], batch size: 122, lr: 4.49e-02, grad_scale: 4.0 2024-09-16 18:09:47,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=2566.6666666666665, ans=0.5 2024-09-16 18:10:06,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=52.70 vs. limit=8.48 2024-09-16 18:10:07,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.53 vs. limit=5.653333333333333 2024-09-16 18:10:18,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=2613.3333333333335, ans=0.3775 2024-09-16 18:10:20,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=2613.3333333333335, ans=8.48 2024-09-16 18:10:34,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=2660.0, ans=0.8069000000000001 2024-09-16 18:10:40,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=2660.0, ans=0.8069000000000001 2024-09-16 18:10:47,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=39.92 vs. limit=8.4975 2024-09-16 18:11:03,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=2706.6666666666665, ans=0.37312500000000004 2024-09-16 18:11:09,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=2753.3333333333335, ans=0.37093750000000003 2024-09-16 18:11:12,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.20 vs. 
limit=9.565 2024-09-16 18:11:30,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=13.75 vs. limit=8.55 2024-09-16 18:11:32,013 INFO [train.py:1198] (1/2) Epoch 1, batch 600, loss[loss=1.056, simple_loss=0.767, pruned_loss=0.7707, ctc_loss=1.137, cr_loss=0.2334, over 34197.00 frames. ], tot_loss[loss=1.107, simple_loss=0.8132, pruned_loss=0.9041, ctc_loss=1.17, cr_loss=0.1817, over 6432120.00 frames. ], batch size: 117, lr: 4.49e-02, grad_scale: 8.0 2024-09-16 18:11:34,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=30.16 vs. limit=8.55 2024-09-16 18:11:40,082 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.170e+02 3.595e+02 4.520e+02 5.846e+02 1.463e+03, threshold=9.040e+02, percent-clipped=16.0 2024-09-16 18:11:41,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.15 vs. limit=5.7 2024-09-16 18:11:57,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.97 vs. limit=9.635 2024-09-16 18:12:00,949 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.17 vs. limit=6.423333333333333 2024-09-16 18:12:10,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=2893.3333333333335, ans=0.2710666666666667 2024-09-16 18:12:12,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=2893.3333333333335, ans=0.04095833333333333 2024-09-16 18:12:17,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=11.15 vs. limit=8.585 2024-09-16 18:12:19,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=35.17 vs. limit=8.585 2024-09-16 18:12:20,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2893.3333333333335, ans=0.09149999999999998 2024-09-16 18:12:27,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=8.585 2024-09-16 18:12:32,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=8.6025 2024-09-16 18:12:39,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.86 vs. limit=5.735 2024-09-16 18:12:47,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=2940.0, ans=0.3621875 2024-09-16 18:13:11,524 INFO [train.py:1198] (1/2) Epoch 1, batch 650, loss[loss=0.9236, simple_loss=0.6799, pruned_loss=0.6367, ctc_loss=0.9883, cr_loss=0.266, over 34522.00 frames. ], tot_loss[loss=1.077, simple_loss=0.7907, pruned_loss=0.8544, ctc_loss=1.14, cr_loss=0.2047, over 6523121.58 frames. 
], batch size: 94, lr: 4.49e-02, grad_scale: 8.0 2024-09-16 18:13:15,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=3033.3333333333335, ans=0.03174999999999999 2024-09-16 18:13:18,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=8.6375 2024-09-16 18:13:27,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=3033.3333333333335, ans=0.7803333333333333 2024-09-16 18:13:28,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=17.40 vs. limit=8.6375 2024-09-16 18:13:35,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=3080.0, ans=6.925 2024-09-16 18:13:38,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=9.81 2024-09-16 18:13:39,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.24 vs. limit=8.655 2024-09-16 18:13:51,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3126.6666666666665, ans=0.35343749999999996 2024-09-16 18:13:57,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=8.6725 2024-09-16 18:13:58,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=8.6725 2024-09-16 18:14:03,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=6.40 vs. limit=5.250666666666667 2024-09-16 18:14:05,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=8.6725 2024-09-16 18:14:11,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=3173.3333333333335, ans=0.07 2024-09-16 18:14:17,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=9.879999999999999 2024-09-16 18:14:24,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3173.3333333333335, ans=0.35125 2024-09-16 18:14:29,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.30 vs. limit=8.7075 2024-09-16 18:14:46,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=6.05 vs. limit=5.288 2024-09-16 18:14:51,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.73 vs. 
limit=9.95 2024-09-16 18:14:53,070 INFO [train.py:1198] (1/2) Epoch 1, batch 700, loss[loss=0.8346, simple_loss=0.6223, pruned_loss=0.5455, ctc_loss=0.8789, cr_loss=0.3392, over 34581.00 frames. ], tot_loss[loss=1.042, simple_loss=0.7669, pruned_loss=0.7994, ctc_loss=1.103, cr_loss=0.2313, over 6578620.48 frames. ], batch size: 89, lr: 4.49e-02, grad_scale: 8.0 2024-09-16 18:15:02,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.56 vs. limit=6.633333333333333 2024-09-16 18:15:03,023 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.330e+02 4.023e+02 5.458e+02 7.563e+02 1.955e+03, threshold=1.092e+03, percent-clipped=11.0 2024-09-16 18:15:03,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=8.725 2024-09-16 18:15:06,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=8.725 2024-09-16 18:15:19,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=3313.3333333333335, ans=0.06362499999999999 2024-09-16 18:15:37,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=3360.0, ans=0.7836 2024-09-16 18:15:42,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=3360.0, ans=0.3425 2024-09-16 18:15:50,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=3360.0, ans=0.3425 2024-09-16 18:16:01,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.77 vs. limit=5.851666666666667 2024-09-16 18:16:02,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=3406.6666666666665, ans=0.2511 2024-09-16 18:16:02,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=3406.6666666666665, ans=0.3403125 2024-09-16 18:16:08,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=3406.6666666666665, ans=0.7807666666666667 2024-09-16 18:16:08,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3406.6666666666665, ans=0.26593333333333335 2024-09-16 18:16:20,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=3453.3333333333335, ans=0.03920833333333333 2024-09-16 18:16:20,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=16.43 vs. limit=8.795 2024-09-16 18:16:21,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=3453.3333333333335, ans=0.26546666666666663 2024-09-16 18:16:28,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.85 vs. 
2024-09-16 18:16:33,216 INFO [train.py:1198] (1/2) Epoch 1, batch 750, loss[loss=0.8283, simple_loss=0.6291, pruned_loss=0.508, ctc_loss=0.8745, cr_loss=0.3672, over 34431.00 frames. ], tot_loss[loss=0.9962, simple_loss=0.7369, pruned_loss=0.7385, ctc_loss=1.054, cr_loss=0.254, over 6621776.59 frames. ], batch size: 95, lr: 4.49e-02, grad_scale: 8.0
2024-09-16 18:16:37,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=3500.0, ans=0.3359375
2024-09-16 18:16:59,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=11.22 vs. limit=10.16
2024-09-16 18:17:09,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.36 vs. limit=10.16
2024-09-16 18:17:10,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=3593.3333333333335, ans=0.06524999999999997
2024-09-16 18:17:14,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=3593.3333333333335, ans=0.3315625
2024-09-16 18:17:16,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=8.8475
2024-09-16 18:17:24,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=8.8475
2024-09-16 18:17:36,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.39 vs. limit=8.865
2024-09-16 18:17:37,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=3640.0, ans=0.329375
2024-09-16 18:18:10,467 INFO [train.py:1198] (1/2) Epoch 1, batch 800, loss[loss=0.7131, simple_loss=0.5514, pruned_loss=0.4156, ctc_loss=0.7453, cr_loss=0.3126, over 34457.00 frames. ], tot_loss[loss=0.9517, simple_loss=0.708, pruned_loss=0.6814, ctc_loss=1.004, cr_loss=0.2724, over 6658082.48 frames. ], batch size: 85, lr: 4.49e-02, grad_scale: 8.0
2024-09-16 18:18:20,106 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.882e+02 4.521e+02 5.991e+02 8.647e+02 2.359e+03, threshold=1.198e+03, percent-clipped=10.0
2024-09-16 18:18:27,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3780.0, ans=0.058249999999999996
2024-09-16 18:18:28,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3780.0, ans=0.3228125
2024-09-16 18:18:55,521 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.86 vs. limit=10.370000000000001
2024-09-16 18:19:00,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=3826.6666666666665, ans=0.013899999999999996
2024-09-16 18:19:05,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=3826.6666666666665, ans=0.03475
2024-09-16 18:19:30,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.80 vs. limit=8.97
2024-09-16 18:19:35,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=3920.0, ans=0.31625000000000003
2024-09-16 18:19:49,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.09 vs. limit=5.586666666666667
2024-09-16 18:19:50,106 INFO [train.py:1198] (1/2) Epoch 1, batch 850, loss[loss=0.7358, simple_loss=0.5728, pruned_loss=0.4168, ctc_loss=0.7663, cr_loss=0.3468, over 34359.00 frames. ], tot_loss[loss=0.9043, simple_loss=0.6778, pruned_loss=0.6244, ctc_loss=0.951, cr_loss=0.2886, over 6692344.04 frames. ], batch size: 103, lr: 4.49e-02, grad_scale: 8.0
2024-09-16 18:19:51,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.13 vs. limit=8.9875
2024-09-16 18:19:55,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=10.475
2024-09-16 18:19:58,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=3966.6666666666665, ans=0.7611666666666667
2024-09-16 18:20:04,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.89 vs. limit=6.983333333333333
2024-09-16 18:20:13,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.90 vs. limit=9.004999999999999
2024-09-16 18:20:31,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4060.0, ans=0.3096875
2024-09-16 18:20:33,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.31 vs. limit=7.029999999999999
2024-09-16 18:20:45,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=10.58
2024-09-16 18:20:56,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=4106.666666666667, ans=0.049555555555555554
2024-09-16 18:20:58,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.46 vs. limit=9.040000000000001
2024-09-16 18:21:04,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.36 vs. limit=9.0575
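Each "ScheduledFloat: name=..., batch_count=..., ans=..." line prints a hyperparameter whose current value is looked up against the global batch count. A sketch of such a piecewise-linear schedule; the (batch_count, value) breakpoints below are illustrative, not this run's actual ones:

class ScheduledFloat:
    """Float-valued hyperparameter interpolated piecewise-linearly in batch_count."""
    def __init__(self, *points):
        # points: (batch_count, value) pairs, assumed sorted by batch_count
        self.points = list(points)
    def value(self, batch_count: float) -> float:
        (x0, y0) = self.points[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in self.points[1:]:
            if batch_count <= x1:
                w = (batch_count - x0) / (x1 - x0)
                return y0 + w * (y1 - y0)  # linear interpolation
            x0, y0 = x1, y1
        return y0  # clamp past the last breakpoint

# e.g. a skip-rate annealed from 0.5 down to 0.02 over the first 4000 batches:
skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.02))
print(skip_rate.value(3033.33))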
2024-09-16 18:21:05,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=4153.333333333333, ans=0.04936111111111111
2024-09-16 18:21:11,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.08 vs. limit=10.615
2024-09-16 18:21:19,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=4153.333333333333, ans=0.04936111111111111
2024-09-16 18:21:24,111 INFO [train.py:1198] (1/2) Epoch 1, batch 900, loss[loss=0.6633, simple_loss=0.5253, pruned_loss=0.36, ctc_loss=0.675, cr_loss=0.3442, over 34496.00 frames. ], tot_loss[loss=0.8625, simple_loss=0.6515, pruned_loss=0.575, ctc_loss=0.9023, cr_loss=0.3047, over 6700450.53 frames. ], batch size: 85, lr: 4.48e-02, grad_scale: 8.0
2024-09-16 18:21:27,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.34 vs. limit=9.075
2024-09-16 18:21:35,050 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.410e+02 3.968e+02 5.607e+02 7.426e+02 2.185e+03, threshold=1.121e+03, percent-clipped=7.0
2024-09-16 18:21:42,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=4246.666666666667, ans=0.26370000000000005
2024-09-16 18:21:52,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=11.73 vs. limit=10.685
2024-09-16 18:22:05,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=4293.333333333333, ans=0.29874999999999996
2024-09-16 18:22:25,913 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-16 18:22:31,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=4340.0, ans=0.09899494936611666
2024-09-16 18:22:34,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.99 vs. limit=9.1275
2024-09-16 18:22:36,053 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.46 vs. limit=6.085
2024-09-16 18:22:40,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4386.666666666667, ans=0.294375
2024-09-16 18:22:56,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=11.50 vs. limit=10.825
2024-09-16 18:22:57,406 INFO [train.py:1198] (1/2) Epoch 1, batch 950, loss[loss=0.6574, simple_loss=0.5155, pruned_loss=0.3587, ctc_loss=0.6706, cr_loss=0.3997, over 34666.00 frames. ], tot_loss[loss=0.8232, simple_loss=0.6269, pruned_loss=0.5302, ctc_loss=0.8565, cr_loss=0.3207, over 6704615.69 frames. ], batch size: 87, lr: 4.48e-02, grad_scale: 4.0
2024-09-16 18:23:32,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=4480.0, ans=0.009895652173913043
2024-09-16 18:23:40,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=4526.666666666667, ans=0.04780555555555556
2024-09-16 18:23:42,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4526.666666666667, ans=0.2878125
2024-09-16 18:23:53,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.40 vs. limit=9.1975
2024-09-16 18:24:06,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=9.215
2024-09-16 18:24:10,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.98 vs. limit=6.1433333333333335
2024-09-16 18:24:16,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.88 vs. limit=10.965
2024-09-16 18:24:31,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.30 vs. limit=6.155
2024-09-16 18:24:36,325 INFO [train.py:1198] (1/2) Epoch 1, batch 1000, loss[loss=0.6026, simple_loss=0.4871, pruned_loss=0.3081, ctc_loss=0.6001, cr_loss=0.3887, over 34513.00 frames. ], tot_loss[loss=0.7892, simple_loss=0.606, pruned_loss=0.4917, ctc_loss=0.8152, cr_loss=0.3366, over 6696315.28 frames. ], batch size: 90, lr: 4.48e-02, grad_scale: 8.0
2024-09-16 18:24:40,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=5.866666666666667
2024-09-16 18:24:49,343 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.510e+02 3.653e+02 4.837e+02 6.545e+02 1.448e+03, threshold=9.674e+02, percent-clipped=1.0
2024-09-16 18:24:55,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=4713.333333333333, ans=0.2790625
2024-09-16 18:25:08,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=4713.333333333333, ans=0.2790625
2024-09-16 18:25:31,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=9.3025
2024-09-16 18:25:39,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=4806.666666666667, ans=0.2746875
2024-09-16 18:25:49,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=4853.333333333333, ans=0.27249999999999996
2024-09-16 18:25:57,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.33 vs. limit=11.14
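The loss[...] breakdown in the batch lines combines pruned-transducer terms (simple_loss, pruned_loss) with auxiliary CTC and consistency-regularization (cr_loss) terms. One plausible way such a combined objective is assembled; the scale values below are assumptions for illustration, not necessarily this run's exact weighting:

def combined_loss(simple_loss, pruned_loss, ctc_loss, cr_loss,
                  simple_scale=0.5, ctc_scale=0.1, cr_scale=0.02):
    # Pruned-RNN-T style: a cheap "simple" alignment loss plus the pruned
    # transducer loss, with auxiliary CTC and consistency-regularization
    # terms added on top. All *_scale values are illustrative assumptions.
    return (simple_scale * simple_loss + pruned_loss
            + ctc_scale * ctc_loss + cr_scale * cr_loss)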
2024-09-16 18:26:07,768 INFO [train.py:1198] (1/2) Epoch 1, batch 1050, loss[loss=0.676, simple_loss=0.5426, pruned_loss=0.351, ctc_loss=0.6531, cr_loss=0.4767, over 34574.00 frames. ], tot_loss[loss=0.753, simple_loss=0.5833, pruned_loss=0.4544, ctc_loss=0.7719, cr_loss=0.3517, over 6705841.94 frames. ], batch size: 99, lr: 4.48e-02, grad_scale: 4.0
2024-09-16 18:26:14,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.40 vs. limit=11.175
2024-09-16 18:26:21,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=11.76 vs. limit=11.175
2024-09-16 18:26:25,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=9.355
2024-09-16 18:26:26,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=4946.666666666667, ans=0.009794202898550725
2024-09-16 18:26:30,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4946.666666666667, ans=0.25053333333333333
2024-09-16 18:26:45,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.33 vs. limit=6.248333333333333
2024-09-16 18:26:48,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4993.333333333333, ans=0.25006666666666666
2024-09-16 18:26:48,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.93 vs. limit=7.496666666666666
2024-09-16 18:27:04,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=5040.0, ans=0.04566666666666667
2024-09-16 18:27:24,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=5086.666666666667, ans=0.00976376811594203
2024-09-16 18:27:40,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=5086.666666666667, ans=0.26156250000000003
2024-09-16 18:27:43,788 INFO [train.py:1198] (1/2) Epoch 1, batch 1100, loss[loss=0.5671, simple_loss=0.4694, pruned_loss=0.2771, ctc_loss=0.5388, cr_loss=0.4018, over 34335.00 frames. ], tot_loss[loss=0.7219, simple_loss=0.564, pruned_loss=0.4225, ctc_loss=0.7329, cr_loss=0.367, over 6719148.02 frames. ], batch size: 91, lr: 4.48e-02, grad_scale: 8.0
2024-09-16 18:27:58,489 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.648e+02 3.595e+02 4.192e+02 6.372e+02 1.329e+03, threshold=8.384e+02, percent-clipped=5.0
2024-09-16 18:27:58,892 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 18:28:09,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=5180.0, ans=0.009743478260869565
2024-09-16 18:28:15,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=9.442499999999999
2024-09-16 18:28:29,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=11.42
2024-09-16 18:28:51,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5273.333333333333, ans=0.2528125
2024-09-16 18:28:55,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=16.90 vs. limit=7.66
2024-09-16 18:29:04,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=6.09 vs. limit=6.128
2024-09-16 18:29:10,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=11.61 vs. limit=11.49
2024-09-16 18:29:14,687 INFO [train.py:1198] (1/2) Epoch 1, batch 1150, loss[loss=0.5673, simple_loss=0.4664, pruned_loss=0.2806, ctc_loss=0.5369, cr_loss=0.4002, over 34364.00 frames. ], tot_loss[loss=0.6939, simple_loss=0.547, pruned_loss=0.3946, ctc_loss=0.697, cr_loss=0.382, over 6716929.86 frames. ], batch size: 91, lr: 4.47e-02, grad_scale: 4.0
2024-09-16 18:29:31,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.65 vs. limit=9.53
2024-09-16 18:30:29,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.16 vs. limit=9.5825
2024-09-16 18:30:30,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=5553.333333333333, ans=0.24446666666666667
2024-09-16 18:30:42,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5600.0, ans=0.2375
2024-09-16 18:30:45,380 INFO [train.py:1198] (1/2) Epoch 1, batch 1200, loss[loss=0.614, simple_loss=0.5075, pruned_loss=0.3011, ctc_loss=0.5768, cr_loss=0.4188, over 34583.00 frames. ], tot_loss[loss=0.6726, simple_loss=0.5344, pruned_loss=0.3726, ctc_loss=0.6677, cr_loss=0.3962, over 6708501.13 frames. ], batch size: 99, lr: 4.47e-02, grad_scale: 8.0
2024-09-16 18:30:45,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=5600.0, ans=0.043333333333333335
2024-09-16 18:31:00,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=5600.0, ans=0.2375
2024-09-16 18:31:03,321 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.570e+02 3.793e+02 5.115e+02 6.474e+02 1.483e+03, threshold=1.023e+03, percent-clipped=10.0
2024-09-16 18:31:09,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=9.6175
2024-09-16 18:31:16,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=9.6175
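grad_scale in the batch lines is the dynamic loss scale used for float16 training: it is halved when a step overflows (8.0 -> 4.0 above) and periodically doubled when training is stable (16.0 later in the log). The standard PyTorch AMP pattern that produces this value; init_scale here is an assumption:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=1.0)  # init_scale assumed

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # skips the update if grads overflowed
    scaler.update()                 # halves the scale on overflow, grows it otherwise
    return scaler.get_scale()       # the value logged as grad_scale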
2024-09-16 18:31:36,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5693.333333333333, ans=0.04294444444444445
2024-09-16 18:31:51,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=5740.0, ans=0.23093750000000002
2024-09-16 18:31:56,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=5740.0, ans=0.23093750000000002
2024-09-16 18:32:00,170 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 18:32:02,648 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.43 vs. limit=11.84
2024-09-16 18:32:12,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=5786.666666666667, ans=0.22875
2024-09-16 18:32:19,315 INFO [train.py:1198] (1/2) Epoch 1, batch 1250, loss[loss=0.5871, simple_loss=0.4906, pruned_loss=0.2808, ctc_loss=0.5502, cr_loss=0.4354, over 34356.00 frames. ], tot_loss[loss=0.6513, simple_loss=0.5222, pruned_loss=0.3517, ctc_loss=0.6384, cr_loss=0.4099, over 6741895.09 frames. ], batch size: 107, lr: 4.47e-02, grad_scale: 4.0
2024-09-16 18:32:47,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=5880.0, ans=0.224375
2024-09-16 18:33:00,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=5926.666666666667, ans=0.24073333333333333
2024-09-16 18:33:18,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=11.98
2024-09-16 18:33:22,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.35 vs. limit=7.986666666666666
2024-09-16 18:33:39,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.94 vs. limit=9.7575
2024-09-16 18:33:47,360 INFO [train.py:1198] (1/2) Epoch 1, batch 1300, loss[loss=0.5883, simple_loss=0.4914, pruned_loss=0.2821, ctc_loss=0.5379, cr_loss=0.4817, over 33123.00 frames. ], tot_loss[loss=0.6307, simple_loss=0.5097, pruned_loss=0.3331, ctc_loss=0.611, cr_loss=0.4201, over 6745895.26 frames. ], batch size: 130, lr: 4.47e-02, grad_scale: 8.0
2024-09-16 18:34:04,807 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.424e+02 3.725e+02 4.722e+02 6.314e+02 1.480e+03, threshold=9.443e+02, percent-clipped=2.0
2024-09-16 18:34:07,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.42 vs. limit=8.056666666666667
2024-09-16 18:34:12,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=6113.333333333333, ans=0.18886666666666665
2024-09-16 18:34:22,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=6160.0, ans=0.0
2024-09-16 18:34:29,090 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=9.81
2024-09-16 18:34:31,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.67 vs. limit=9.81
2024-09-16 18:34:32,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.35 vs. limit=6.54
2024-09-16 18:34:38,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=6206.666666666667, ans=0.20906249999999998
2024-09-16 18:35:14,758 INFO [train.py:1198] (1/2) Epoch 1, batch 1350, loss[loss=0.5583, simple_loss=0.4692, pruned_loss=0.2657, ctc_loss=0.5078, cr_loss=0.4325, over 34541.00 frames. ], tot_loss[loss=0.6119, simple_loss=0.4985, pruned_loss=0.3165, ctc_loss=0.5858, cr_loss=0.4283, over 6765390.93 frames. ], batch size: 94, lr: 4.46e-02, grad_scale: 4.0
2024-09-16 18:35:17,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=12.225
2024-09-16 18:35:23,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=6300.0, ans=0.0
2024-09-16 18:35:24,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=6300.0, ans=9.8625
2024-09-16 18:35:44,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=6346.666666666667, ans=0.04022222222222222
2024-09-16 18:35:58,632 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.320e-02
2024-09-16 18:36:01,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=6393.333333333333, ans=0.2003125
2024-09-16 18:36:02,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=12.295
2024-09-16 18:36:04,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.39 vs. limit=6.598333333333333
2024-09-16 18:36:12,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=6440.0, ans=0.198125
2024-09-16 18:36:23,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.92 vs. limit=3.966
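A "Whitening: ... metric=X vs. limit=Y" line fires when a module's measured whitening statistic exceeds its limit, and the limit itself is scheduled (see the "...out_whiten.whitening_limit, ..., ans=9.8625" entry above). One illustrative statistic for how far a group of channels is from being white, the top covariance eigenvalue divided by the mean eigenvalue; this is a stand-in for intuition, not necessarily the exact metric computed in scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (frames, channels). For each group of channels, compare the largest
    # covariance eigenvalue to the average one; 1.0 means perfectly "white".
    (_, c) = x.shape
    metrics = []
    for g in x.split(c // num_groups, dim=1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / g.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs.max() / eigs.mean()).item())
    return max(metrics)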
2024-09-16 18:36:31,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=6486.666666666667, ans=0.009459420289855072
2024-09-16 18:36:37,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=6486.666666666667, ans=0.1959375
2024-09-16 18:36:46,072 INFO [train.py:1198] (1/2) Epoch 1, batch 1400, loss[loss=0.5078, simple_loss=0.429, pruned_loss=0.241, ctc_loss=0.4456, cr_loss=0.4188, over 34304.00 frames. ], tot_loss[loss=0.5961, simple_loss=0.4891, pruned_loss=0.3029, ctc_loss=0.5639, cr_loss=0.4351, over 6777123.26 frames. ], batch size: 80, lr: 4.46e-02, grad_scale: 8.0
2024-09-16 18:36:56,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=6533.333333333333, ans=0.03944444444444445
2024-09-16 18:37:05,018 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.703e+02 3.827e+02 5.101e+02 6.663e+02 1.072e+03, threshold=1.020e+03, percent-clipped=7.0
2024-09-16 18:37:05,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=12.91 vs. limit=12.434999999999999
2024-09-16 18:37:12,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=6580.0, ans=0.19156250000000002
2024-09-16 18:37:19,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=12.469999999999999
2024-09-16 18:37:20,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=6626.666666666667, ans=0.18937500000000002
2024-09-16 18:37:24,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.36 vs. limit=12.469999999999999
2024-09-16 18:37:25,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.91 vs. limit=12.469999999999999
2024-09-16 18:37:26,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=6626.666666666667, ans=0.18937500000000002
2024-09-16 18:38:07,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=6720.0, ans=0.029
2024-09-16 18:38:12,847 INFO [train.py:1198] (1/2) Epoch 1, batch 1450, loss[loss=0.551, simple_loss=0.4726, pruned_loss=0.2537, ctc_loss=0.4902, cr_loss=0.4871, over 34465.00 frames. ], tot_loss[loss=0.5833, simple_loss=0.4823, pruned_loss=0.2914, ctc_loss=0.5455, cr_loss=0.4416, over 6774249.15 frames. ], batch size: 110, lr: 4.46e-02, grad_scale: 8.0
2024-09-16 18:38:16,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=6766.666666666667, ans=0.23233333333333334
2024-09-16 18:38:22,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=6766.666666666667, ans=0.03847222222222223
2024-09-16 18:38:48,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.39 vs. limit=10.0725
2024-09-16 18:38:50,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.63 vs. limit=6.715
2024-09-16 18:38:56,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=6860.0, ans=0.17843750000000003
2024-09-16 18:39:24,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=10.1075
2024-09-16 18:39:43,341 INFO [train.py:1198] (1/2) Epoch 1, batch 1500, loss[loss=0.5363, simple_loss=0.4625, pruned_loss=0.246, ctc_loss=0.4767, cr_loss=0.4412, over 34473.00 frames. ], tot_loss[loss=0.5704, simple_loss=0.4751, pruned_loss=0.2805, ctc_loss=0.5279, cr_loss=0.4475, over 6774389.41 frames. ], batch size: 100, lr: 4.46e-02, grad_scale: 8.0
2024-09-16 18:39:51,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=10.125
2024-09-16 18:40:01,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.19 vs. limit=12.785
2024-09-16 18:40:02,610 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.809e+02 3.655e+02 4.954e+02 7.122e+02 1.604e+03, threshold=9.907e+02, percent-clipped=1.0
2024-09-16 18:40:43,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7140.0, ans=0.2286
2024-09-16 18:40:46,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=7140.0, ans=0.2286
2024-09-16 18:40:52,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.90 vs. limit=6.796666666666667
2024-09-16 18:41:10,387 INFO [train.py:1198] (1/2) Epoch 1, batch 1550, loss[loss=0.532, simple_loss=0.4597, pruned_loss=0.243, ctc_loss=0.4721, cr_loss=0.4724, over 34438.00 frames. ], tot_loss[loss=0.56, simple_loss=0.469, pruned_loss=0.2722, ctc_loss=0.5134, cr_loss=0.4516, over 6746287.87 frames. ], batch size: 105, lr: 4.45e-02, grad_scale: 8.0
2024-09-16 18:41:23,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.41 vs. limit=10.2125
2024-09-16 18:41:43,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=7326.666666666667, ans=0.03613888888888889
2024-09-16 18:41:57,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=10.2475
2024-09-16 18:42:00,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=7373.333333333333, ans=0.15437499999999998
2024-09-16 18:42:08,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.82 vs. limit=8.686666666666667
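The printed lr decays smoothly (4.49e-02 around batch 700 down to 4.45e-02 by batch 1550). This is consistent with an Eden-style schedule that discounts a base LR by inverse-fourth-root factors in both the batch count and the (fractional) epoch count; a sketch with assumed constants (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) that reproduces the printed values:

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Assumed Eden-style decay: smooth inverse-fourth-root discounts
    # in both the batch count and the (fractional) epoch count.
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, batch=700, epoch=0.05):.2e}")   # ~4.49e-02, as logged
print(f"{eden_lr(0.045, batch=2500, epoch=0.18):.2e}")  # ~4.38e-02, as logged later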
2024-09-16 18:42:09,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=7373.333333333333, ans=0.6419333333333334
2024-09-16 18:42:13,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=10.265
2024-09-16 18:42:20,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=10.2825
2024-09-16 18:42:32,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.11 vs. limit=8.71
2024-09-16 18:42:37,500 INFO [train.py:1198] (1/2) Epoch 1, batch 1600, loss[loss=0.5287, simple_loss=0.4587, pruned_loss=0.2413, ctc_loss=0.458, cr_loss=0.4892, over 34578.00 frames. ], tot_loss[loss=0.5501, simple_loss=0.4634, pruned_loss=0.2645, ctc_loss=0.4993, cr_loss=0.455, over 6725886.29 frames. ], batch size: 99, lr: 4.45e-02, grad_scale: 8.0
2024-09-16 18:42:58,179 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.747e+02 3.983e+02 4.810e+02 6.785e+02 1.550e+03, threshold=9.620e+02, percent-clipped=12.0
2024-09-16 18:43:01,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=7513.333333333333, ans=0.035361111111111114
2024-09-16 18:43:11,175 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.674e-02
2024-09-16 18:43:44,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=7606.666666666667, ans=10.3525
2024-09-16 18:44:00,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=7653.333333333333, ans=0.14125
2024-09-16 18:44:04,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=13.24
2024-09-16 18:44:08,577 INFO [train.py:1198] (1/2) Epoch 1, batch 1650, loss[loss=0.5364, simple_loss=0.4624, pruned_loss=0.2475, ctc_loss=0.4744, cr_loss=0.4397, over 34370.00 frames. ], tot_loss[loss=0.54, simple_loss=0.4577, pruned_loss=0.2569, ctc_loss=0.4861, cr_loss=0.4573, over 6719532.00 frames. ], batch size: 103, lr: 4.45e-02, grad_scale: 8.0
2024-09-16 18:44:10,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7700.0, ans=0.223
2024-09-16 18:44:24,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=7746.666666666667, ans=0.2225333333333333
2024-09-16 18:44:25,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.42 vs. limit=10.405
2024-09-16 18:44:28,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.09 vs. limit=10.405
2024-09-16 18:44:37,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=7746.666666666667, ans=0.034388888888888886
2024-09-16 18:44:49,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=7793.333333333333, ans=0.6272333333333333
2024-09-16 18:45:06,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.64 vs. limit=13.379999999999999
2024-09-16 18:45:22,296 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 18:45:33,700 INFO [train.py:1198] (1/2) Epoch 1, batch 1700, loss[loss=0.4579, simple_loss=0.4018, pruned_loss=0.2048, ctc_loss=0.4023, cr_loss=0.471, over 34281.00 frames. ], tot_loss[loss=0.5295, simple_loss=0.4521, pruned_loss=0.2491, ctc_loss=0.4729, cr_loss=0.4608, over 6744920.51 frames. ], batch size: 80, lr: 4.44e-02, grad_scale: 8.0
2024-09-16 18:45:42,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=7933.333333333333, ans=0.128125
2024-09-16 18:45:54,076 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.380e+02 4.226e+02 5.703e+02 1.244e+03, threshold=8.453e+02, percent-clipped=3.0
2024-09-16 18:45:58,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=7980.0, ans=0.03341666666666667
2024-09-16 18:46:01,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=7980.0, ans=0.2202
2024-09-16 18:46:02,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.09 vs. limit=6.995
2024-09-16 18:46:13,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=8026.666666666667, ans=0.00912463768115942
2024-09-16 18:46:27,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=26.12 vs. limit=13.555
2024-09-16 18:46:58,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.31 vs. limit=4.225
2024-09-16 18:46:58,808 INFO [train.py:1198] (1/2) Epoch 1, batch 1750, loss[loss=0.4558, simple_loss=0.4009, pruned_loss=0.2062, ctc_loss=0.3896, cr_loss=0.4173, over 34148.00 frames. ], tot_loss[loss=0.521, simple_loss=0.4473, pruned_loss=0.243, ctc_loss=0.4618, cr_loss=0.4627, over 6754725.71 frames. ], batch size: 78, lr: 4.44e-02, grad_scale: 8.0
2024-09-16 18:47:06,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=8166.666666666667, ans=0.6141666666666667
2024-09-16 18:47:08,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=8166.666666666667, ans=0.125
2024-09-16 18:47:08,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.11 vs. limit=13.625
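cr_loss is the consistency-regularization term of CR-CTC: the same batch is forwarded under two different time-maskings and the two frame-level CTC posteriors are pulled toward each other. A hedged sketch of the symmetric-KL form such a loss usually takes; the function name, detaching of targets, and reduction are assumptions:

import torch
import torch.nn.functional as F

def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
    # log_probs_*: (batch, frames, vocab) log-posteriors from two
    # differently-masked views of the same utterances.
    kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                     log_target=True, reduction="sum")
    kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                     log_target=True, reduction="sum")
    return 0.5 * (kl_ab + kl_ba)  # symmetric consistency penalty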
2024-09-16 18:47:23,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=10.58
2024-09-16 18:47:40,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=8260.0, ans=0.125
2024-09-16 18:47:43,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=8260.0, ans=0.00907391304347826
2024-09-16 18:47:45,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.86 vs. limit=13.695
2024-09-16 18:47:48,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=8260.0, ans=0.05
2024-09-16 18:48:05,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=8306.666666666666, ans=0.6092666666666667
2024-09-16 18:48:29,291 INFO [train.py:1198] (1/2) Epoch 1, batch 1800, loss[loss=0.4612, simple_loss=0.421, pruned_loss=0.1988, ctc_loss=0.3883, cr_loss=0.4948, over 34691.00 frames. ], tot_loss[loss=0.5139, simple_loss=0.4438, pruned_loss=0.2379, ctc_loss=0.4526, cr_loss=0.4657, over 6755899.42 frames. ], batch size: 97, lr: 4.44e-02, grad_scale: 8.0
2024-09-16 18:48:49,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.509e+02 3.288e+02 4.725e+02 6.185e+02 1.010e+03, threshold=9.450e+02, percent-clipped=3.0
2024-09-16 18:48:55,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=4.2669999999999995
2024-09-16 18:48:58,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=8446.666666666666, ans=0.125
2024-09-16 18:49:25,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=8540.0, ans=0.05
2024-09-16 18:49:54,267 INFO [train.py:1198] (1/2) Epoch 1, batch 1850, loss[loss=0.4928, simple_loss=0.4352, pruned_loss=0.2226, ctc_loss=0.4025, cr_loss=0.55, over 34456.00 frames. ], tot_loss[loss=0.5066, simple_loss=0.4401, pruned_loss=0.2328, ctc_loss=0.4431, cr_loss=0.4678, over 6763980.90 frames. ], batch size: 100, lr: 4.43e-02, grad_scale: 8.0
2024-09-16 18:50:31,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.77 vs. limit=9.363333333333333
2024-09-16 18:50:36,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.91 vs. limit=4.309
2024-09-16 18:50:42,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=8726.666666666666, ans=0.21273333333333333
2024-09-16 18:50:47,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=8773.333333333334, ans=0.03011111111111111
2024-09-16 18:50:55,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.58 vs. limit=10.79
2024-09-16 18:51:04,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=8820.0, ans=0.5913
2024-09-16 18:51:09,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=8820.0, ans=0.008952173913043478
2024-09-16 18:51:18,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.91 vs. limit=10.807500000000001
2024-09-16 18:51:23,191 INFO [train.py:1198] (1/2) Epoch 1, batch 1900, loss[loss=0.4979, simple_loss=0.4403, pruned_loss=0.2257, ctc_loss=0.4133, cr_loss=0.4948, over 34683.00 frames. ], tot_loss[loss=0.5013, simple_loss=0.4378, pruned_loss=0.2291, ctc_loss=0.4356, cr_loss=0.4698, over 6773124.57 frames. ], batch size: 104, lr: 4.43e-02, grad_scale: 8.0
2024-09-16 18:51:31,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=8866.666666666666, ans=0.11798333333333334
2024-09-16 18:51:37,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8866.666666666666, ans=0.21133333333333332
2024-09-16 18:51:43,581 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.783e+02 3.693e+02 4.592e+02 5.852e+02 1.819e+03, threshold=9.183e+02, percent-clipped=10.0
2024-09-16 18:51:47,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=8913.333333333334, ans=0.029527777777777778
2024-09-16 18:51:52,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=8913.333333333334, ans=0.21086666666666665
2024-09-16 18:52:15,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=9006.666666666666, ans=0.125
2024-09-16 18:52:24,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=10.8775
2024-09-16 18:52:47,498 INFO [train.py:1198] (1/2) Epoch 1, batch 1950, loss[loss=0.4534, simple_loss=0.4103, pruned_loss=0.1991, ctc_loss=0.3879, cr_loss=0.4882, over 34358.00 frames. ], tot_loss[loss=0.4969, simple_loss=0.4367, pruned_loss=0.2257, ctc_loss=0.4297, cr_loss=0.474, over 6789760.25 frames. ], batch size: 91, lr: 4.43e-02, grad_scale: 8.0
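tot_loss[...] is a running summary rather than a single batch: its frame count hovers near 200 batches' worth (~6.7e6 frames) and is fractional, which is consistent with exponentially decayed sums of per-batch statistics that are normalized by the decayed frame count when printed. A plausible sketch; the decay constant and class name are assumptions:

class RunningLoss:
    def __init__(self, decay=1.0 - 1.0 / 200.0):  # decay constant assumed
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0
    def update(self, batch_loss_sum, batch_frames):
        # Decayed sums: old batches fade out, keeping an effective
        # window of roughly 200 recent batches.
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames
    def value(self):
        return self.loss_sum / self.frames  # per-frame loss, as printed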
2024-09-16 18:52:53,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=9100.0, ans=0.159
2024-09-16 18:52:53,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=9100.0, ans=0.125
2024-09-16 18:52:58,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=9100.0, ans=0.025
2024-09-16 18:53:15,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=9146.666666666666, ans=0.5798666666666668
2024-09-16 18:53:25,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=9193.333333333334, ans=0.125
2024-09-16 18:53:37,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=9240.0, ans=0.125
2024-09-16 18:53:44,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=10.965
2024-09-16 18:53:48,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=9240.0, ans=0.125
2024-09-16 18:53:49,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.31 vs. limit=7.3100000000000005
2024-09-16 18:53:55,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9286.666666666666, ans=0.20713333333333334
2024-09-16 18:53:58,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=9286.666666666666, ans=0.035
2024-09-16 18:54:12,864 INFO [train.py:1198] (1/2) Epoch 1, batch 2000, loss[loss=0.407, simple_loss=0.3758, pruned_loss=0.1756, ctc_loss=0.3422, cr_loss=0.4636, over 34145.00 frames. ], tot_loss[loss=0.4922, simple_loss=0.4347, pruned_loss=0.2226, ctc_loss=0.4239, cr_loss=0.4753, over 6764280.65 frames. ], batch size: 78, lr: 4.42e-02, grad_scale: 16.0
2024-09-16 18:54:29,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. limit=11.0175
2024-09-16 18:54:35,799 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.626e+02 3.532e+02 4.361e+02 6.352e+02 1.553e+03, threshold=8.723e+02, percent-clipped=7.0
2024-09-16 18:54:44,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=9380.0, ans=0.125
2024-09-16 18:54:44,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=9380.0, ans=0.027583333333333335
2024-09-16 18:54:50,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.07 vs. limit=14.57
2024-09-16 18:55:03,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9426.666666666666, ans=0.20573333333333332
2024-09-16 18:55:14,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.36 vs. limit=9.736666666666668
2024-09-16 18:55:15,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=14.605
2024-09-16 18:55:28,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.47 vs. limit=14.64
2024-09-16 18:55:34,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=5.904
2024-09-16 18:55:42,933 INFO [train.py:1198] (1/2) Epoch 1, batch 2050, loss[loss=0.4015, simple_loss=0.3721, pruned_loss=0.1731, ctc_loss=0.3361, cr_loss=0.4383, over 34474.00 frames. ], tot_loss[loss=0.4864, simple_loss=0.4311, pruned_loss=0.2194, ctc_loss=0.4169, cr_loss=0.475, over 6756087.72 frames. ], batch size: 82, lr: 4.42e-02, grad_scale: 8.0
2024-09-16 18:55:54,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.31 vs. limit=7.391666666666667
2024-09-16 18:56:13,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=9613.333333333334, ans=0.025
2024-09-16 18:56:19,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=9660.0, ans=0.125
2024-09-16 18:56:21,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.28 vs. limit=11.1225
2024-09-16 18:56:48,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.04 vs. limit=14.780000000000001
2024-09-16 18:56:53,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.83 vs. limit=14.815000000000001
2024-09-16 18:57:04,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=9753.333333333334, ans=0.5586333333333333
2024-09-16 18:57:07,698 INFO [train.py:1198] (1/2) Epoch 1, batch 2100, loss[loss=0.51, simple_loss=0.4467, pruned_loss=0.2349, ctc_loss=0.4216, cr_loss=0.4806, over 34537.00 frames. ], tot_loss[loss=0.4805, simple_loss=0.4283, pruned_loss=0.2156, ctc_loss=0.4103, cr_loss=0.4752, over 6769897.75 frames. ], batch size: 94, lr: 4.42e-02, grad_scale: 8.0
2024-09-16 18:57:18,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=9800.0, ans=0.5569999999999999
2024-09-16 18:57:29,687 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.494e+02 3.185e+02 4.318e+02 6.245e+02 1.503e+03, threshold=8.635e+02, percent-clipped=11.0
2024-09-16 18:57:30,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=9846.666666666666, ans=0.125
2024-09-16 18:57:35,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=9846.666666666666, ans=0.125
2024-09-16 18:57:41,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=9893.333333333334, ans=0.125
2024-09-16 18:57:53,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=9893.333333333334, ans=0.5537333333333334
2024-09-16 18:57:56,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=9940.0, ans=0.125
2024-09-16 18:58:34,303 INFO [train.py:1198] (1/2) Epoch 1, batch 2150, loss[loss=0.4232, simple_loss=0.3931, pruned_loss=0.1819, ctc_loss=0.3508, cr_loss=0.4829, over 34329.00 frames. ], tot_loss[loss=0.4735, simple_loss=0.4245, pruned_loss=0.2114, ctc_loss=0.4023, cr_loss=0.4756, over 6789408.59 frames. ], batch size: 91, lr: 4.41e-02, grad_scale: 8.0
2024-09-16 18:58:38,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=10033.333333333334, ans=0.07
2024-09-16 18:58:53,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=10080.0, ans=0.125
2024-09-16 18:58:55,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=10080.0, ans=0.5472
2024-09-16 18:58:56,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=10080.0, ans=0.008678260869565217
2024-09-16 18:58:58,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=10080.0, ans=0.125
2024-09-16 18:59:02,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.14 vs. limit=7.52
2024-09-16 18:59:02,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.59 vs. limit=11.28
2024-09-16 18:59:15,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=10126.666666666666, ans=0.19873333333333332
2024-09-16 18:59:22,122 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=7.744e-03
2024-09-16 18:59:28,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=10173.333333333334, ans=0.19826666666666665
2024-09-16 18:59:59,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=10220.0, ans=0.024083333333333335
2024-09-16 19:00:02,961 INFO [train.py:1198] (1/2) Epoch 1, batch 2200, loss[loss=0.4663, simple_loss=0.4335, pruned_loss=0.2005, ctc_loss=0.3874, cr_loss=0.5171, over 34445.00 frames. ], tot_loss[loss=0.4697, simple_loss=0.4228, pruned_loss=0.2088, ctc_loss=0.3977, cr_loss=0.4766, over 6783976.00 frames. ], batch size: 100, lr: 4.41e-02, grad_scale: 8.0
2024-09-16 19:00:13,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=10266.666666666666, ans=0.125
2024-09-16 19:00:23,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=10313.333333333334, ans=0.125
2024-09-16 19:00:23,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.95 vs. limit=15.235
2024-09-16 19:00:24,868 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.579e+02 3.438e+02 4.198e+02 5.699e+02 1.211e+03, threshold=8.395e+02, percent-clipped=8.0
2024-09-16 19:00:37,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=11.385
2024-09-16 19:01:02,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.61 vs. limit=11.4025
2024-09-16 19:01:09,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=10453.333333333334, ans=0.07
2024-09-16 19:01:27,815 INFO [train.py:1198] (1/2) Epoch 1, batch 2250, loss[loss=0.4427, simple_loss=0.4102, pruned_loss=0.1913, ctc_loss=0.3666, cr_loss=0.4843, over 34424.00 frames. ], tot_loss[loss=0.4662, simple_loss=0.4211, pruned_loss=0.2067, ctc_loss=0.393, cr_loss=0.4762, over 6781783.27 frames. ], batch size: 95, lr: 4.40e-02, grad_scale: 8.0
2024-09-16 19:02:02,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=10593.333333333334, ans=0.035
2024-09-16 19:02:09,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=15.445
2024-09-16 19:02:24,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=10640.0, ans=0.1436
2024-09-16 19:02:44,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=11.5075
limit=11.5075 2024-09-16 19:02:47,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=10686.666666666666, ans=11.5075 2024-09-16 19:02:55,081 INFO [train.py:1198] (1/2) Epoch 1, batch 2300, loss[loss=0.4153, simple_loss=0.3801, pruned_loss=0.1814, ctc_loss=0.3442, cr_loss=0.4721, over 34256.00 frames. ], tot_loss[loss=0.4609, simple_loss=0.4178, pruned_loss=0.2037, ctc_loss=0.387, cr_loss=0.475, over 6766813.25 frames. ], batch size: 83, lr: 4.40e-02, grad_scale: 8.0 2024-09-16 19:03:19,597 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.506e+02 3.649e+02 4.474e+02 6.158e+02 1.975e+03, threshold=8.948e+02, percent-clipped=8.0 2024-09-16 19:03:40,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=10826.666666666666, ans=0.125 2024-09-16 19:03:40,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10826.666666666666, ans=0.0 2024-09-16 19:03:58,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=10873.333333333334, ans=0.125 2024-09-16 19:04:21,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.41 vs. limit=11.6125 2024-09-16 19:04:22,213 INFO [train.py:1198] (1/2) Epoch 1, batch 2350, loss[loss=0.4507, simple_loss=0.4182, pruned_loss=0.1948, ctc_loss=0.3696, cr_loss=0.495, over 34718.00 frames. ], tot_loss[loss=0.4574, simple_loss=0.4162, pruned_loss=0.2014, ctc_loss=0.3829, cr_loss=0.4758, over 6773747.75 frames. ], batch size: 97, lr: 4.40e-02, grad_scale: 8.0 2024-09-16 19:04:24,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=10966.666666666666, ans=0.125 2024-09-16 19:04:32,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=10966.666666666666, ans=0.020972222222222225 2024-09-16 19:04:40,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11013.333333333334, ans=0.18986666666666668 2024-09-16 19:04:56,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=11060.0, ans=0.008465217391304347 2024-09-16 19:05:11,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=11106.666666666666, ans=0.5112666666666668 2024-09-16 19:05:18,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.67 vs. limit=7.776666666666666 2024-09-16 19:05:22,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=11106.666666666666, ans=0.125 2024-09-16 19:05:23,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.46 vs. limit=4.666 2024-09-16 19:05:47,019 INFO [train.py:1198] (1/2) Epoch 1, batch 2400, loss[loss=0.4158, simple_loss=0.3894, pruned_loss=0.1781, ctc_loss=0.3445, cr_loss=0.4283, over 34602.00 frames. 
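The grad_scale field in these loss records doubles over the course of training (8.0, then 16.0, then 32.0 in this span), which is the signature of dynamic loss scaling under mixed-precision training: the scale grows after a run of overflow-free steps and is cut back when gradients overflow. A minimal sketch of that mechanism using the standard torch.cuda.amp API; model, optimizer, and loss_fn are placeholders, not the names used in this training script:

    import torch
    from torch.cuda.amp import GradScaler, autocast

    scaler = GradScaler(init_scale=1.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=2000)

    def training_step(model, optimizer, loss_fn, batch, target):
        optimizer.zero_grad()
        with autocast(dtype=torch.float16):
            loss = loss_fn(model(batch), target)
        scaler.scale(loss).backward()   # gradients are multiplied by the current scale
        scaler.step(optimizer)          # unscales first; skips the step if grads overflowed
        scaler.update()                 # doubles the scale after enough clean steps,
                                        # halves it immediately after an overflow
        return loss.detach()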
], tot_loss[loss=0.4552, simple_loss=0.4155, pruned_loss=0.1999, ctc_loss=0.3798, cr_loss=0.4767, over 6777885.82 frames. ], batch size: 89, lr: 4.39e-02, grad_scale: 16.0 2024-09-16 19:06:09,132 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.582e+02 3.308e+02 4.068e+02 5.169e+02 1.111e+03, threshold=8.136e+02, percent-clipped=5.0 2024-09-16 19:06:37,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=11293.333333333334, ans=0.0 2024-09-16 19:06:59,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=11386.666666666666, ans=0.125 2024-09-16 19:07:17,111 INFO [train.py:1198] (1/2) Epoch 1, batch 2450, loss[loss=0.4546, simple_loss=0.4162, pruned_loss=0.1989, ctc_loss=0.3821, cr_loss=0.4664, over 34422.00 frames. ], tot_loss[loss=0.455, simple_loss=0.4161, pruned_loss=0.1995, ctc_loss=0.3786, cr_loss=0.478, over 6751231.69 frames. ], batch size: 95, lr: 4.39e-02, grad_scale: 16.0 2024-09-16 19:07:21,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.56 vs. limit=10.716666666666667 2024-09-16 19:07:41,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=11480.0, ans=0.09899494936611666 2024-09-16 19:08:04,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=11526.666666666666, ans=0.4965666666666667 2024-09-16 19:08:16,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=11573.333333333334, ans=0.125 2024-09-16 19:08:20,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=11573.333333333334, ans=0.018444444444444444 2024-09-16 19:08:30,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=11620.0, ans=0.125 2024-09-16 19:08:32,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=11620.0, ans=0.1838 2024-09-16 19:08:42,017 INFO [train.py:1198] (1/2) Epoch 1, batch 2500, loss[loss=0.4415, simple_loss=0.416, pruned_loss=0.1879, ctc_loss=0.3583, cr_loss=0.4861, over 34469.00 frames. ], tot_loss[loss=0.4517, simple_loss=0.4144, pruned_loss=0.1974, ctc_loss=0.3746, cr_loss=0.4787, over 6762174.82 frames. ], batch size: 100, lr: 4.38e-02, grad_scale: 16.0 2024-09-16 19:08:49,230 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:09:03,989 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.772e+02 3.771e+02 4.879e+02 6.727e+02 1.083e+03, threshold=9.757e+02, percent-clipped=7.0 2024-09-16 19:09:11,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.75 vs. limit=11.8925 2024-09-16 19:09:17,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=11760.0, ans=0.00831304347826087 2024-09-16 19:09:18,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.69 vs. 
limit=11.91 2024-09-16 19:09:22,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=11760.0, ans=0.125 2024-09-16 19:09:23,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=11760.0, ans=0.4884 2024-09-16 19:09:34,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=11806.666666666666, ans=0.125 2024-09-16 19:09:46,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=11806.666666666666, ans=0.008302898550724637 2024-09-16 19:09:50,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.54 vs. limit=7.963333333333333 2024-09-16 19:09:54,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.68 vs. limit=10.926666666666666 2024-09-16 19:10:00,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=11853.333333333334, ans=0.4851333333333333 2024-09-16 19:10:08,910 INFO [train.py:1198] (1/2) Epoch 1, batch 2550, loss[loss=0.3825, simple_loss=0.3634, pruned_loss=0.1607, ctc_loss=0.3039, cr_loss=0.4839, over 34191.00 frames. ], tot_loss[loss=0.448, simple_loss=0.4123, pruned_loss=0.1952, ctc_loss=0.3706, cr_loss=0.479, over 6766405.45 frames. ], batch size: 78, lr: 4.38e-02, grad_scale: 16.0 2024-09-16 19:10:19,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=11900.0, ans=0.01708333333333334 2024-09-16 19:10:45,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.54 vs. limit=11.997499999999999 2024-09-16 19:11:24,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.82 vs. limit=16.564999999999998 2024-09-16 19:11:34,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=12133.333333333334, ans=0.17866666666666667 2024-09-16 19:11:36,607 INFO [train.py:1198] (1/2) Epoch 1, batch 2600, loss[loss=0.4363, simple_loss=0.4078, pruned_loss=0.1875, ctc_loss=0.3449, cr_loss=0.5211, over 34366.00 frames. ], tot_loss[loss=0.4468, simple_loss=0.4122, pruned_loss=0.1942, ctc_loss=0.3686, cr_loss=0.4807, over 6761474.32 frames. ], batch size: 91, lr: 4.37e-02, grad_scale: 16.0 2024-09-16 19:11:41,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=12133.333333333334, ans=0.47533333333333333 2024-09-16 19:11:58,344 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.588e+02 3.503e+02 4.103e+02 5.805e+02 1.212e+03, threshold=8.206e+02, percent-clipped=4.0 2024-09-16 19:11:59,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.21 vs. limit=8.045 2024-09-16 19:12:19,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.01 vs. 
limit=8.890666666666666 2024-09-16 19:12:27,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=12273.333333333334, ans=0.125 2024-09-16 19:12:32,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=12273.333333333334, ans=0.125 2024-09-16 19:12:46,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=12.120000000000001 2024-09-16 19:12:53,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=12320.0, ans=0.125 2024-09-16 19:12:58,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.34 vs. limit=8.08 2024-09-16 19:13:00,427 INFO [train.py:1198] (1/2) Epoch 1, batch 2650, loss[loss=0.4574, simple_loss=0.4271, pruned_loss=0.1962, ctc_loss=0.382, cr_loss=0.4725, over 34244.00 frames. ], tot_loss[loss=0.4439, simple_loss=0.411, pruned_loss=0.1923, ctc_loss=0.365, cr_loss=0.481, over 6769913.28 frames. ], batch size: 117, lr: 4.37e-02, grad_scale: 16.0 2024-09-16 19:13:04,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.56 vs. limit=16.775 2024-09-16 19:13:07,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=12366.666666666666, ans=0.46716666666666673 2024-09-16 19:13:56,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.56 vs. limit=16.880000000000003 2024-09-16 19:14:05,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.04 vs. limit=11.253333333333334 2024-09-16 19:14:12,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.54 vs. limit=8.138333333333334 2024-09-16 19:14:14,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=18.14 vs. limit=11.276666666666667 2024-09-16 19:14:21,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.11 vs. limit=16.915 2024-09-16 19:14:26,862 INFO [train.py:1198] (1/2) Epoch 1, batch 2700, loss[loss=0.4381, simple_loss=0.4122, pruned_loss=0.1866, ctc_loss=0.3535, cr_loss=0.5051, over 34626.00 frames. ], tot_loss[loss=0.442, simple_loss=0.41, pruned_loss=0.1912, ctc_loss=0.3624, cr_loss=0.4816, over 6765618.23 frames. ], batch size: 102, lr: 4.36e-02, grad_scale: 16.0 2024-09-16 19:14:32,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=12600.0, ans=0.459 2024-09-16 19:14:36,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.64 vs. 
limit=4.89 2024-09-16 19:14:37,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=12600.0, ans=0.389 2024-09-16 19:14:51,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.642e+02 3.362e+02 4.371e+02 6.586e+02 1.306e+03, threshold=8.741e+02, percent-clipped=11.0 2024-09-16 19:15:28,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=12740.0, ans=0.05 2024-09-16 19:15:30,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=12740.0, ans=0.4541 2024-09-16 19:15:41,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=12786.666666666666, ans=12.295 2024-09-16 19:15:53,643 INFO [train.py:1198] (1/2) Epoch 1, batch 2750, loss[loss=0.4248, simple_loss=0.3973, pruned_loss=0.1826, ctc_loss=0.3413, cr_loss=0.4706, over 34628.00 frames. ], tot_loss[loss=0.4372, simple_loss=0.4067, pruned_loss=0.1885, ctc_loss=0.3574, cr_loss=0.4799, over 6762757.78 frames. ], batch size: 88, lr: 4.36e-02, grad_scale: 16.0 2024-09-16 19:16:10,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=12880.0, ans=0.8788 2024-09-16 19:16:12,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=12880.0, ans=0.0 2024-09-16 19:16:34,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=9.170666666666666 2024-09-16 19:16:36,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.29 vs. limit=17.195 2024-09-16 19:16:49,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.32 vs. limit=12.365 2024-09-16 19:17:01,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=13020.0, ans=0.44430000000000003 2024-09-16 19:17:05,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.26 vs. limit=8.254999999999999 2024-09-16 19:17:16,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=13066.666666666666, ans=0.125 2024-09-16 19:17:18,983 INFO [train.py:1198] (1/2) Epoch 1, batch 2800, loss[loss=0.5175, simple_loss=0.4549, pruned_loss=0.2356, ctc_loss=0.4433, cr_loss=0.5054, over 23094.00 frames. ], tot_loss[loss=0.4355, simple_loss=0.4057, pruned_loss=0.1875, ctc_loss=0.3554, cr_loss=0.4796, over 6740217.83 frames. 
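In each of these optim.py WARNING lines the reported threshold equals Clipping_scale times the median of the grad-norm quartiles (for the warning just above, 2.0 * 4.371e+02 = 8.742e+02, matching threshold=8.741e+02 up to rounding), so the clipping threshold adapts to the recent distribution of gradient norms rather than being fixed. A sketch of that scheme, assuming a rolling window of per-step gradient norms; the class and names below are illustrative, not the actual optim.py implementation:

    import torch
    from collections import deque

    class AdaptiveClipper:
        def __init__(self, clipping_scale=2.0, window=128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)   # recent total gradient norms

        def clip_(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:
                # shrink all gradients so the total norm equals the threshold
                for p in params:
                    p.grad.mul_(threshold / norm)
            return norm, threshold

The percent-clipped figure in the warnings would then be the fraction of recent steps whose norm exceeded that adaptive threshold.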
], batch size: 244, lr: 4.36e-02, grad_scale: 32.0 2024-09-16 19:17:19,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=13066.666666666666, ans=0.16933333333333334 2024-09-16 19:17:33,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=13066.666666666666, ans=0.008028985507246377 2024-09-16 19:17:42,974 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.545e+02 3.487e+02 4.175e+02 5.677e+02 1.653e+03, threshold=8.350e+02, percent-clipped=14.0 2024-09-16 19:17:55,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=13160.0, ans=0.3974 2024-09-16 19:18:33,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=13253.333333333334, ans=0.011444444444444438 2024-09-16 19:18:46,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=13300.0, ans=0.16699999999999998 2024-09-16 19:18:47,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=7.43 vs. limit=9.32 2024-09-16 19:18:48,314 INFO [train.py:1198] (1/2) Epoch 1, batch 2850, loss[loss=0.404, simple_loss=0.3856, pruned_loss=0.1695, ctc_loss=0.3263, cr_loss=0.4575, over 34475.00 frames. ], tot_loss[loss=0.4366, simple_loss=0.4066, pruned_loss=0.1881, ctc_loss=0.3559, cr_loss=0.4814, over 6723217.14 frames. ], batch size: 90, lr: 4.35e-02, grad_scale: 32.0 2024-09-16 19:19:05,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=13346.666666666666, ans=0.125 2024-09-16 19:19:07,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=13346.666666666666, ans=0.04949747468305833 2024-09-16 19:19:07,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.42 vs. limit=8.336666666666666 2024-09-16 19:19:09,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.24 vs. 
limit=17.509999999999998 2024-09-16 19:19:12,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=13346.666666666666, ans=0.43286666666666673 2024-09-16 19:19:18,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=13346.666666666666, ans=0.007968115942028986 2024-09-16 19:19:18,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=13346.666666666666, ans=0.011055555555555562 2024-09-16 19:19:22,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=13393.333333333334, ans=0.16606666666666667 2024-09-16 19:19:45,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=13440.0, ans=0.125 2024-09-16 19:19:47,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=13440.0, ans=0.007947826086956522 2024-09-16 19:19:50,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=13440.0, ans=0.125 2024-09-16 19:19:54,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=13486.666666666666, ans=0.0 2024-09-16 19:19:57,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=13486.666666666666, ans=0.125 2024-09-16 19:19:59,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=13486.666666666666, ans=0.01047222222222223 2024-09-16 19:20:10,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=13533.333333333334, ans=0.42633333333333334 2024-09-16 19:20:12,109 INFO [train.py:1198] (1/2) Epoch 1, batch 2900, loss[loss=0.4061, simple_loss=0.3926, pruned_loss=0.1682, ctc_loss=0.3262, cr_loss=0.4481, over 34531.00 frames. ], tot_loss[loss=0.4345, simple_loss=0.406, pruned_loss=0.1865, ctc_loss=0.353, cr_loss=0.4825, over 6753415.74 frames. ], batch size: 94, lr: 4.35e-02, grad_scale: 32.0 2024-09-16 19:20:24,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=13533.333333333334, ans=0.07 2024-09-16 19:20:28,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.74 vs. 
limit=17.685000000000002 2024-09-16 19:20:34,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.572e+02 3.381e+02 4.173e+02 5.679e+02 9.194e+02, threshold=8.346e+02, percent-clipped=2.0 2024-09-16 19:20:39,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=13580.0, ans=0.125 2024-09-16 19:20:44,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=13626.666666666666, ans=0.009888888888888892 2024-09-16 19:20:54,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=13626.666666666666, ans=0.125 2024-09-16 19:20:57,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=13626.666666666666, ans=0.125 2024-09-16 19:21:20,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=13720.0, ans=0.4198 2024-09-16 19:21:20,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=12.645 2024-09-16 19:21:38,481 INFO [train.py:1198] (1/2) Epoch 1, batch 2950, loss[loss=0.3899, simple_loss=0.3748, pruned_loss=0.1624, ctc_loss=0.3114, cr_loss=0.4466, over 34650.00 frames. ], tot_loss[loss=0.4313, simple_loss=0.4038, pruned_loss=0.1848, ctc_loss=0.3499, cr_loss=0.4803, over 6748115.18 frames. ], batch size: 88, lr: 4.34e-02, grad_scale: 32.0 2024-09-16 19:22:14,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=13860.0, ans=0.41490000000000005 2024-09-16 19:22:19,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.71 vs. limit=11.93 2024-09-16 19:22:31,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=13906.666666666666, ans=0.008722222222222228 2024-09-16 19:23:07,398 INFO [train.py:1198] (1/2) Epoch 1, batch 3000, loss[loss=0.4173, simple_loss=0.3942, pruned_loss=0.1775, ctc_loss=0.3321, cr_loss=0.4742, over 34548.00 frames. ], tot_loss[loss=0.4295, simple_loss=0.4028, pruned_loss=0.1837, ctc_loss=0.3475, cr_loss=0.4797, over 6749994.74 frames. ], batch size: 94, lr: 4.34e-02, grad_scale: 32.0 2024-09-16 19:23:07,398 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 19:23:24,296 INFO [train.py:1230] (1/2) Epoch 1, validation: loss=0.2404, simple_loss=0.3228, pruned_loss=0.06506, ctc_loss=0.1396, cr_loss=1.36e-14, over 944034.00 frames. 2024-09-16 19:23:24,296 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-16 19:23:46,338 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.877e+02 3.432e+02 4.516e+02 5.542e+02 1.079e+03, threshold=9.031e+02, percent-clipped=8.0 2024-09-16 19:24:29,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=12.82 2024-09-16 19:24:40,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. 
limit=5.128 2024-09-16 19:24:42,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=14186.666666666666, ans=0.007555555555555558 2024-09-16 19:24:47,770 INFO [train.py:1198] (1/2) Epoch 1, batch 3050, loss[loss=0.3703, simple_loss=0.3652, pruned_loss=0.1508, ctc_loss=0.2876, cr_loss=0.4106, over 34590.00 frames. ], tot_loss[loss=0.4279, simple_loss=0.4021, pruned_loss=0.1827, ctc_loss=0.3454, cr_loss=0.4797, over 6742595.37 frames. ], batch size: 89, lr: 4.33e-02, grad_scale: 32.0 2024-09-16 19:24:48,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=14233.333333333334, ans=0.025 2024-09-16 19:25:01,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=14233.333333333334, ans=10.0 2024-09-16 19:25:07,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=14280.0, ans=10.0 2024-09-16 19:25:52,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=14373.333333333334, ans=0.09899494936611666 2024-09-16 19:26:02,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=14420.0, ans=0.125 2024-09-16 19:26:11,825 INFO [train.py:1198] (1/2) Epoch 1, batch 3100, loss[loss=0.4687, simple_loss=0.431, pruned_loss=0.2052, ctc_loss=0.3773, cr_loss=0.5106, over 34199.00 frames. ], tot_loss[loss=0.4256, simple_loss=0.4007, pruned_loss=0.1814, ctc_loss=0.3426, cr_loss=0.4795, over 6741766.30 frames. ], batch size: 117, lr: 4.33e-02, grad_scale: 32.0 2024-09-16 19:26:17,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=14466.666666666666, ans=0.025 2024-09-16 19:26:23,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=14466.666666666666, ans=0.125 2024-09-16 19:26:33,170 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.687e+02 3.494e+02 4.492e+02 6.092e+02 1.184e+03, threshold=8.983e+02, percent-clipped=7.0 2024-09-16 19:27:34,497 INFO [train.py:1198] (1/2) Epoch 1, batch 3150, loss[loss=0.4481, simple_loss=0.4242, pruned_loss=0.1899, ctc_loss=0.3628, cr_loss=0.4931, over 33786.00 frames. ], tot_loss[loss=0.4237, simple_loss=0.3998, pruned_loss=0.1801, ctc_loss=0.3404, cr_loss=0.48, over 6747624.23 frames. ], batch size: 122, lr: 4.32e-02, grad_scale: 32.0 2024-09-16 19:27:38,248 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=2.552e-03 2024-09-16 19:28:01,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=14746.666666666666, ans=0.125 2024-09-16 19:28:16,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=14793.333333333334, ans=0.3822333333333333 2024-09-16 19:28:16,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.22 vs. 
limit=5.218999999999999 2024-09-16 19:28:21,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=14793.333333333334, ans=0.3822333333333333 2024-09-16 19:28:28,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=5.226 2024-09-16 19:28:45,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. limit=5.2330000000000005 2024-09-16 19:28:59,672 INFO [train.py:1198] (1/2) Epoch 1, batch 3200, loss[loss=0.4103, simple_loss=0.3982, pruned_loss=0.1697, ctc_loss=0.3214, cr_loss=0.4667, over 34563.00 frames. ], tot_loss[loss=0.4218, simple_loss=0.3985, pruned_loss=0.1791, ctc_loss=0.3382, cr_loss=0.4792, over 6760983.71 frames. ], batch size: 94, lr: 4.32e-02, grad_scale: 32.0 2024-09-16 19:29:00,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.40 vs. limit=13.1 2024-09-16 19:29:21,253 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.535e+02 3.587e+02 4.330e+02 5.406e+02 1.089e+03, threshold=8.660e+02, percent-clipped=2.0 2024-09-16 19:29:26,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=14980.0, ans=0.0 2024-09-16 19:29:28,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=14980.0, ans=0.025 2024-09-16 19:29:41,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=15026.666666666666, ans=0.125 2024-09-16 19:29:46,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=15026.666666666666, ans=0.37406666666666677 2024-09-16 19:29:54,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=15073.333333333334, ans=0.125 2024-09-16 19:30:01,338 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:30:04,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=15120.0, ans=0.0036666666666666722 2024-09-16 19:30:12,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=15120.0, ans=0.125 2024-09-16 19:30:22,573 INFO [train.py:1198] (1/2) Epoch 1, batch 3250, loss[loss=0.4277, simple_loss=0.4084, pruned_loss=0.1796, ctc_loss=0.3405, cr_loss=0.4929, over 34684.00 frames. ], tot_loss[loss=0.4209, simple_loss=0.3984, pruned_loss=0.1784, ctc_loss=0.3368, cr_loss=0.4807, over 6770563.15 frames. ], batch size: 98, lr: 4.31e-02, grad_scale: 32.0 2024-09-16 19:30:41,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.64 vs. 
limit=12.606666666666667 2024-09-16 19:31:20,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=15306.666666666666, ans=0.125 2024-09-16 19:31:23,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.63 vs. limit=18.98 2024-09-16 19:31:29,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=15353.333333333334, ans=0.36263333333333336 2024-09-16 19:31:45,788 INFO [train.py:1198] (1/2) Epoch 1, batch 3300, loss[loss=0.4302, simple_loss=0.4101, pruned_loss=0.1817, ctc_loss=0.3395, cr_loss=0.4779, over 32999.00 frames. ], tot_loss[loss=0.4169, simple_loss=0.3955, pruned_loss=0.1764, ctc_loss=0.3328, cr_loss=0.4783, over 6769674.16 frames. ], batch size: 130, lr: 4.31e-02, grad_scale: 16.0 2024-09-16 19:31:56,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=15400.0, ans=0.0025000000000000022 2024-09-16 19:32:08,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.789e+02 4.003e+02 5.070e+02 6.338e+02 1.346e+03, threshold=1.014e+03, percent-clipped=8.0 2024-09-16 19:32:55,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=15586.666666666666, ans=0.125 2024-09-16 19:32:58,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=15586.666666666666, ans=0.125 2024-09-16 19:33:01,984 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.37 vs. limit=19.189999999999998 2024-09-16 19:33:06,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=15633.333333333334, ans=0.125 2024-09-16 19:33:07,680 INFO [train.py:1198] (1/2) Epoch 1, batch 3350, loss[loss=0.4301, simple_loss=0.409, pruned_loss=0.1812, ctc_loss=0.3437, cr_loss=0.4999, over 33846.00 frames. ], tot_loss[loss=0.4179, simple_loss=0.3963, pruned_loss=0.1768, ctc_loss=0.3335, cr_loss=0.4794, over 6742241.76 frames. ], batch size: 122, lr: 4.30e-02, grad_scale: 16.0 2024-09-16 19:33:31,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=15680.0, ans=0.35120000000000007 2024-09-16 19:34:01,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=15773.333333333334, ans=0.0009444444444444422 2024-09-16 19:34:32,604 INFO [train.py:1198] (1/2) Epoch 1, batch 3400, loss[loss=0.3461, simple_loss=0.344, pruned_loss=0.1388, ctc_loss=0.2661, cr_loss=0.433, over 34195.00 frames. ], tot_loss[loss=0.4161, simple_loss=0.3952, pruned_loss=0.1758, ctc_loss=0.3315, cr_loss=0.4785, over 6731922.79 frames. 
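The ScheduledFloat records that dominate this log each print the current value (ans) of a hyperparameter, such as a dropout probability or a skip rate, evaluated against batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are made up for illustration:

    class ScheduledFloat:
        def __init__(self, *points):
            # points: (batch_count, value) pairs; kept sorted by batch_count
            self.points = sorted(points)

        def value(self, batch_count):
            if batch_count <= self.points[0][0]:
                return self.points[0][1]
            if batch_count >= self.points[-1][0]:
                return self.points[-1][1]
            for (x0, y0), (x1, y1) in zip(self.points, self.points[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(15866.67))   # ~0.14, partway through the decay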
], batch size: 78, lr: 4.29e-02, grad_scale: 16.0 2024-09-16 19:34:39,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=15866.666666666666, ans=0.125 2024-09-16 19:34:41,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=15866.666666666666, ans=0.125 2024-09-16 19:34:55,735 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.654e+02 3.636e+02 4.710e+02 6.058e+02 1.078e+03, threshold=9.419e+02, percent-clipped=3.0 2024-09-16 19:34:56,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.25 vs. limit=8.978333333333333 2024-09-16 19:35:02,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=15913.333333333334, ans=0.0 2024-09-16 19:35:14,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=15960.0, ans=0.34140000000000004 2024-09-16 19:35:48,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.91 vs. limit=9.013333333333334 2024-09-16 19:35:54,604 INFO [train.py:1198] (1/2) Epoch 1, batch 3450, loss[loss=0.4406, simple_loss=0.4196, pruned_loss=0.186, ctc_loss=0.3511, cr_loss=0.4817, over 33137.00 frames. ], tot_loss[loss=0.4149, simple_loss=0.3948, pruned_loss=0.175, ctc_loss=0.3298, cr_loss=0.4781, over 6744481.83 frames. ], batch size: 130, lr: 4.29e-02, grad_scale: 16.0 2024-09-16 19:35:59,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=16100.0, ans=0.125 2024-09-16 19:36:06,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=16100.0, ans=0.3365 2024-09-16 19:36:28,109 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:36:30,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=13.5725 2024-09-16 19:36:50,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16240.0, ans=0.1376 2024-09-16 19:36:57,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=16240.0, ans=0.025 2024-09-16 19:37:16,095 INFO [train.py:1198] (1/2) Epoch 1, batch 3500, loss[loss=0.381, simple_loss=0.3693, pruned_loss=0.1582, ctc_loss=0.2968, cr_loss=0.4253, over 34480.00 frames. ], tot_loss[loss=0.4131, simple_loss=0.3936, pruned_loss=0.174, ctc_loss=0.3278, cr_loss=0.4778, over 6746651.91 frames. 
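The Whitening records compare a metric against a limit (the limit itself is scheduled, as the whitening_limit entries show). A plausible reading, sketched here as an assumption rather than the exact scaling.py code: the metric measures how far the feature covariance is from a multiple of the identity, i.e. how "white" the activations are. The statistic below equals 1.0 for perfectly white features and grows with the spread of covariance eigenvalues:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations for one whitening group
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]   # (C, C) covariance estimate
        num_channels = cov.shape[0]
        # for a symmetric matrix, the squared Frobenius norm is the sum of
        # squared eigenvalues, and the trace is their sum
        return (num_channels * (cov ** 2).sum() / cov.trace() ** 2).item()

    x = torch.randn(20000, 512)    # near-white input
    print(whitening_metric(x))     # close to 1.0; logged metrics of 10-30 would
                                   # indicate strongly correlated channels

Under that reading, a "metric=X vs. limit=Y" line with X > Y flags a layer whose activations are more correlated than the schedule currently allows.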
], batch size: 85, lr: 4.28e-02, grad_scale: 16.0 2024-09-16 19:37:34,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16380.0, ans=0.13620000000000002 2024-09-16 19:37:38,750 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.506e+02 3.450e+02 4.343e+02 5.707e+02 1.021e+03, threshold=8.686e+02, percent-clipped=2.0 2024-09-16 19:37:53,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=16426.666666666668, ans=0.04949747468305833 2024-09-16 19:38:12,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=16473.333333333332, ans=0.025 2024-09-16 19:38:17,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=16473.333333333332, ans=0.0 2024-09-16 19:38:17,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.65 vs. limit=19.854999999999997 2024-09-16 19:38:21,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.24 vs. limit=9.129999999999999 2024-09-16 19:38:27,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.55 vs. limit=19.89 2024-09-16 19:38:37,524 INFO [train.py:1198] (1/2) Epoch 1, batch 3550, loss[loss=0.4209, simple_loss=0.4079, pruned_loss=0.1741, ctc_loss=0.3329, cr_loss=0.479, over 34392.00 frames. ], tot_loss[loss=0.4118, simple_loss=0.393, pruned_loss=0.1731, ctc_loss=0.3263, cr_loss=0.4779, over 6757358.56 frames. ], batch size: 103, lr: 4.28e-02, grad_scale: 16.0 2024-09-16 19:38:58,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=16613.333333333332, ans=0.125 2024-09-16 19:38:58,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=16613.333333333332, ans=0.0 2024-09-16 19:39:11,666 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:39:25,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=26.71 vs. limit=20.03 2024-09-16 19:39:36,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=13.765 2024-09-16 19:39:50,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=14.00 vs. limit=13.376666666666665 2024-09-16 19:39:59,097 INFO [train.py:1198] (1/2) Epoch 1, batch 3600, loss[loss=0.3905, simple_loss=0.3728, pruned_loss=0.164, ctc_loss=0.3042, cr_loss=0.484, over 34476.00 frames. ], tot_loss[loss=0.4108, simple_loss=0.3924, pruned_loss=0.1726, ctc_loss=0.3247, cr_loss=0.4786, over 6766432.21 frames. 
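Each loss record reports the components separately (simple_loss, pruned_loss, ctc_loss, cr_loss) alongside a combined loss. A hedged sketch of one such combination; the scale names and default values below are placeholders for illustration, not the exact weighting that produced these logged totals (which also vary with warmup):

    def combine_losses(simple_loss, pruned_loss, ctc_loss, cr_loss,
                       simple_scale=0.5, ctc_scale=0.1, cr_scale=0.02):
        # transducer part: a cheap "simple" alignment loss plus the pruned RNN-T loss
        transducer = simple_scale * simple_loss + (1.0 - simple_scale) * pruned_loss
        # auxiliary CTC head and consistency-regularization (CR) terms
        return transducer + ctc_scale * ctc_loss + cr_scale * cr_loss

    # component values taken from the batch 3550 record above, as example inputs:
    print(combine_losses(0.4079, 0.1741, 0.3329, 0.4790))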
], batch size: 90, lr: 4.27e-02, grad_scale: 32.0 2024-09-16 19:40:04,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16800.0, ans=0.132 2024-09-16 19:40:09,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=16800.0, ans=0.025 2024-09-16 19:40:16,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.54 vs. limit=13.8175 2024-09-16 19:40:23,793 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.609e+02 3.434e+02 4.167e+02 5.448e+02 1.224e+03, threshold=8.333e+02, percent-clipped=6.0 2024-09-16 19:40:40,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=16893.333333333332, ans=0.0 2024-09-16 19:40:51,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=16940.0, ans=0.13060000000000002 2024-09-16 19:40:59,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=16940.0, ans=0.125 2024-09-16 19:41:09,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=16986.666666666668, ans=0.0 2024-09-16 19:41:17,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=16986.666666666668, ans=0.0 2024-09-16 19:41:20,529 INFO [train.py:1198] (1/2) Epoch 1, batch 3650, loss[loss=0.4265, simple_loss=0.4131, pruned_loss=0.1766, ctc_loss=0.3327, cr_loss=0.5009, over 34487.00 frames. ], tot_loss[loss=0.4079, simple_loss=0.3904, pruned_loss=0.171, ctc_loss=0.3219, cr_loss=0.4775, over 6769969.93 frames. ], batch size: 110, lr: 4.27e-02, grad_scale: 16.0 2024-09-16 19:41:23,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=17033.333333333332, ans=0.125 2024-09-16 19:41:30,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=17033.333333333332, ans=0.3038333333333334 2024-09-16 19:41:33,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=17033.333333333332, ans=0.125 2024-09-16 19:42:04,683 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 19:42:10,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.37 vs. limit=9.293333333333333 2024-09-16 19:42:22,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=17173.333333333332, ans=0.125 2024-09-16 19:42:28,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.49 vs. limit=9.305 2024-09-16 19:42:33,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.68 vs. 
limit=13.9575 2024-09-16 19:42:42,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=9.316666666666666 2024-09-16 19:42:42,721 INFO [train.py:1198] (1/2) Epoch 1, batch 3700, loss[loss=0.4304, simple_loss=0.4072, pruned_loss=0.1829, ctc_loss=0.342, cr_loss=0.4843, over 34598.00 frames. ], tot_loss[loss=0.4068, simple_loss=0.39, pruned_loss=0.1702, ctc_loss=0.3204, cr_loss=0.4777, over 6784397.91 frames. ], batch size: 102, lr: 4.26e-02, grad_scale: 16.0 2024-09-16 19:42:46,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=17266.666666666668, ans=0.29566666666666674 2024-09-16 19:42:47,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=17266.666666666668, ans=0.125 2024-09-16 19:43:07,328 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.697e+02 3.569e+02 4.265e+02 5.406e+02 1.024e+03, threshold=8.530e+02, percent-clipped=2.0 2024-09-16 19:43:12,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=17313.333333333332, ans=0.125 2024-09-16 19:43:27,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=17360.0, ans=0.125 2024-09-16 19:43:29,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=17360.0, ans=0.125 2024-09-16 19:43:47,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=17453.333333333332, ans=0.125 2024-09-16 19:44:05,209 INFO [train.py:1198] (1/2) Epoch 1, batch 3750, loss[loss=0.4069, simple_loss=0.398, pruned_loss=0.1664, ctc_loss=0.3178, cr_loss=0.4863, over 34331.00 frames. ], tot_loss[loss=0.4103, simple_loss=0.3934, pruned_loss=0.1717, ctc_loss=0.3231, cr_loss=0.482, over 6785674.38 frames. ], batch size: 113, lr: 4.26e-02, grad_scale: 16.0 2024-09-16 19:44:07,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.15 vs. limit=20.625 2024-09-16 19:44:24,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=17546.666666666668, ans=0.1245333333333333 2024-09-16 19:44:50,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=17593.333333333332, ans=0.125 2024-09-16 19:44:54,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.44 vs. limit=14.115 2024-09-16 19:45:28,051 INFO [train.py:1198] (1/2) Epoch 1, batch 3800, loss[loss=0.4539, simple_loss=0.4194, pruned_loss=0.1985, ctc_loss=0.3582, cr_loss=0.4936, over 29630.00 frames. ], tot_loss[loss=0.4161, simple_loss=0.3974, pruned_loss=0.1749, ctc_loss=0.3284, cr_loss=0.4844, over 6675804.25 frames. 
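In these records loss[...] is the current batch while tot_loss[...] is an aggregate "over N frames" with a steadily growing frame count, which suggests a frame-weighted running average. A small sketch of that bookkeeping; an exponentially-decayed variant is also plausible, this is the simple cumulative form:

    class MetricsTracker(dict):
        def accumulate(self, batch_metrics, num_frames):
            self["frames"] = self.get("frames", 0.0) + num_frames
            for k, v in batch_metrics.items():
                self[k] = self.get(k, 0.0) + v * num_frames   # frame-weighted sum

        def average(self, key):
            return self[key] / self["frames"]

    tot = MetricsTracker()
    # frame counts taken from the batch 3700 and 3750 records nearby:
    tot.accumulate({"loss": 0.4304}, num_frames=34598)
    tot.accumulate({"loss": 0.4069}, num_frames=34331)
    print(tot.average("loss"))   # frame-weighted mean across batches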
], batch size: 175, lr: 4.25e-02, grad_scale: 16.0 2024-09-16 19:45:29,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=17733.333333333332, ans=0.125 2024-09-16 19:45:32,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.74 vs. limit=14.15 2024-09-16 19:45:44,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.23 vs. limit=20.835 2024-09-16 19:45:51,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=11.112 2024-09-16 19:45:53,746 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.484e+02 3.418e+02 3.963e+02 4.944e+02 8.359e+02, threshold=7.927e+02, percent-clipped=0.0 2024-09-16 19:46:09,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.09 vs. limit=9.456666666666667 2024-09-16 19:46:24,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.95 vs. limit=14.2025 2024-09-16 19:46:29,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=17873.333333333332, ans=0.125 2024-09-16 19:46:29,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=17873.333333333332, ans=0.12126666666666669 2024-09-16 19:46:36,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=17920.0, ans=0.0 2024-09-16 19:46:38,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.90 vs. limit=13.96 2024-09-16 19:46:40,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.37 vs. limit=14.219999999999999 2024-09-16 19:46:46,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=26.12 vs. limit=14.219999999999999 2024-09-16 19:46:52,953 INFO [train.py:1198] (1/2) Epoch 1, batch 3850, loss[loss=0.4586, simple_loss=0.4155, pruned_loss=0.2031, ctc_loss=0.3837, cr_loss=0.4692, over 23700.00 frames. ], tot_loss[loss=0.4257, simple_loss=0.4025, pruned_loss=0.1808, ctc_loss=0.3396, cr_loss=0.4839, over 6248771.87 frames. ], batch size: 244, lr: 4.24e-02, grad_scale: 16.0 2024-09-16 19:47:13,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=18013.333333333332, ans=0.2695333333333334 2024-09-16 19:47:33,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.63 vs. limit=21.045 2024-09-16 19:48:22,584 INFO [train.py:1198] (1/2) Epoch 2, batch 0, loss[loss=0.3751, simple_loss=0.3643, pruned_loss=0.1546, ctc_loss=0.292, cr_loss=0.454, over 34467.00 frames. ], tot_loss[loss=0.3751, simple_loss=0.3643, pruned_loss=0.1546, ctc_loss=0.292, cr_loss=0.454, over 34467.00 frames. 
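The learning rate decays smoothly with batch count inside epoch 1 (4.42e-02 down to 4.24e-02 in this span) and then steps down again as epoch 2 begins (4.16e-02 in the next record). That is consistent with a scheduler that discounts on both the batch and epoch axes; the Eden-style rule below is a sketch under assumed constants, not a quote of the exact scheduler used here:

    def eden_lr(base_lr, batch, epoch, lr_batches=5000.0, lr_epochs=4.0):
        # both factors start near 1.0 and fall off polynomially
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(eden_lr(0.045, batch=18000, epoch=1))   # well below the epoch-1 start
    print(eden_lr(0.045, batch=18000, epoch=2))   # steps down again at the boundary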
], batch size: 85, lr: 4.16e-02, grad_scale: 32.0 2024-09-16 19:48:22,584 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 19:48:39,256 INFO [train.py:1230] (1/2) Epoch 2, validation: loss=0.2402, simple_loss=0.3258, pruned_loss=0.06371, ctc_loss=0.1354, cr_loss=1.424e-14, over 944034.00 frames. 2024-09-16 19:48:39,256 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-16 19:49:01,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=18134.666666666668, ans=0.125 2024-09-16 19:49:02,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=18134.666666666668, ans=0.125 2024-09-16 19:49:03,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.13 vs. limit=14.067333333333334 2024-09-16 19:49:14,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.09 vs. limit=14.318 2024-09-16 19:49:25,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=18181.333333333332, ans=0.09899494936611666 2024-09-16 19:49:43,585 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.488e+02 3.171e+02 3.652e+02 4.546e+02 8.787e+02, threshold=7.303e+02, percent-clipped=2.0 2024-09-16 19:49:48,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=18274.666666666668, ans=0.125 2024-09-16 19:49:50,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=18274.666666666668, ans=0.125 2024-09-16 19:50:03,910 INFO [train.py:1198] (1/2) Epoch 2, batch 50, loss[loss=0.3497, simple_loss=0.3457, pruned_loss=0.1416, ctc_loss=0.2692, cr_loss=0.4167, over 34474.00 frames. ], tot_loss[loss=0.4124, simple_loss=0.394, pruned_loss=0.1736, ctc_loss=0.3234, cr_loss=0.474, over 1479648.88 frames. ], batch size: 82, lr: 4.15e-02, grad_scale: 32.0 2024-09-16 19:50:14,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=18321.333333333332, ans=0.125 2024-09-16 19:50:36,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=18368.0, ans=0.125 2024-09-16 19:51:01,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=18461.333333333332, ans=0.0 2024-09-16 19:51:25,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=18508.0, ans=0.2522200000000001 2024-09-16 19:51:27,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=18508.0, ans=0.125 2024-09-16 19:51:30,416 INFO [train.py:1198] (1/2) Epoch 2, batch 100, loss[loss=0.3716, simple_loss=0.3642, pruned_loss=0.1525, ctc_loss=0.2818, cr_loss=0.4439, over 34596.00 frames. ], tot_loss[loss=0.4098, simple_loss=0.3931, pruned_loss=0.1717, ctc_loss=0.3204, cr_loss=0.4768, over 2628602.45 frames. 
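The "Computing validation loss" records interleave a held-out pass with training, and the validation cr_loss comes out at ~1e-14, numerically zero. That is plausible if the consistency-regularization term compares two differently-augmented views of each utterance and augmentation is disabled in eval mode, leaving nothing to compare. A sketch of the periodic validation step; the loader and criterion names are illustrative:

    import torch

    def compute_validation_loss(model, valid_loader, criterion, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                feats = batch["inputs"].to(device)
                targets = batch["targets"].to(device)
                loss, num_frames = criterion(model(feats), targets)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()   # restore training mode before the next train batch
        return tot_loss / tot_frames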
], batch size: 89, lr: 4.15e-02, grad_scale: 32.0 2024-09-16 19:51:47,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=18601.333333333332, ans=0.07 2024-09-16 19:52:00,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=18601.333333333332, ans=0.125 2024-09-16 19:52:05,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=16.26 vs. limit=14.493 2024-09-16 19:52:20,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=18648.0, ans=0.125 2024-09-16 19:52:40,424 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.380e+02 3.205e+02 4.277e+02 6.109e+02 1.381e+03, threshold=8.555e+02, percent-clipped=13.0 2024-09-16 19:52:58,046 INFO [train.py:1198] (1/2) Epoch 2, batch 150, loss[loss=0.3731, simple_loss=0.3607, pruned_loss=0.1553, ctc_loss=0.2843, cr_loss=0.4498, over 34475.00 frames. ], tot_loss[loss=0.403, simple_loss=0.3885, pruned_loss=0.1678, ctc_loss=0.3145, cr_loss=0.475, over 3556880.48 frames. ], batch size: 82, lr: 4.14e-02, grad_scale: 32.0 2024-09-16 19:53:07,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.47 vs. limit=21.591 2024-09-16 19:53:21,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=18834.666666666668, ans=0.125 2024-09-16 19:53:38,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=18881.333333333332, ans=0.006764927536231884 2024-09-16 19:53:38,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=14.5805 2024-09-16 19:53:41,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=18881.333333333332, ans=0.23915333333333344 2024-09-16 19:54:07,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=18974.666666666668, ans=0.04949747468305833 2024-09-16 19:54:22,417 INFO [train.py:1198] (1/2) Epoch 2, batch 200, loss[loss=0.4514, simple_loss=0.422, pruned_loss=0.1932, ctc_loss=0.3617, cr_loss=0.5495, over 31923.00 frames. ], tot_loss[loss=0.4009, simple_loss=0.3867, pruned_loss=0.1667, ctc_loss=0.3127, cr_loss=0.4751, over 4271329.86 frames. ], batch size: 145, lr: 4.14e-02, grad_scale: 32.0 2024-09-16 19:54:22,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=19021.333333333332, ans=0.025 2024-09-16 19:55:07,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=19114.666666666668, ans=0.05885333333333331 2024-09-16 19:55:14,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=19161.333333333332, ans=0.0 2024-09-16 19:55:22,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.37 vs. 
limit=5.8742 2024-09-16 19:55:28,655 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 3.527e+02 4.204e+02 5.422e+02 1.023e+03, threshold=8.408e+02, percent-clipped=5.0 2024-09-16 19:55:46,775 INFO [train.py:1198] (1/2) Epoch 2, batch 250, loss[loss=0.4165, simple_loss=0.3978, pruned_loss=0.1749, ctc_loss=0.3297, cr_loss=0.484, over 34235.00 frames. ], tot_loss[loss=0.3985, simple_loss=0.3853, pruned_loss=0.1654, ctc_loss=0.3102, cr_loss=0.4735, over 4833584.60 frames. ], batch size: 117, lr: 4.13e-02, grad_scale: 32.0 2024-09-16 19:55:49,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=5.888199999999999 2024-09-16 19:55:52,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=19254.666666666668, ans=0.125 2024-09-16 19:56:05,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19301.333333333332, ans=0.1069866666666667 2024-09-16 19:56:21,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19348.0, ans=0.10652 2024-09-16 19:56:29,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=19348.0, ans=0.125 2024-09-16 19:56:51,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.89 vs. limit=14.720666666666666 2024-09-16 19:57:04,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=19441.333333333332, ans=0.025 2024-09-16 19:57:05,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=19441.333333333332, ans=0.0 2024-09-16 19:57:08,850 INFO [train.py:1198] (1/2) Epoch 2, batch 300, loss[loss=0.4408, simple_loss=0.4178, pruned_loss=0.1871, ctc_loss=0.3503, cr_loss=0.4883, over 34334.00 frames. ], tot_loss[loss=0.3974, simple_loss=0.3847, pruned_loss=0.1647, ctc_loss=0.3093, cr_loss=0.4742, over 5261830.13 frames. ], batch size: 107, lr: 4.12e-02, grad_scale: 16.0 2024-09-16 19:57:22,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=16.56 vs. limit=14.808 2024-09-16 19:57:32,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19534.666666666668, ans=0.10465333333333332 2024-09-16 19:58:10,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.97 vs. limit=14.8605 2024-09-16 19:58:17,205 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.624e+02 3.463e+02 3.941e+02 5.080e+02 1.255e+03, threshold=7.883e+02, percent-clipped=2.0 2024-09-16 19:58:29,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=19674.666666666668, ans=0.125 2024-09-16 19:58:35,487 INFO [train.py:1198] (1/2) Epoch 2, batch 350, loss[loss=0.3568, simple_loss=0.3541, pruned_loss=0.144, ctc_loss=0.2723, cr_loss=0.4249, over 34260.00 frames. 
], tot_loss[loss=0.3967, simple_loss=0.3844, pruned_loss=0.1642, ctc_loss=0.3081, cr_loss=0.4744, over 5597490.18 frames. ], batch size: 83, lr: 4.12e-02, grad_scale: 16.0 2024-09-16 19:58:53,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=19768.0, ans=0.0 2024-09-16 19:59:15,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=19814.666666666668, ans=0.2064866666666667 2024-09-16 19:59:31,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=19861.333333333332, ans=0.0 2024-09-16 19:59:49,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.77 vs. limit=22.430999999999997 2024-09-16 19:59:55,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=19954.666666666668, ans=0.125 2024-09-16 19:59:56,885 INFO [train.py:1198] (1/2) Epoch 2, batch 400, loss[loss=0.3656, simple_loss=0.3596, pruned_loss=0.1482, ctc_loss=0.285, cr_loss=0.4557, over 34431.00 frames. ], tot_loss[loss=0.3941, simple_loss=0.3827, pruned_loss=0.1627, ctc_loss=0.3057, cr_loss=0.4741, over 5864803.18 frames. ], batch size: 95, lr: 4.11e-02, grad_scale: 32.0 2024-09-16 20:00:18,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=20001.333333333332, ans=0.5 2024-09-16 20:00:25,400 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:00:30,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=20048.0, ans=0.006511304347826087 2024-09-16 20:00:48,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=20094.666666666668, ans=0.1 2024-09-16 20:01:04,493 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.645e+02 3.389e+02 4.241e+02 5.416e+02 9.018e+02, threshold=8.482e+02, percent-clipped=4.0 2024-09-16 20:01:18,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=20188.0, ans=0.125 2024-09-16 20:01:19,184 INFO [train.py:1198] (1/2) Epoch 2, batch 450, loss[loss=0.3954, simple_loss=0.3851, pruned_loss=0.1622, ctc_loss=0.305, cr_loss=0.5087, over 34699.00 frames. ], tot_loss[loss=0.3937, simple_loss=0.3822, pruned_loss=0.1626, ctc_loss=0.3052, cr_loss=0.4747, over 6052791.73 frames. ], batch size: 97, lr: 4.11e-02, grad_scale: 16.0 2024-09-16 20:01:26,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=20188.0, ans=0.125 2024-09-16 20:01:28,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.38 vs. 
limit=15.0 2024-09-16 20:01:36,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=20234.666666666668, ans=0.125 2024-09-16 20:01:39,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=20234.666666666668, ans=0.0064707246376811585 2024-09-16 20:01:49,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=20234.666666666668, ans=0.025 2024-09-16 20:02:29,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=20374.666666666668, ans=0.1 2024-09-16 20:02:38,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=20374.666666666668, ans=0.0 2024-09-16 20:02:45,201 INFO [train.py:1198] (1/2) Epoch 2, batch 500, loss[loss=0.4324, simple_loss=0.4127, pruned_loss=0.1819, ctc_loss=0.336, cr_loss=0.5264, over 34435.00 frames. ], tot_loss[loss=0.3925, simple_loss=0.3814, pruned_loss=0.1619, ctc_loss=0.3041, cr_loss=0.4743, over 6218685.87 frames. ], batch size: 110, lr: 4.10e-02, grad_scale: 16.0 2024-09-16 20:03:52,845 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.941e+02 4.030e+02 4.912e+02 6.719e+02 1.402e+03, threshold=9.824e+02, percent-clipped=13.0 2024-09-16 20:04:07,619 INFO [train.py:1198] (1/2) Epoch 2, batch 550, loss[loss=0.4137, simple_loss=0.406, pruned_loss=0.1693, ctc_loss=0.3208, cr_loss=0.4664, over 33880.00 frames. ], tot_loss[loss=0.3924, simple_loss=0.3814, pruned_loss=0.1618, ctc_loss=0.3037, cr_loss=0.4742, over 6330089.74 frames. ], batch size: 122, lr: 4.09e-02, grad_scale: 16.0 2024-09-16 20:04:12,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=20654.666666666668, ans=0.2 2024-09-16 20:04:12,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=20654.666666666668, ans=0.0 2024-09-16 20:04:19,421 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:05:32,367 INFO [train.py:1198] (1/2) Epoch 2, batch 600, loss[loss=0.3966, simple_loss=0.385, pruned_loss=0.1639, ctc_loss=0.3069, cr_loss=0.4789, over 34219.00 frames. ], tot_loss[loss=0.3919, simple_loss=0.3814, pruned_loss=0.1614, ctc_loss=0.303, cr_loss=0.4739, over 6431788.79 frames. ], batch size: 117, lr: 4.09e-02, grad_scale: 16.0 2024-09-16 20:05:42,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=20888.0, ans=0.125 2024-09-16 20:05:50,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=20934.666666666668, ans=0.2 2024-09-16 20:06:02,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=20934.666666666668, ans=0.1 2024-09-16 20:06:07,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.95 vs. 
limit=15.0 2024-09-16 20:06:21,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=21028.0, ans=0.125 2024-09-16 20:06:26,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.45 vs. limit=15.0 2024-09-16 20:06:39,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=21074.666666666668, ans=0.09899494936611666 2024-09-16 20:06:40,955 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.226e+02 3.232e+02 3.939e+02 5.131e+02 1.036e+03, threshold=7.878e+02, percent-clipped=1.0 2024-09-16 20:06:55,658 INFO [train.py:1198] (1/2) Epoch 2, batch 650, loss[loss=0.3663, simple_loss=0.3666, pruned_loss=0.1455, ctc_loss=0.2789, cr_loss=0.4805, over 34535.00 frames. ], tot_loss[loss=0.3903, simple_loss=0.3804, pruned_loss=0.1605, ctc_loss=0.3016, cr_loss=0.4739, over 6522918.04 frames. ], batch size: 94, lr: 4.08e-02, grad_scale: 16.0 2024-09-16 20:07:00,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=21121.333333333332, ans=0.0 2024-09-16 20:07:12,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=21168.0, ans=0.025 2024-09-16 20:07:19,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=15.0 2024-09-16 20:07:21,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=21168.0, ans=22.5 2024-09-16 20:07:28,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=21214.666666666668, ans=0.125 2024-09-16 20:07:30,566 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:07:48,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=21261.333333333332, ans=0.025 2024-09-16 20:08:14,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=21308.0, ans=0.125 2024-09-16 20:08:17,866 INFO [train.py:1198] (1/2) Epoch 2, batch 700, loss[loss=0.3903, simple_loss=0.3742, pruned_loss=0.1645, ctc_loss=0.3, cr_loss=0.4354, over 34572.00 frames. ], tot_loss[loss=0.3902, simple_loss=0.3806, pruned_loss=0.1603, ctc_loss=0.3013, cr_loss=0.4741, over 6579667.77 frames. ], batch size: 89, lr: 4.08e-02, grad_scale: 16.0 2024-09-16 20:08:27,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=21354.666666666668, ans=0.2 2024-09-16 20:08:29,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=21354.666666666668, ans=0.2 2024-09-16 20:08:30,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.60 vs. 
limit=22.5 2024-09-16 20:08:59,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=21448.0, ans=0.0 2024-09-16 20:09:01,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.97 vs. limit=15.0 2024-09-16 20:09:28,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.563e+02 3.532e+02 4.268e+02 5.394e+02 1.372e+03, threshold=8.536e+02, percent-clipped=7.0 2024-09-16 20:09:30,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21541.333333333332, ans=0.1 2024-09-16 20:09:44,933 INFO [train.py:1198] (1/2) Epoch 2, batch 750, loss[loss=0.4008, simple_loss=0.3915, pruned_loss=0.1639, ctc_loss=0.3069, cr_loss=0.5229, over 34424.00 frames. ], tot_loss[loss=0.3889, simple_loss=0.3796, pruned_loss=0.1596, ctc_loss=0.2999, cr_loss=0.4744, over 6623639.54 frames. ], batch size: 95, lr: 4.07e-02, grad_scale: 16.0 2024-09-16 20:09:51,743 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:10:06,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=21634.666666666668, ans=0.125 2024-09-16 20:10:52,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=21774.666666666668, ans=0.025 2024-09-16 20:11:04,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=21774.666666666668, ans=0.0 2024-09-16 20:11:07,032 INFO [train.py:1198] (1/2) Epoch 2, batch 800, loss[loss=0.3407, simple_loss=0.3373, pruned_loss=0.1377, ctc_loss=0.2527, cr_loss=0.4523, over 34447.00 frames. ], tot_loss[loss=0.3885, simple_loss=0.3794, pruned_loss=0.1594, ctc_loss=0.2993, cr_loss=0.4739, over 6658878.77 frames. ], batch size: 85, lr: 4.06e-02, grad_scale: 32.0 2024-09-16 20:11:43,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=21914.666666666668, ans=0.125 2024-09-16 20:11:54,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=21961.333333333332, ans=0.125 2024-09-16 20:12:09,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=21961.333333333332, ans=0.006095362318840581 2024-09-16 20:12:14,090 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 3.394e+02 4.460e+02 5.635e+02 9.715e+02, threshold=8.920e+02, percent-clipped=3.0 2024-09-16 20:12:19,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=22008.0, ans=0.006085217391304348 2024-09-16 20:12:25,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=22008.0, ans=0.2 2024-09-16 20:12:26,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.38 vs. limit=15.0 2024-09-16 20:12:28,789 INFO [train.py:1198] (1/2) Epoch 2, batch 850, loss[loss=0.4049, simple_loss=0.4006, pruned_loss=0.1635, ctc_loss=0.3182, cr_loss=0.468, over 34363.00 frames. 
], tot_loss[loss=0.3873, simple_loss=0.3786, pruned_loss=0.1587, ctc_loss=0.2982, cr_loss=0.474, over 6691603.42 frames. ], batch size: 103, lr: 4.06e-02, grad_scale: 32.0 2024-09-16 20:12:43,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.27 vs. limit=22.5 2024-09-16 20:12:50,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=22101.333333333332, ans=0.1 2024-09-16 20:13:42,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=10.69 vs. limit=10.0 2024-09-16 20:13:44,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2024-09-16 20:13:54,765 INFO [train.py:1198] (1/2) Epoch 2, batch 900, loss[loss=0.357, simple_loss=0.354, pruned_loss=0.1423, ctc_loss=0.2792, cr_loss=0.4903, over 34455.00 frames. ], tot_loss[loss=0.3877, simple_loss=0.3788, pruned_loss=0.1589, ctc_loss=0.2985, cr_loss=0.4745, over 6697143.78 frames. ], batch size: 85, lr: 4.05e-02, grad_scale: 32.0 2024-09-16 20:14:08,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=22288.0, ans=0.125 2024-09-16 20:14:08,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=20.31 vs. limit=15.0 2024-09-16 20:14:20,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=22334.666666666668, ans=0.006014202898550724 2024-09-16 20:14:27,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=22381.333333333332, ans=0.04949747468305833 2024-09-16 20:14:37,797 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:14:47,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=22428.0, ans=0.125 2024-09-16 20:14:58,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.35 vs. limit=22.5 2024-09-16 20:15:03,843 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.646e+02 3.504e+02 3.918e+02 5.428e+02 9.381e+02, threshold=7.836e+02, percent-clipped=1.0 2024-09-16 20:15:17,020 INFO [train.py:1198] (1/2) Epoch 2, batch 950, loss[loss=0.3782, simple_loss=0.3688, pruned_loss=0.1558, ctc_loss=0.2876, cr_loss=0.4617, over 34673.00 frames. ], tot_loss[loss=0.3883, simple_loss=0.3793, pruned_loss=0.1592, ctc_loss=0.2991, cr_loss=0.4751, over 6700231.02 frames. 
], batch size: 87, lr: 4.05e-02, grad_scale: 16.0 2024-09-16 20:15:17,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=22521.333333333332, ans=0.2 2024-09-16 20:15:45,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=22568.0, ans=0.125 2024-09-16 20:15:47,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=22568.0, ans=0.125 2024-09-16 20:15:58,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=22614.666666666668, ans=0.07 2024-09-16 20:15:59,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.59 vs. limit=15.0 2024-09-16 20:16:06,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=22661.333333333332, ans=0.95 2024-09-16 20:16:06,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=22661.333333333332, ans=0.125 2024-09-16 20:16:06,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=22661.333333333332, ans=0.125 2024-09-16 20:16:07,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=22661.333333333332, ans=0.125 2024-09-16 20:16:08,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=22661.333333333332, ans=0.125 2024-09-16 20:16:12,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.03 vs. limit=22.5 2024-09-16 20:16:39,356 INFO [train.py:1198] (1/2) Epoch 2, batch 1000, loss[loss=0.3675, simple_loss=0.3637, pruned_loss=0.149, ctc_loss=0.2752, cr_loss=0.455, over 34490.00 frames. ], tot_loss[loss=0.3892, simple_loss=0.3801, pruned_loss=0.1596, ctc_loss=0.2997, cr_loss=0.476, over 6691818.37 frames. ], batch size: 90, lr: 4.04e-02, grad_scale: 16.0 2024-09-16 20:17:01,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=22801.333333333332, ans=0.125 2024-09-16 20:17:04,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=22801.333333333332, ans=0.05 2024-09-16 20:17:23,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=22848.0, ans=0.125 2024-09-16 20:17:32,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=22894.666666666668, ans=0.125 2024-09-16 20:17:52,280 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.398e+02 3.282e+02 4.185e+02 5.448e+02 1.184e+03, threshold=8.370e+02, percent-clipped=2.0 2024-09-16 20:18:05,755 INFO [train.py:1198] (1/2) Epoch 2, batch 1050, loss[loss=0.401, simple_loss=0.3952, pruned_loss=0.1635, ctc_loss=0.3052, cr_loss=0.4684, over 34575.00 frames. ], tot_loss[loss=0.3873, simple_loss=0.3787, pruned_loss=0.1587, ctc_loss=0.2981, cr_loss=0.4745, over 6702004.46 frames. 
], batch size: 99, lr: 4.03e-02, grad_scale: 16.0 2024-09-16 20:18:14,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=22988.0, ans=0.125 2024-09-16 20:18:15,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2024-09-16 20:18:17,583 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:18:35,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=23034.666666666668, ans=0.0 2024-09-16 20:18:48,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=23081.333333333332, ans=0.035 2024-09-16 20:18:58,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23128.0, ans=0.1 2024-09-16 20:19:03,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23128.0, ans=0.1 2024-09-16 20:19:15,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23174.666666666668, ans=0.1 2024-09-16 20:19:24,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=23174.666666666668, ans=0.005831594202898551 2024-09-16 20:19:27,886 INFO [train.py:1198] (1/2) Epoch 2, batch 1100, loss[loss=0.3809, simple_loss=0.3731, pruned_loss=0.1566, ctc_loss=0.2917, cr_loss=0.4313, over 34370.00 frames. ], tot_loss[loss=0.3864, simple_loss=0.3781, pruned_loss=0.1582, ctc_loss=0.297, cr_loss=0.4742, over 6715019.77 frames. ], batch size: 91, lr: 4.03e-02, grad_scale: 16.0 2024-09-16 20:19:58,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.70 vs. limit=15.0 2024-09-16 20:20:09,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=23314.666666666668, ans=0.2 2024-09-16 20:20:23,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.06 vs. limit=15.0 2024-09-16 20:20:38,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.32 vs. limit=15.0 2024-09-16 20:20:39,465 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.372e+02 3.431e+02 4.178e+02 5.648e+02 1.313e+03, threshold=8.357e+02, percent-clipped=3.0 2024-09-16 20:20:51,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=23454.666666666668, ans=0.07 2024-09-16 20:20:52,809 INFO [train.py:1198] (1/2) Epoch 2, batch 1150, loss[loss=0.3595, simple_loss=0.3557, pruned_loss=0.1451, ctc_loss=0.2706, cr_loss=0.4731, over 34343.00 frames. ], tot_loss[loss=0.3861, simple_loss=0.3777, pruned_loss=0.1581, ctc_loss=0.2966, cr_loss=0.4746, over 6713945.80 frames. 
], batch size: 91, lr: 4.02e-02, grad_scale: 16.0 2024-09-16 20:20:59,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=23454.666666666668, ans=0.0 2024-09-16 20:21:04,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2024-09-16 20:21:06,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=23454.666666666668, ans=0.125 2024-09-16 20:21:11,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=23501.333333333332, ans=0.125 2024-09-16 20:21:29,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=23548.0, ans=0.2 2024-09-16 20:21:31,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=23548.0, ans=0.125 2024-09-16 20:21:58,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.91 vs. limit=22.5 2024-09-16 20:22:03,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=23641.333333333332, ans=0.125 2024-09-16 20:22:15,030 INFO [train.py:1198] (1/2) Epoch 2, batch 1200, loss[loss=0.4016, simple_loss=0.3916, pruned_loss=0.1657, ctc_loss=0.3058, cr_loss=0.477, over 34581.00 frames. ], tot_loss[loss=0.3872, simple_loss=0.3786, pruned_loss=0.1586, ctc_loss=0.2974, cr_loss=0.4765, over 6706189.97 frames. ], batch size: 99, lr: 4.02e-02, grad_scale: 32.0 2024-09-16 20:22:46,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=23781.333333333332, ans=10.0 2024-09-16 20:23:03,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=23828.0, ans=0.125 2024-09-16 20:23:15,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. limit=15.0 2024-09-16 20:23:24,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.405e+02 3.180e+02 3.695e+02 4.790e+02 1.088e+03, threshold=7.390e+02, percent-clipped=4.0 2024-09-16 20:23:34,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=23874.666666666668, ans=0.125 2024-09-16 20:23:36,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=23921.333333333332, ans=0.95 2024-09-16 20:23:37,785 INFO [train.py:1198] (1/2) Epoch 2, batch 1250, loss[loss=0.3916, simple_loss=0.395, pruned_loss=0.1552, ctc_loss=0.2919, cr_loss=0.4849, over 34370.00 frames. ], tot_loss[loss=0.3863, simple_loss=0.3784, pruned_loss=0.1579, ctc_loss=0.2962, cr_loss=0.4774, over 6740998.82 frames. ], batch size: 107, lr: 4.01e-02, grad_scale: 32.0 2024-09-16 20:24:27,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. 
limit=15.0 2024-09-16 20:24:39,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=24061.333333333332, ans=0.005638840579710145 2024-09-16 20:24:39,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=6.36 vs. limit=12.0 2024-09-16 20:24:41,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.68 vs. limit=15.0 2024-09-16 20:24:44,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=24061.333333333332, ans=0.125 2024-09-16 20:25:03,512 INFO [train.py:1198] (1/2) Epoch 2, batch 1300, loss[loss=0.4035, simple_loss=0.3931, pruned_loss=0.1658, ctc_loss=0.3101, cr_loss=0.5038, over 32928.00 frames. ], tot_loss[loss=0.3851, simple_loss=0.3774, pruned_loss=0.1573, ctc_loss=0.2952, cr_loss=0.4767, over 6746078.41 frames. ], batch size: 130, lr: 4.00e-02, grad_scale: 16.0 2024-09-16 20:25:03,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=24154.666666666668, ans=0.125 2024-09-16 20:25:05,461 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:25:05,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=24154.666666666668, ans=0.0 2024-09-16 20:25:18,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=24201.333333333332, ans=0.125 2024-09-16 20:25:21,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2024-09-16 20:25:25,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=24201.333333333332, ans=0.0 2024-09-16 20:25:28,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=24201.333333333332, ans=0.125 2024-09-16 20:25:35,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=24248.0, ans=0.125 2024-09-16 20:25:38,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=24248.0, ans=0.0055982608695652174 2024-09-16 20:25:40,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=15.0 2024-09-16 20:26:14,283 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.514e+02 3.368e+02 4.137e+02 5.072e+02 9.880e+02, threshold=8.275e+02, percent-clipped=2.0 2024-09-16 20:26:25,924 INFO [train.py:1198] (1/2) Epoch 2, batch 1350, loss[loss=0.3922, simple_loss=0.3815, pruned_loss=0.1621, ctc_loss=0.3013, cr_loss=0.4617, over 34536.00 frames. ], tot_loss[loss=0.3837, simple_loss=0.3766, pruned_loss=0.1565, ctc_loss=0.294, cr_loss=0.4769, over 6766428.73 frames. 
], batch size: 94, lr: 4.00e-02, grad_scale: 16.0 2024-09-16 20:26:32,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=24388.0, ans=0.125 2024-09-16 20:27:21,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=24528.0, ans=0.2 2024-09-16 20:27:47,660 INFO [train.py:1198] (1/2) Epoch 2, batch 1400, loss[loss=0.3573, simple_loss=0.3502, pruned_loss=0.1453, ctc_loss=0.276, cr_loss=0.4667, over 34291.00 frames. ], tot_loss[loss=0.3832, simple_loss=0.3762, pruned_loss=0.1562, ctc_loss=0.2934, cr_loss=0.477, over 6778109.36 frames. ], batch size: 80, lr: 3.99e-02, grad_scale: 16.0 2024-09-16 20:28:44,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=24761.333333333332, ans=0.125 2024-09-16 20:28:54,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=24761.333333333332, ans=0.0 2024-09-16 20:29:03,774 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.545e+02 3.531e+02 4.428e+02 5.331e+02 1.401e+03, threshold=8.855e+02, percent-clipped=3.0 2024-09-16 20:29:04,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=24808.0, ans=0.1 2024-09-16 20:29:13,839 INFO [train.py:1198] (1/2) Epoch 2, batch 1450, loss[loss=0.4109, simple_loss=0.3964, pruned_loss=0.1698, ctc_loss=0.3186, cr_loss=0.5527, over 34469.00 frames. ], tot_loss[loss=0.3839, simple_loss=0.3769, pruned_loss=0.1565, ctc_loss=0.294, cr_loss=0.4775, over 6773225.97 frames. ], batch size: 110, lr: 3.98e-02, grad_scale: 8.0 2024-09-16 20:29:15,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=24854.666666666668, ans=0.0 2024-09-16 20:29:37,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=24901.333333333332, ans=0.2 2024-09-16 20:30:05,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.04 vs. limit=15.0 2024-09-16 20:30:06,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=24994.666666666668, ans=0.125 2024-09-16 20:30:16,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=24994.666666666668, ans=0.0 2024-09-16 20:30:32,982 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:30:35,972 INFO [train.py:1198] (1/2) Epoch 2, batch 1500, loss[loss=0.3655, simple_loss=0.3683, pruned_loss=0.1446, ctc_loss=0.2761, cr_loss=0.4597, over 34469.00 frames. ], tot_loss[loss=0.3839, simple_loss=0.3772, pruned_loss=0.1564, ctc_loss=0.2935, cr_loss=0.4784, over 6774194.27 frames. 
], batch size: 100, lr: 3.98e-02, grad_scale: 8.0 2024-09-16 20:30:58,196 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:31:26,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=25228.0, ans=0.5 2024-09-16 20:31:28,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=25228.0, ans=0.0 2024-09-16 20:31:37,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=12.0 2024-09-16 20:31:38,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=25228.0, ans=0.125 2024-09-16 20:31:40,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=25228.0, ans=0.1 2024-09-16 20:31:51,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=25274.666666666668, ans=0.2 2024-09-16 20:31:52,746 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.822e+02 3.516e+02 4.344e+02 6.034e+02 1.487e+03, threshold=8.688e+02, percent-clipped=4.0 2024-09-16 20:31:55,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.63 vs. limit=22.5 2024-09-16 20:32:02,797 INFO [train.py:1198] (1/2) Epoch 2, batch 1550, loss[loss=0.4114, simple_loss=0.3975, pruned_loss=0.17, ctc_loss=0.3185, cr_loss=0.5423, over 34425.00 frames. ], tot_loss[loss=0.3846, simple_loss=0.3773, pruned_loss=0.1569, ctc_loss=0.2946, cr_loss=0.4787, over 6744714.76 frames. ], batch size: 105, lr: 3.97e-02, grad_scale: 8.0 2024-09-16 20:32:04,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=25321.333333333332, ans=0.125 2024-09-16 20:32:06,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0 2024-09-16 20:32:16,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=25321.333333333332, ans=0.2 2024-09-16 20:32:19,647 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:32:39,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=25414.666666666668, ans=0.125 2024-09-16 20:33:15,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=25508.0, ans=0.125 2024-09-16 20:33:19,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.55 vs. limit=15.0 2024-09-16 20:33:20,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=28.74 vs. limit=22.5 2024-09-16 20:33:24,776 INFO [train.py:1198] (1/2) Epoch 2, batch 1600, loss[loss=0.4271, simple_loss=0.411, pruned_loss=0.178, ctc_loss=0.3371, cr_loss=0.497, over 34574.00 frames. 
], tot_loss[loss=0.384, simple_loss=0.3767, pruned_loss=0.1567, ctc_loss=0.2941, cr_loss=0.4784, over 6725433.99 frames. ], batch size: 99, lr: 3.97e-02, grad_scale: 16.0 2024-09-16 20:33:48,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.93 vs. limit=12.0 2024-09-16 20:33:54,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=25601.333333333332, ans=0.125 2024-09-16 20:34:04,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=25648.0, ans=0.125 2024-09-16 20:34:04,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=25648.0, ans=0.025 2024-09-16 20:34:34,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=25741.333333333332, ans=0.0 2024-09-16 20:34:36,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25741.333333333332, ans=0.1 2024-09-16 20:34:39,031 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.539e+02 3.379e+02 3.831e+02 4.947e+02 9.074e+02, threshold=7.661e+02, percent-clipped=1.0 2024-09-16 20:34:46,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.10 vs. limit=10.0 2024-09-16 20:34:47,152 INFO [train.py:1198] (1/2) Epoch 2, batch 1650, loss[loss=0.4024, simple_loss=0.3916, pruned_loss=0.1652, ctc_loss=0.3119, cr_loss=0.5112, over 34367.00 frames. ], tot_loss[loss=0.3839, simple_loss=0.3767, pruned_loss=0.1566, ctc_loss=0.2938, cr_loss=0.4781, over 6718073.47 frames. 
], batch size: 103, lr: 3.96e-02, grad_scale: 16.0 2024-09-16 20:34:47,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=25788.0, ans=0.1 2024-09-16 20:34:50,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=25788.0, ans=0.1 2024-09-16 20:34:57,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=25788.0, ans=0.125 2024-09-16 20:35:02,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=25834.666666666668, ans=0.025 2024-09-16 20:35:10,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=25834.666666666668, ans=0.005253333333333333 2024-09-16 20:35:18,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=25834.666666666668, ans=10.0 2024-09-16 20:35:33,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=25881.333333333332, ans=0.2 2024-09-16 20:35:38,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=25928.0, ans=0.125 2024-09-16 20:35:50,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=25928.0, ans=0.09899494936611666 2024-09-16 20:36:05,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=25974.666666666668, ans=0.125 2024-09-16 20:36:08,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=25974.666666666668, ans=0.2 2024-09-16 20:36:12,773 INFO [train.py:1198] (1/2) Epoch 2, batch 1700, loss[loss=0.327, simple_loss=0.3305, pruned_loss=0.1282, ctc_loss=0.2558, cr_loss=0.4022, over 34302.00 frames. ], tot_loss[loss=0.3825, simple_loss=0.3759, pruned_loss=0.1557, ctc_loss=0.2927, cr_loss=0.4782, over 6742814.73 frames. ], batch size: 80, lr: 3.95e-02, grad_scale: 16.0 2024-09-16 20:36:13,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=26021.333333333332, ans=0.04949747468305833 2024-09-16 20:36:19,917 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 20:36:24,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=26021.333333333332, ans=0.0 2024-09-16 20:37:06,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.84 vs. limit=15.0 2024-09-16 20:37:10,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=26161.333333333332, ans=0.005182318840579711 2024-09-16 20:37:27,089 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.323e+02 3.141e+02 3.893e+02 4.603e+02 1.039e+03, threshold=7.787e+02, percent-clipped=3.0 2024-09-16 20:37:35,407 INFO [train.py:1198] (1/2) Epoch 2, batch 1750, loss[loss=0.3265, simple_loss=0.326, pruned_loss=0.1297, ctc_loss=0.2506, cr_loss=0.4388, over 34158.00 frames. 
], tot_loss[loss=0.3819, simple_loss=0.3755, pruned_loss=0.1554, ctc_loss=0.2921, cr_loss=0.4778, over 6752077.17 frames. ], batch size: 78, lr: 3.95e-02, grad_scale: 16.0 2024-09-16 20:37:40,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26254.666666666668, ans=0.1 2024-09-16 20:37:42,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=26254.666666666668, ans=0.125 2024-09-16 20:37:47,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.99 vs. limit=15.0 2024-09-16 20:38:03,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=26301.333333333332, ans=0.1 2024-09-16 20:38:19,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.83 vs. limit=10.0 2024-09-16 20:38:49,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=12.0 2024-09-16 20:38:52,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=26441.333333333332, ans=0.125 2024-09-16 20:38:56,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=26488.0, ans=0.125 2024-09-16 20:38:56,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26488.0, ans=0.1 2024-09-16 20:38:59,121 INFO [train.py:1198] (1/2) Epoch 2, batch 1800, loss[loss=0.383, simple_loss=0.3817, pruned_loss=0.1532, ctc_loss=0.2827, cr_loss=0.5334, over 34706.00 frames. ], tot_loss[loss=0.3818, simple_loss=0.3753, pruned_loss=0.1554, ctc_loss=0.2918, cr_loss=0.4783, over 6753667.53 frames. ], batch size: 97, lr: 3.94e-02, grad_scale: 16.0 2024-09-16 20:38:59,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=26488.0, ans=0.005111304347826087 2024-09-16 20:39:01,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26488.0, ans=0.1 2024-09-16 20:39:02,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=26488.0, ans=0.125 2024-09-16 20:39:05,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=26488.0, ans=0.0 2024-09-16 20:39:12,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=26488.0, ans=0.0 2024-09-16 20:39:22,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26534.666666666668, ans=0.1 2024-09-16 20:40:15,055 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.429e+02 3.250e+02 3.917e+02 5.279e+02 1.509e+03, threshold=7.834e+02, percent-clipped=10.0 2024-09-16 20:40:23,468 INFO [train.py:1198] (1/2) Epoch 2, batch 1850, loss[loss=0.3875, simple_loss=0.3811, pruned_loss=0.1577, ctc_loss=0.2901, cr_loss=0.5132, over 34469.00 frames. 
], tot_loss[loss=0.3801, simple_loss=0.3742, pruned_loss=0.1545, ctc_loss=0.2904, cr_loss=0.4777, over 6762613.20 frames. ], batch size: 100, lr: 3.93e-02, grad_scale: 16.0 2024-09-16 20:40:43,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=26768.0, ans=0.125 2024-09-16 20:41:16,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=26861.333333333332, ans=0.0 2024-09-16 20:41:37,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.59 vs. limit=22.5 2024-09-16 20:41:45,077 INFO [train.py:1198] (1/2) Epoch 2, batch 1900, loss[loss=0.4067, simple_loss=0.3949, pruned_loss=0.1671, ctc_loss=0.314, cr_loss=0.535, over 34385.00 frames. ], tot_loss[loss=0.3812, simple_loss=0.375, pruned_loss=0.155, ctc_loss=0.2911, cr_loss=0.4786, over 6771208.61 frames. ], batch size: 103, lr: 3.93e-02, grad_scale: 16.0 2024-09-16 20:42:25,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=27048.0, ans=0.125 2024-09-16 20:42:46,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=27094.666666666668, ans=0.5 2024-09-16 20:42:49,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=27094.666666666668, ans=0.1 2024-09-16 20:42:56,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=27141.333333333332, ans=0.07 2024-09-16 20:43:00,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.351e+02 3.150e+02 3.876e+02 5.017e+02 8.955e+02, threshold=7.751e+02, percent-clipped=2.0 2024-09-16 20:43:01,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=27141.333333333332, ans=0.125 2024-09-16 20:43:09,074 INFO [train.py:1198] (1/2) Epoch 2, batch 1950, loss[loss=0.3882, simple_loss=0.3744, pruned_loss=0.1615, ctc_loss=0.3008, cr_loss=0.4687, over 34367.00 frames. ], tot_loss[loss=0.3818, simple_loss=0.3761, pruned_loss=0.155, ctc_loss=0.2912, cr_loss=0.4806, over 6788032.78 frames. ], batch size: 91, lr: 3.92e-02, grad_scale: 16.0 2024-09-16 20:43:24,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=27188.0, ans=0.125 2024-09-16 20:43:36,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=27234.666666666668, ans=0.125 2024-09-16 20:44:00,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=27328.0, ans=0.1 2024-09-16 20:44:00,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=27328.0, ans=0.125 2024-09-16 20:44:15,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=27374.666666666668, ans=0.0 2024-09-16 20:44:33,273 INFO [train.py:1198] (1/2) Epoch 2, batch 2000, loss[loss=0.3295, simple_loss=0.3327, pruned_loss=0.1291, ctc_loss=0.2513, cr_loss=0.4502, over 34126.00 frames. 
], tot_loss[loss=0.3826, simple_loss=0.3766, pruned_loss=0.1554, ctc_loss=0.292, cr_loss=0.4815, over 6764578.83 frames. ], batch size: 78, lr: 3.91e-02, grad_scale: 32.0 2024-09-16 20:44:38,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=27421.333333333332, ans=0.125 2024-09-16 20:45:00,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=27468.0, ans=0.125 2024-09-16 20:45:15,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0 2024-09-16 20:45:47,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.291e+02 3.549e+02 4.210e+02 5.243e+02 1.062e+03, threshold=8.420e+02, percent-clipped=4.0 2024-09-16 20:45:47,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=27608.0, ans=0.1 2024-09-16 20:45:49,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=27608.0, ans=0.09899494936611666 2024-09-16 20:45:55,677 INFO [train.py:1198] (1/2) Epoch 2, batch 2050, loss[loss=0.333, simple_loss=0.3343, pruned_loss=0.1314, ctc_loss=0.254, cr_loss=0.456, over 34502.00 frames. ], tot_loss[loss=0.3808, simple_loss=0.375, pruned_loss=0.1546, ctc_loss=0.2905, cr_loss=0.4803, over 6757029.74 frames. ], batch size: 82, lr: 3.91e-02, grad_scale: 32.0 2024-09-16 20:45:57,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27654.666666666668, ans=0.1 2024-09-16 20:46:01,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=27654.666666666668, ans=0.125 2024-09-16 20:46:12,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=27701.333333333332, ans=0.0 2024-09-16 20:46:22,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=27701.333333333332, ans=0.025 2024-09-16 20:47:18,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=27841.333333333332, ans=0.0 2024-09-16 20:47:21,077 INFO [train.py:1198] (1/2) Epoch 2, batch 2100, loss[loss=0.3514, simple_loss=0.3631, pruned_loss=0.1345, ctc_loss=0.2581, cr_loss=0.4774, over 34523.00 frames. ], tot_loss[loss=0.3792, simple_loss=0.3742, pruned_loss=0.1536, ctc_loss=0.2888, cr_loss=0.4801, over 6770333.87 frames. 
], batch size: 94, lr: 3.90e-02, grad_scale: 16.0 2024-09-16 20:47:22,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=27888.0, ans=0.1 2024-09-16 20:47:27,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=27888.0, ans=0.125 2024-09-16 20:47:32,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=27888.0, ans=0.0048069565217391304 2024-09-16 20:47:37,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=27934.666666666668, ans=0.125 2024-09-16 20:47:45,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=27934.666666666668, ans=0.0047968115942028985 2024-09-16 20:47:46,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=27934.666666666668, ans=0.1 2024-09-16 20:47:46,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2024-09-16 20:47:50,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=27934.666666666668, ans=0.125 2024-09-16 20:48:18,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.00 vs. limit=15.0 2024-09-16 20:48:36,858 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.591e+02 3.472e+02 4.263e+02 5.157e+02 9.542e+02, threshold=8.526e+02, percent-clipped=1.0 2024-09-16 20:48:43,391 INFO [train.py:1198] (1/2) Epoch 2, batch 2150, loss[loss=0.3914, simple_loss=0.3812, pruned_loss=0.1614, ctc_loss=0.2935, cr_loss=0.507, over 34381.00 frames. ], tot_loss[loss=0.3779, simple_loss=0.3731, pruned_loss=0.153, ctc_loss=0.2876, cr_loss=0.4794, over 6789407.64 frames. ], batch size: 91, lr: 3.90e-02, grad_scale: 16.0 2024-09-16 20:48:43,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=28121.333333333332, ans=0.025 2024-09-16 20:48:55,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=28121.333333333332, ans=0.07 2024-09-16 20:49:16,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=28214.666666666668, ans=0.125 2024-09-16 20:49:34,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=28261.333333333332, ans=0.05 2024-09-16 20:49:40,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=25.28 vs. 
2024-09-16 20:49:42,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=28261.333333333332, ans=0.125
2024-09-16 20:49:57,710 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 20:49:57,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28308.0, ans=0.1
2024-09-16 20:50:05,562 INFO [train.py:1198] (1/2) Epoch 2, batch 2200, loss[loss=0.3671, simple_loss=0.3751, pruned_loss=0.1425, ctc_loss=0.2739, cr_loss=0.4802, over 34449.00 frames. ], tot_loss[loss=0.3776, simple_loss=0.3729, pruned_loss=0.1529, ctc_loss=0.2872, cr_loss=0.4791, over 6784585.97 frames. ], batch size: 100, lr: 3.89e-02, grad_scale: 16.0
2024-09-16 20:50:07,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28354.666666666668, ans=0.1
2024-09-16 20:50:29,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=12.0
2024-09-16 20:50:33,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=28401.333333333332, ans=0.1
2024-09-16 20:51:06,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5
2024-09-16 20:51:24,752 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.340e+02 3.331e+02 4.235e+02 5.130e+02 1.181e+03, threshold=8.469e+02, percent-clipped=3.0
2024-09-16 20:51:31,558 INFO [train.py:1198] (1/2) Epoch 2, batch 2250, loss[loss=0.3794, simple_loss=0.3805, pruned_loss=0.1509, ctc_loss=0.2822, cr_loss=0.5048, over 34415.00 frames. ], tot_loss[loss=0.3773, simple_loss=0.3727, pruned_loss=0.1527, ctc_loss=0.287, cr_loss=0.4786, over 6781460.19 frames. ], batch size: 95, lr: 3.88e-02, grad_scale: 16.0
2024-09-16 20:51:38,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=28588.0, ans=0.95
2024-09-16 20:51:59,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=28634.666666666668, ans=0.0
2024-09-16 20:52:25,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=28728.0, ans=0.025
2024-09-16 20:52:31,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.38 vs. limit=12.0
2024-09-16 20:52:49,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=28774.666666666668, ans=0.0
2024-09-16 20:52:53,536 INFO [train.py:1198] (1/2) Epoch 2, batch 2300, loss[loss=0.3218, simple_loss=0.3276, pruned_loss=0.1247, ctc_loss=0.2451, cr_loss=0.44, over 34690.00 frames. ], tot_loss[loss=0.3757, simple_loss=0.3711, pruned_loss=0.152, ctc_loss=0.286, cr_loss=0.4772, over 6766924.85 frames. ], batch size: 84, lr: 3.88e-02, grad_scale: 16.0
2024-09-16 20:52:55,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=28821.333333333332, ans=0.125
2024-09-16 20:53:03,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28821.333333333332, ans=0.1
2024-09-16 20:53:11,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=28868.0, ans=0.2
2024-09-16 20:53:14,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=28868.0, ans=0.125
2024-09-16 20:53:30,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=11.13 vs. limit=15.0
2024-09-16 20:53:41,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=28961.333333333332, ans=0.02
2024-09-16 20:53:56,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=28961.333333333332, ans=0.125
2024-09-16 20:54:11,214 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.314e+02 3.243e+02 4.112e+02 5.088e+02 9.535e+02, threshold=8.224e+02, percent-clipped=1.0
2024-09-16 20:54:17,739 INFO [train.py:1198] (1/2) Epoch 2, batch 2350, loss[loss=0.3854, simple_loss=0.3821, pruned_loss=0.1548, ctc_loss=0.2986, cr_loss=0.4868, over 34698.00 frames. ], tot_loss[loss=0.3756, simple_loss=0.3711, pruned_loss=0.1519, ctc_loss=0.2859, cr_loss=0.4779, over 6773672.09 frames. ], batch size: 97, lr: 3.87e-02, grad_scale: 16.0
2024-09-16 20:54:19,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=29054.666666666668, ans=0.125
2024-09-16 20:54:44,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff2.min_abs, batch_count=29101.333333333332, ans=0.1
2024-09-16 20:54:44,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=29101.333333333332, ans=0.0
2024-09-16 20:54:52,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=29148.0, ans=0.025
2024-09-16 20:54:58,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.38 vs. limit=15.0
2024-09-16 20:55:42,072 INFO [train.py:1198] (1/2) Epoch 2, batch 2400, loss[loss=0.3547, simple_loss=0.3527, pruned_loss=0.1424, ctc_loss=0.2617, cr_loss=0.489, over 34593.00 frames. ], tot_loss[loss=0.3761, simple_loss=0.3718, pruned_loss=0.152, ctc_loss=0.2862, cr_loss=0.4794, over 6778653.99 frames. ], batch size: 89, lr: 3.86e-02, grad_scale: 32.0
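The grad_scale column in the batch records is the dynamic loss scale of mixed-precision training; it is halved when inf/nan gradients are detected and grows again after a run of stable steps, which is why it moves between values like 16.0 and 32.0. A sketch of the standard torch.cuda.amp pattern that produces such a value (the training script's own step function may differ):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=1.0, growth_interval=2000)

    def train_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()  # backprop the scaled loss
        scaler.step(optimizer)         # unscales grads, skips step on inf/nan
        scaler.update()                # grows or halves the scale
        return scaler.get_scale()      # the grad_scale reported in the log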
2024-09-16 20:55:45,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=29288.0, ans=0.1
2024-09-16 20:55:47,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=29288.0, ans=0.1
2024-09-16 20:55:58,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=29334.666666666668, ans=0.05
2024-09-16 20:56:02,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=29334.666666666668, ans=0.004492463768115942
2024-09-16 20:56:13,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=29381.333333333332, ans=0.004482318840579711
2024-09-16 20:56:35,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=29428.0, ans=0.0
2024-09-16 20:56:57,984 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.523e+02 3.005e+02 3.614e+02 5.086e+02 9.177e+02, threshold=7.228e+02, percent-clipped=2.0
2024-09-16 20:57:04,580 INFO [train.py:1198] (1/2) Epoch 2, batch 2450, loss[loss=0.3698, simple_loss=0.3714, pruned_loss=0.1471, ctc_loss=0.2731, cr_loss=0.4882, over 34428.00 frames. ], tot_loss[loss=0.3778, simple_loss=0.3732, pruned_loss=0.1528, ctc_loss=0.2872, cr_loss=0.4807, over 6750817.29 frames. ], batch size: 95, lr: 3.86e-02, grad_scale: 16.0
2024-09-16 20:57:08,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0
2024-09-16 20:57:22,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=29568.0, ans=0.004441739130434783
2024-09-16 20:57:44,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.59 vs. limit=15.0
2024-09-16 20:57:52,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=29614.666666666668, ans=0.125
2024-09-16 20:57:56,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.76 vs. limit=12.0
2024-09-16 20:58:03,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=29661.333333333332, ans=0.0
2024-09-16 20:58:12,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.04 vs. limit=22.5
2024-09-16 20:58:28,294 INFO [train.py:1198] (1/2) Epoch 2, batch 2500, loss[loss=0.3682, simple_loss=0.3733, pruned_loss=0.1442, ctc_loss=0.2744, cr_loss=0.4933, over 34440.00 frames. ], tot_loss[loss=0.3769, simple_loss=0.3725, pruned_loss=0.1524, ctc_loss=0.2864, cr_loss=0.4797, over 6762092.36 frames. ], batch size: 100, lr: 3.85e-02, grad_scale: 16.0
2024-09-16 20:58:29,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5
2024-09-16 20:58:29,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5
2024-09-16 20:59:22,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=29894.666666666668, ans=0.0
2024-09-16 20:59:32,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=16.06 vs. limit=15.0
2024-09-16 20:59:37,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.95 vs. limit=6.0
2024-09-16 20:59:44,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.23 vs. limit=15.0
2024-09-16 20:59:45,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=29941.333333333332, ans=0.1
2024-09-16 20:59:48,543 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.541e+02 3.126e+02 3.667e+02 4.417e+02 7.118e+02, threshold=7.334e+02, percent-clipped=0.0
2024-09-16 20:59:53,550 INFO [train.py:1198] (1/2) Epoch 2, batch 2550, loss[loss=0.3321, simple_loss=0.331, pruned_loss=0.1322, ctc_loss=0.2504, cr_loss=0.4644, over 34168.00 frames. ], tot_loss[loss=0.3763, simple_loss=0.372, pruned_loss=0.1521, ctc_loss=0.2858, cr_loss=0.48, over 6766528.95 frames. ], batch size: 78, lr: 3.85e-02, grad_scale: 16.0
2024-09-16 21:00:18,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=30034.666666666668, ans=0.07
2024-09-16 21:00:33,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=30081.333333333332, ans=0.125
2024-09-16 21:00:33,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=30081.333333333332, ans=0.125
2024-09-16 21:00:38,231 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:00:38,408 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:00:48,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=30128.0, ans=0.2
2024-09-16 21:01:10,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.43 vs. limit=15.0
2024-09-16 21:01:17,714 INFO [train.py:1198] (1/2) Epoch 2, batch 2600, loss[loss=0.3749, simple_loss=0.3728, pruned_loss=0.1502, ctc_loss=0.2821, cr_loss=0.5041, over 34373.00 frames. ], tot_loss[loss=0.3767, simple_loss=0.3724, pruned_loss=0.1523, ctc_loss=0.2862, cr_loss=0.4808, over 6761378.75 frames. ], batch size: 91, lr: 3.84e-02, grad_scale: 16.0
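The scaling.py:214 ScheduledFloat records print regularizer hyperparameters (dropout rates, skip rates, balancer probabilities) whose values are scheduled as a function of batch_count; ans is the value currently in effect. A plausible piecewise-linear schedule, as an illustrative sketch (function and argument names are hypothetical):

    def scheduled_float(batch_count, points):
        """points: [(batch_count, value), ...] sorted by batch_count;
        the value is linearly interpolated between neighboring points
        and held constant outside the given range."""
        (x0, y0) = points[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return points[-1][1]

    # e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches:
    dropout_p = scheduled_float(27888.0, [(0.0, 0.3), (20000.0, 0.1)])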
2024-09-16 21:01:19,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=30221.333333333332, ans=0.125
2024-09-16 21:01:52,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=30314.666666666668, ans=0.2
2024-09-16 21:02:17,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=30361.333333333332, ans=0.125
2024-09-16 21:02:22,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=30361.333333333332, ans=0.125
2024-09-16 21:02:27,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=30408.0, ans=0.125
2024-09-16 21:02:27,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=12.0
2024-09-16 21:02:28,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=30408.0, ans=0.0
2024-09-16 21:02:36,677 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.335e+02 2.999e+02 3.579e+02 5.025e+02 9.188e+02, threshold=7.158e+02, percent-clipped=3.0
2024-09-16 21:02:41,518 INFO [train.py:1198] (1/2) Epoch 2, batch 2650, loss[loss=0.3922, simple_loss=0.3893, pruned_loss=0.1578, ctc_loss=0.2965, cr_loss=0.5068, over 34234.00 frames. ], tot_loss[loss=0.3766, simple_loss=0.3726, pruned_loss=0.152, ctc_loss=0.2859, cr_loss=0.4817, over 6768541.20 frames. ], batch size: 117, lr: 3.83e-02, grad_scale: 16.0
2024-09-16 21:02:53,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=30454.666666666668, ans=0.0
2024-09-16 21:03:03,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0
2024-09-16 21:03:16,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=30548.0, ans=0.2
2024-09-16 21:03:16,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=30548.0, ans=0.125
2024-09-16 21:03:28,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0
2024-09-16 21:03:44,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0
2024-09-16 21:03:57,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=30641.333333333332, ans=0.025
2024-09-16 21:03:59,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=30641.333333333332, ans=0.125
2024-09-16 21:04:03,610 INFO [train.py:1198] (1/2) Epoch 2, batch 2700, loss[loss=0.3879, simple_loss=0.3876, pruned_loss=0.1541, ctc_loss=0.2966, cr_loss=0.5135, over 34618.00 frames. ], tot_loss[loss=0.3765, simple_loss=0.3727, pruned_loss=0.152, ctc_loss=0.2856, cr_loss=0.4819, over 6763209.86 frames. ], batch size: 102, lr: 3.83e-02, grad_scale: 16.0
2024-09-16 21:04:12,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=30688.0, ans=0.025
2024-09-16 21:04:43,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=30781.333333333332, ans=0.025
2024-09-16 21:04:43,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=30781.333333333332, ans=0.1
2024-09-16 21:04:49,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=30781.333333333332, ans=0.125
2024-09-16 21:05:15,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=30874.666666666668, ans=0.125
2024-09-16 21:05:22,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=30874.666666666668, ans=0.125
2024-09-16 21:05:23,208 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.667e+02 3.195e+02 3.864e+02 4.747e+02 7.976e+02, threshold=7.727e+02, percent-clipped=2.0
2024-09-16 21:05:28,279 INFO [train.py:1198] (1/2) Epoch 2, batch 2750, loss[loss=0.3585, simple_loss=0.3552, pruned_loss=0.1447, ctc_loss=0.2683, cr_loss=0.4693, over 34644.00 frames. ], tot_loss[loss=0.375, simple_loss=0.3714, pruned_loss=0.1513, ctc_loss=0.2842, cr_loss=0.4804, over 6760223.58 frames. ], batch size: 88, lr: 3.82e-02, grad_scale: 16.0
2024-09-16 21:05:35,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=30921.333333333332, ans=0.125
2024-09-16 21:05:59,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=31014.666666666668, ans=0.2
2024-09-16 21:06:09,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=31014.666666666668, ans=0.125
2024-09-16 21:06:14,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=31014.666666666668, ans=0.1
2024-09-16 21:06:17,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0
2024-09-16 21:06:40,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.39 vs. limit=15.0
2024-09-16 21:06:51,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.40 vs. limit=22.5
2024-09-16 21:06:52,680 INFO [train.py:1198] (1/2) Epoch 2, batch 2800, loss[loss=0.474, simple_loss=0.4265, pruned_loss=0.2122, ctc_loss=0.3854, cr_loss=0.5013, over 23593.00 frames. ], tot_loss[loss=0.3755, simple_loss=0.3715, pruned_loss=0.1517, ctc_loss=0.2846, cr_loss=0.4795, over 6738542.94 frames. ], batch size: 244, lr: 3.81e-02, grad_scale: 32.0
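The Whitening records compare a per-module "metric" against a limit. The metric quantifies how far the module's output covariance is from a multiple of the identity: it equals 1.0 for perfectly white features and grows with anisotropy, and a penalty is only applied when it exceeds the limit. One way such a metric can be computed, as a sketch that is not necessarily identical to scaling.py:

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (..., num_channels); compute a covariance per channel group.
        x = x.reshape(-1, x.shape[-1])
        n, c = x.shape
        cg = c // num_groups
        x = x.reshape(n, num_groups, cg).transpose(0, 1)  # (groups, n, cg)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n      # (groups, cg, cg)
        # Ratio of the mean squared eigenvalue to the squared mean
        # eigenvalue; >= 1.0, with equality iff cov = sigma^2 * I.
        num = (cov ** 2).sum(dim=(1, 2)) / cg
        den = (torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) / cg) ** 2
        return (num / den).mean()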
2024-09-16 21:07:06,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=31154.666666666668, ans=0.125
2024-09-16 21:07:10,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0
2024-09-16 21:07:11,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=31201.333333333332, ans=0.004086666666666667
2024-09-16 21:07:14,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=31201.333333333332, ans=0.125
2024-09-16 21:07:19,219 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:07:27,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=31248.0, ans=0.004076521739130434
2024-09-16 21:07:34,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=31248.0, ans=0.07
2024-09-16 21:07:40,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=31294.666666666668, ans=0.125
2024-09-16 21:07:45,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=31294.666666666668, ans=0.125
2024-09-16 21:07:45,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=31294.666666666668, ans=0.125
2024-09-16 21:07:45,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=31294.666666666668, ans=0.125
2024-09-16 21:07:47,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.36 vs. limit=15.0
2024-09-16 21:07:48,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=31294.666666666668, ans=0.0
2024-09-16 21:07:51,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.68 vs. limit=15.0
2024-09-16 21:07:52,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=31294.666666666668, ans=0.2
2024-09-16 21:07:59,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=31341.333333333332, ans=0.0
2024-09-16 21:07:59,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=18.66 vs. limit=15.0
2024-09-16 21:08:02,433 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:08:11,638 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.476e+02 3.781e+02 4.655e+02 7.282e+02 1.389e+03, threshold=9.311e+02, percent-clipped=20.0
2024-09-16 21:08:14,937 INFO [train.py:1198] (1/2) Epoch 2, batch 2850, loss[loss=0.3673, simple_loss=0.36, pruned_loss=0.1497, ctc_loss=0.2778, cr_loss=0.4893, over 34481.00 frames. ], tot_loss[loss=0.3772, simple_loss=0.3726, pruned_loss=0.1526, ctc_loss=0.286, cr_loss=0.4804, over 6723069.14 frames. ], batch size: 90, lr: 3.81e-02, grad_scale: 16.0
2024-09-16 21:08:33,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31434.666666666668, ans=0.1
2024-09-16 21:08:35,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.33 vs. limit=15.0
2024-09-16 21:08:46,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=31434.666666666668, ans=0.09899494936611666
2024-09-16 21:09:08,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=15.0
2024-09-16 21:09:29,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=31574.666666666668, ans=0.125
2024-09-16 21:09:34,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=31574.666666666668, ans=0.2
2024-09-16 21:09:38,756 INFO [train.py:1198] (1/2) Epoch 2, batch 2900, loss[loss=0.3884, simple_loss=0.3808, pruned_loss=0.1585, ctc_loss=0.297, cr_loss=0.4896, over 34534.00 frames. ], tot_loss[loss=0.3773, simple_loss=0.3732, pruned_loss=0.1525, ctc_loss=0.2858, cr_loss=0.4812, over 6753990.22 frames. ], batch size: 94, lr: 3.80e-02, grad_scale: 16.0
2024-09-16 21:09:39,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.15 vs. limit=10.0
2024-09-16 21:09:43,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=31621.333333333332, ans=0.0
2024-09-16 21:09:44,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.43 vs. limit=22.5
2024-09-16 21:09:45,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=31621.333333333332, ans=0.5
2024-09-16 21:09:46,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0
2024-09-16 21:09:54,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=31621.333333333332, ans=0.125
2024-09-16 21:10:00,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31668.0, ans=0.1
2024-09-16 21:10:11,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=25.73 vs. limit=22.5
2024-09-16 21:10:15,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=31714.666666666668, ans=0.0
2024-09-16 21:10:21,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0
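In each train.py:1198 record, loss[...] is the current batch and tot_loss[...] is a running average weighted by the number of frames seen, which is why tot_loss is quoted "over" millions of frames. A sketch of that bookkeeping with hypothetical names (icefall's MetricsTracker also decays old statistics, which this omits):

    class RunningLoss:
        def __init__(self):
            self.sums = {}      # per-component sum of loss * frames
            self.frames = 0.0

        def update(self, losses, num_frames):
            for k, v in losses.items():
                self.sums[k] = self.sums.get(k, 0.0) + v * num_frames
            self.frames += num_frames

        def average(self):
            # Frame-weighted mean of every component seen so far.
            return {k: s / self.frames for k, s in self.sums.items()}

    tot = RunningLoss()
    tot.update({"loss": 0.3884, "ctc_loss": 0.297}, num_frames=34534.0)
    print(tot.average())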
2024-09-16 21:10:32,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=31761.333333333332, ans=0.0039649275362318845
2024-09-16 21:10:43,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=31761.333333333332, ans=0.0039649275362318845
2024-09-16 21:10:45,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=31808.0, ans=0.05
2024-09-16 21:10:55,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=31808.0, ans=0.0039547826086956525
2024-09-16 21:11:00,035 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.216e+02 3.037e+02 3.479e+02 4.228e+02 7.442e+02, threshold=6.958e+02, percent-clipped=0.0
2024-09-16 21:11:03,409 INFO [train.py:1198] (1/2) Epoch 2, batch 2950, loss[loss=0.3396, simple_loss=0.3429, pruned_loss=0.1337, ctc_loss=0.2531, cr_loss=0.4551, over 34637.00 frames. ], tot_loss[loss=0.375, simple_loss=0.3712, pruned_loss=0.1514, ctc_loss=0.2839, cr_loss=0.479, over 6748520.64 frames. ], batch size: 88, lr: 3.79e-02, grad_scale: 16.0
2024-09-16 21:11:06,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.47 vs. limit=15.0
2024-09-16 21:12:01,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=31994.666666666668, ans=0.125
2024-09-16 21:12:01,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.92 vs. limit=15.0
2024-09-16 21:12:14,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=32041.333333333332, ans=0.2
2024-09-16 21:12:27,546 INFO [train.py:1198] (1/2) Epoch 2, batch 3000, loss[loss=0.3586, simple_loss=0.3601, pruned_loss=0.142, ctc_loss=0.2698, cr_loss=0.4743, over 34539.00 frames. ], tot_loss[loss=0.3743, simple_loss=0.3708, pruned_loss=0.151, ctc_loss=0.2832, cr_loss=0.4789, over 6750831.24 frames. ], batch size: 94, lr: 3.79e-02, grad_scale: 16.0
2024-09-16 21:12:27,547 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-16 21:12:32,949 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.0669, 2.9912, 2.2920, 2.9864], device='cuda:1')
2024-09-16 21:12:44,271 INFO [train.py:1230] (1/2) Epoch 2, validation: loss=0.2111, simple_loss=0.3024, pruned_loss=0.04943, ctc_loss=0.1043, cr_loss=1.476e-14, over 944034.00 frames.
2024-09-16 21:12:44,272 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-16 21:12:49,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=32088.0, ans=0.2
2024-09-16 21:12:51,512 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:13:06,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=32134.666666666668, ans=0.1
2024-09-16 21:13:20,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=32181.333333333332, ans=0.035
2024-09-16 21:13:49,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=32274.666666666668, ans=0.125
2024-09-16 21:13:51,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=32274.666666666668, ans=0.125
2024-09-16 21:13:54,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=32274.666666666668, ans=0.0038533333333333336
2024-09-16 21:14:03,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.11 vs. limit=10.0
2024-09-16 21:14:03,976 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.573e+02 3.284e+02 3.957e+02 5.077e+02 1.479e+03, threshold=7.913e+02, percent-clipped=3.0
2024-09-16 21:14:04,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=32274.666666666668, ans=0.125
2024-09-16 21:14:05,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=32321.333333333332, ans=0.0
2024-09-16 21:14:07,358 INFO [train.py:1198] (1/2) Epoch 2, batch 3050, loss[loss=0.35, simple_loss=0.3525, pruned_loss=0.1384, ctc_loss=0.2587, cr_loss=0.4727, over 34565.00 frames. ], tot_loss[loss=0.3754, simple_loss=0.3717, pruned_loss=0.1515, ctc_loss=0.2842, cr_loss=0.4795, over 6744585.17 frames. ], batch size: 89, lr: 3.78e-02, grad_scale: 16.0
2024-09-16 21:14:09,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0
2024-09-16 21:14:19,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=32321.333333333332, ans=0.125
2024-09-16 21:14:24,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=32368.0, ans=0.2
2024-09-16 21:14:27,153 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:14:32,570 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.27 vs. limit=10.0
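During validation the zipformer.py:1858 record prints attn_weights_entropy, the average entropy (in nats) of each attention head's distribution; values close to log(seq_len) indicate nearly uniform attention. A self-contained sketch of that statistic (tensor layout is an assumption):

    import torch

    def attn_entropy(attn_weights, eps=1.0e-20):
        # attn_weights: (num_heads, batch, tgt_len, src_len), rows sum to 1.
        p = attn_weights.clamp(min=eps)
        h = -(p * p.log()).sum(dim=-1)   # entropy per query position
        return h.mean(dim=(1, 2))        # one average value per head

    w = torch.softmax(torch.randn(4, 2, 10, 10), dim=-1)
    print(attn_entropy(w))  # 4 values, comparable to the logged tensor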
2024-09-16 21:14:33,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=32368.0, ans=0.0038330434782608697
2024-09-16 21:14:51,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=32414.666666666668, ans=0.0
2024-09-16 21:15:19,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.40 vs. limit=22.5
2024-09-16 21:15:28,145 INFO [train.py:1198] (1/2) Epoch 2, batch 3100, loss[loss=0.4173, simple_loss=0.4059, pruned_loss=0.1724, ctc_loss=0.3189, cr_loss=0.5012, over 34251.00 frames. ], tot_loss[loss=0.3744, simple_loss=0.371, pruned_loss=0.151, ctc_loss=0.2834, cr_loss=0.4793, over 6742572.33 frames. ], batch size: 117, lr: 3.78e-02, grad_scale: 16.0
2024-09-16 21:16:00,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=32648.0, ans=0.125
2024-09-16 21:16:14,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=32648.0, ans=0.003772173913043478
2024-09-16 21:16:22,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=32694.666666666668, ans=0.2
2024-09-16 21:16:44,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=32741.333333333332, ans=0.003751884057971014
2024-09-16 21:16:45,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.99 vs. limit=22.5
2024-09-16 21:16:46,032 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.354e+02 3.094e+02 3.851e+02 4.799e+02 9.816e+02, threshold=7.703e+02, percent-clipped=2.0
2024-09-16 21:16:48,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=32788.0, ans=0.025
2024-09-16 21:16:49,461 INFO [train.py:1198] (1/2) Epoch 2, batch 3150, loss[loss=0.3834, simple_loss=0.3876, pruned_loss=0.1509, ctc_loss=0.291, cr_loss=0.4807, over 33743.00 frames. ], tot_loss[loss=0.3731, simple_loss=0.3704, pruned_loss=0.1501, ctc_loss=0.2821, cr_loss=0.4785, over 6748475.49 frames. ], batch size: 122, lr: 3.77e-02, grad_scale: 16.0
2024-09-16 21:16:56,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=32788.0, ans=10.0
2024-09-16 21:17:19,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=32834.666666666664, ans=0.2
2024-09-16 21:17:44,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=32928.0, ans=0.003711304347826087
2024-09-16 21:17:46,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=32928.0, ans=0.0
2024-09-16 21:17:48,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0
2024-09-16 21:17:51,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0
2024-09-16 21:17:56,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.61 vs. limit=15.0
2024-09-16 21:18:03,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=32974.666666666664, ans=0.035
2024-09-16 21:18:11,745 INFO [train.py:1198] (1/2) Epoch 2, batch 3200, loss[loss=0.3487, simple_loss=0.3558, pruned_loss=0.136, ctc_loss=0.2517, cr_loss=0.4782, over 34555.00 frames. ], tot_loss[loss=0.3726, simple_loss=0.37, pruned_loss=0.1499, ctc_loss=0.2815, cr_loss=0.4786, over 6760858.67 frames. ], batch size: 94, lr: 3.76e-02, grad_scale: 32.0
2024-09-16 21:18:33,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0
2024-09-16 21:18:47,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=33114.666666666664, ans=0.125
2024-09-16 21:19:02,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=33161.333333333336, ans=0.125
2024-09-16 21:19:07,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=33161.333333333336, ans=0.125
2024-09-16 21:19:16,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=33208.0, ans=0.125
2024-09-16 21:19:16,860 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:19:24,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=33208.0, ans=0.125
2024-09-16 21:19:30,835 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 3.230e+02 3.803e+02 4.505e+02 8.218e+02, threshold=7.607e+02, percent-clipped=2.0
2024-09-16 21:19:32,433 INFO [train.py:1198] (1/2) Epoch 2, batch 3250, loss[loss=0.3997, simple_loss=0.3925, pruned_loss=0.1633, ctc_loss=0.3029, cr_loss=0.4917, over 34672.00 frames. ], tot_loss[loss=0.3725, simple_loss=0.37, pruned_loss=0.1498, ctc_loss=0.2812, cr_loss=0.4788, over 6770768.92 frames. ], batch size: 98, lr: 3.76e-02, grad_scale: 16.0
2024-09-16 21:19:35,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.10 vs. limit=6.0
2024-09-16 21:19:48,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.92 vs. limit=22.5
2024-09-16 21:19:52,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=33301.333333333336, ans=0.125
2024-09-16 21:20:14,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=33348.0, ans=0.07
2024-09-16 21:20:54,742 INFO [train.py:1198] (1/2) Epoch 2, batch 3300, loss[loss=0.4132, simple_loss=0.4011, pruned_loss=0.1705, ctc_loss=0.3161, cr_loss=0.5273, over 33082.00 frames. ], tot_loss[loss=0.3706, simple_loss=0.3682, pruned_loss=0.149, ctc_loss=0.2799, cr_loss=0.4768, over 6769055.53 frames. ], batch size: 130, lr: 3.75e-02, grad_scale: 16.0
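The cr_loss column is the consistency-regularization term of CR-CTC: the same utterance is forwarded twice with different time-masking, and the two CTC posterior sequences are pulled toward each other. A plausible formulation using a symmetric exchanged-target KL divergence, as a sketch that may differ in detail from the recipe:

    import torch
    import torch.nn.functional as F

    def consistency_loss(log_probs_a, log_probs_b):
        # log_probs_*: (N, T, vocab) CTC log-posteriors of two augmented
        # views of the same batch. Each view is pulled toward the other's
        # detached distribution, averaged over both directions.
        kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                         log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                         log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)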
2024-09-16 21:21:14,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=33534.666666666664, ans=0.1
2024-09-16 21:21:24,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=33534.666666666664, ans=0.125
2024-09-16 21:21:34,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=33581.333333333336, ans=0.125
2024-09-16 21:21:39,675 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.95 vs. limit=10.0
2024-09-16 21:21:51,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=33628.0, ans=0.1
2024-09-16 21:21:52,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=33628.0, ans=15.0
2024-09-16 21:22:03,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33674.666666666664, ans=0.1
2024-09-16 21:22:04,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=33674.666666666664, ans=0.07
2024-09-16 21:22:14,037 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.509e+02 3.419e+02 4.542e+02 5.776e+02 1.335e+03, threshold=9.085e+02, percent-clipped=9.0
2024-09-16 21:22:15,686 INFO [train.py:1198] (1/2) Epoch 2, batch 3350, loss[loss=0.3897, simple_loss=0.3901, pruned_loss=0.1544, ctc_loss=0.2954, cr_loss=0.5395, over 33838.00 frames. ], tot_loss[loss=0.3722, simple_loss=0.3693, pruned_loss=0.1499, ctc_loss=0.2812, cr_loss=0.478, over 6743691.11 frames. ], batch size: 122, lr: 3.74e-02, grad_scale: 16.0
2024-09-16 21:22:22,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=33721.333333333336, ans=0.125
2024-09-16 21:23:02,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=33814.666666666664, ans=0.003518550724637682
2024-09-16 21:23:22,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.57 vs. limit=22.5
2024-09-16 21:23:29,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=33908.0, ans=0.0
2024-09-16 21:23:33,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=23.14 vs. limit=22.5
2024-09-16 21:23:37,559 INFO [train.py:1198] (1/2) Epoch 2, batch 3400, loss[loss=0.3374, simple_loss=0.3362, pruned_loss=0.1354, ctc_loss=0.2505, cr_loss=0.4415, over 34159.00 frames. ], tot_loss[loss=0.3717, simple_loss=0.369, pruned_loss=0.1496, ctc_loss=0.2809, cr_loss=0.4771, over 6733392.31 frames. ], batch size: 78, lr: 3.74e-02, grad_scale: 16.0
2024-09-16 21:23:50,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=33954.666666666664, ans=0.125
2024-09-16 21:24:16,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=34048.0, ans=0.125
2024-09-16 21:24:18,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.34 vs. limit=15.0
2024-09-16 21:24:23,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0
2024-09-16 21:24:46,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=34141.333333333336, ans=0.0034475362318840573
2024-09-16 21:24:57,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.687e+02 3.149e+02 3.731e+02 4.409e+02 7.231e+02, threshold=7.462e+02, percent-clipped=0.0
2024-09-16 21:24:58,798 INFO [train.py:1198] (1/2) Epoch 2, batch 3450, loss[loss=0.3831, simple_loss=0.3803, pruned_loss=0.1552, ctc_loss=0.2851, cr_loss=0.4629, over 33114.00 frames. ], tot_loss[loss=0.3717, simple_loss=0.3692, pruned_loss=0.1494, ctc_loss=0.2807, cr_loss=0.4779, over 6745614.88 frames. ], batch size: 130, lr: 3.73e-02, grad_scale: 16.0
2024-09-16 21:24:59,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=34188.0, ans=0.025
2024-09-16 21:25:04,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=34188.0, ans=0.2
2024-09-16 21:25:39,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.05 vs. limit=15.0
2024-09-16 21:25:43,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=34281.333333333336, ans=0.0
2024-09-16 21:25:56,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=34328.0, ans=0.1
2024-09-16 21:26:10,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0
2024-09-16 21:26:18,869 INFO [train.py:1198] (1/2) Epoch 2, batch 3500, loss[loss=0.3352, simple_loss=0.3402, pruned_loss=0.1321, ctc_loss=0.2455, cr_loss=0.4175, over 34476.00 frames. ], tot_loss[loss=0.3702, simple_loss=0.3681, pruned_loss=0.1487, ctc_loss=0.2793, cr_loss=0.4768, over 6747947.22 frames. ], batch size: 85, lr: 3.73e-02, grad_scale: 16.0
2024-09-16 21:26:35,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=34468.0, ans=0.125
2024-09-16 21:26:43,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=34468.0, ans=0.1
2024-09-16 21:26:43,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0
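The lr column decays smoothly within the epoch (3.91e-02 down to the high 3.6e-02 range here) and steps down again when epoch 3 begins, consistent with a schedule discounted by both batch count and epoch count. A sketch in the spirit of icefall's Eden scheduler; the exact formula, constants, and batch accounting are assumptions, so this will not reproduce the logged values exactly:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Inverse-fourth-root discounting in both batch and epoch.
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    print(eden_lr(0.045, batch=28000.0, epoch=2))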
2024-09-16 21:27:11,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0
2024-09-16 21:27:26,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=34608.0, ans=0.07
2024-09-16 21:27:38,706 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.643e+02 3.543e+02 4.398e+02 5.768e+02 1.037e+03, threshold=8.797e+02, percent-clipped=10.0
2024-09-16 21:27:39,194 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:27:40,249 INFO [train.py:1198] (1/2) Epoch 2, batch 3550, loss[loss=0.3896, simple_loss=0.3892, pruned_loss=0.1557, ctc_loss=0.294, cr_loss=0.4938, over 34395.00 frames. ], tot_loss[loss=0.3702, simple_loss=0.3683, pruned_loss=0.1486, ctc_loss=0.2789, cr_loss=0.4777, over 6757082.27 frames. ], batch size: 103, lr: 3.72e-02, grad_scale: 16.0
2024-09-16 21:27:48,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=34654.666666666664, ans=0.003335942028985508
2024-09-16 21:28:05,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0
2024-09-16 21:28:30,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=34794.666666666664, ans=0.0
2024-09-16 21:28:31,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=34794.666666666664, ans=0.0
2024-09-16 21:28:37,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.80 vs. limit=10.0
2024-09-16 21:28:57,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=34841.333333333336, ans=0.125
2024-09-16 21:29:00,290 INFO [train.py:1198] (1/2) Epoch 2, batch 3600, loss[loss=0.3679, simple_loss=0.3639, pruned_loss=0.1483, ctc_loss=0.2813, cr_loss=0.4749, over 34491.00 frames. ], tot_loss[loss=0.3704, simple_loss=0.3683, pruned_loss=0.1488, ctc_loss=0.279, cr_loss=0.4783, over 6766757.64 frames. ], batch size: 90, lr: 3.71e-02, grad_scale: 32.0
2024-09-16 21:29:09,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=34888.0, ans=0.0032852173913043483
2024-09-16 21:29:21,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=34934.666666666664, ans=0.125
2024-09-16 21:29:25,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=34934.666666666664, ans=0.125
2024-09-16 21:29:26,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.88 vs. limit=22.5
2024-09-16 21:30:10,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=35074.666666666664, ans=0.125
2024-09-16 21:30:15,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=35074.666666666664, ans=0.125
2024-09-16 21:30:17,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35074.666666666664, ans=0.1
2024-09-16 21:30:21,888 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.231e+02 3.213e+02 4.105e+02 5.041e+02 7.761e+02, threshold=8.211e+02, percent-clipped=0.0
2024-09-16 21:30:21,907 INFO [train.py:1198] (1/2) Epoch 2, batch 3650, loss[loss=0.3785, simple_loss=0.3739, pruned_loss=0.1532, ctc_loss=0.2896, cr_loss=0.4688, over 34428.00 frames. ], tot_loss[loss=0.3689, simple_loss=0.3672, pruned_loss=0.148, ctc_loss=0.2776, cr_loss=0.4775, over 6768921.75 frames. ], batch size: 110, lr: 3.71e-02, grad_scale: 16.0
2024-09-16 21:30:33,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=35121.333333333336, ans=0.95
2024-09-16 21:30:40,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=35168.0, ans=0.125
2024-09-16 21:30:42,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=35168.0, ans=0.125
2024-09-16 21:30:42,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=35168.0, ans=0.07
2024-09-16 21:30:51,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=35168.0, ans=0.0
2024-09-16 21:30:58,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=35214.666666666664, ans=0.0
2024-09-16 21:31:03,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=35214.666666666664, ans=0.0
2024-09-16 21:31:11,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.82 vs. limit=15.0
2024-09-16 21:31:14,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=35261.333333333336, ans=0.125
2024-09-16 21:31:31,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=35308.0, ans=0.125
2024-09-16 21:31:42,938 INFO [train.py:1198] (1/2) Epoch 2, batch 3700, loss[loss=0.3874, simple_loss=0.3893, pruned_loss=0.1539, ctc_loss=0.2898, cr_loss=0.4926, over 34603.00 frames. ], tot_loss[loss=0.3688, simple_loss=0.3674, pruned_loss=0.1479, ctc_loss=0.2774, cr_loss=0.4775, over 6783593.47 frames. ], batch size: 102, lr: 3.70e-02, grad_scale: 16.0
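Each loss[...] record breaks the objective into a pruned-transducer pair (simple_loss and pruned_loss), an auxiliary CTC term, and the consistency term, with the scalar loss being their weighted sum. A sketch of plausible bookkeeping; the weights below are illustrative assumptions, and the recipe ramps some of them during warm-up, so this will not reproduce the logged totals exactly:

    def total_loss(simple_loss, pruned_loss, ctc_loss, cr_loss,
                   simple_scale=0.5, ctc_scale=0.1, cr_scale=0.02):
        # Transducer part: blend the cheap "simple" joiner loss with the
        # full pruned loss, then add the auxiliary CTC and CR terms.
        transducer = (simple_scale * simple_loss
                      + (1.0 - simple_scale) * pruned_loss)
        return transducer + ctc_scale * ctc_loss + cr_scale * cr_loss

    print(total_loss(0.3674, 0.1479, 0.2774, 0.4775))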
2024-09-16 21:32:09,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=35401.333333333336, ans=0.125
2024-09-16 21:32:12,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=35401.333333333336, ans=0.0031736231884057966
2024-09-16 21:32:14,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=35448.0, ans=0.125
2024-09-16 21:32:15,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=35448.0, ans=0.025
2024-09-16 21:32:33,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=35494.666666666664, ans=0.2
2024-09-16 21:32:33,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.02 vs. limit=22.5
2024-09-16 21:33:03,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=35588.0, ans=0.1
2024-09-16 21:33:04,604 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.630e+02 3.152e+02 3.725e+02 4.555e+02 7.472e+02, threshold=7.450e+02, percent-clipped=0.0
2024-09-16 21:33:04,624 INFO [train.py:1198] (1/2) Epoch 2, batch 3750, loss[loss=0.386, simple_loss=0.3856, pruned_loss=0.1547, ctc_loss=0.2896, cr_loss=0.4762, over 34348.00 frames. ], tot_loss[loss=0.3735, simple_loss=0.3715, pruned_loss=0.15, ctc_loss=0.2811, cr_loss=0.4824, over 6785661.46 frames. ], batch size: 113, lr: 3.69e-02, grad_scale: 16.0
2024-09-16 21:33:09,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=35588.0, ans=0.025
2024-09-16 21:33:11,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35588.0, ans=0.1
2024-09-16 21:33:18,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=35588.0, ans=0.1
2024-09-16 21:33:24,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=35634.666666666664, ans=0.0
2024-09-16 21:33:52,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=35728.0, ans=0.125
2024-09-16 21:34:25,990 INFO [train.py:1198] (1/2) Epoch 2, batch 3800, loss[loss=0.3876, simple_loss=0.3756, pruned_loss=0.1603, ctc_loss=0.301, cr_loss=0.4722, over 29883.00 frames. ], tot_loss[loss=0.3791, simple_loss=0.3753, pruned_loss=0.1531, ctc_loss=0.2864, cr_loss=0.4856, over 6678080.36 frames. ], batch size: 175, lr: 3.69e-02, grad_scale: 16.0
2024-09-16 21:34:34,927 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:34:50,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.76 vs. limit=15.0
2024-09-16 21:35:20,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=35961.333333333336, ans=0.0
2024-09-16 21:35:24,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=35961.333333333336, ans=0.125
2024-09-16 21:35:29,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=35961.333333333336, ans=0.2
2024-09-16 21:35:37,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=36008.0, ans=0.125
2024-09-16 21:35:50,044 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.448e+02 2.990e+02 3.442e+02 4.066e+02 5.954e+02, threshold=6.885e+02, percent-clipped=0.0
2024-09-16 21:35:50,068 INFO [train.py:1198] (1/2) Epoch 2, batch 3850, loss[loss=0.42, simple_loss=0.3952, pruned_loss=0.1796, ctc_loss=0.3383, cr_loss=0.4471, over 23905.00 frames. ], tot_loss[loss=0.3888, simple_loss=0.3808, pruned_loss=0.1589, ctc_loss=0.2976, cr_loss=0.4867, over 6250604.65 frames. ], batch size: 245, lr: 3.68e-02, grad_scale: 16.0
2024-09-16 21:35:52,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=36054.666666666664, ans=0.0
2024-09-16 21:36:10,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=36101.333333333336, ans=0.125
2024-09-16 21:37:03,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=36180.666666666664, ans=0.0
2024-09-16 21:37:19,530 INFO [train.py:1198] (1/2) Epoch 3, batch 0, loss[loss=0.3583, simple_loss=0.3579, pruned_loss=0.1428, ctc_loss=0.2747, cr_loss=0.4542, over 34471.00 frames. ], tot_loss[loss=0.3583, simple_loss=0.3579, pruned_loss=0.1428, ctc_loss=0.2747, cr_loss=0.4542, over 34471.00 frames. ], batch size: 85, lr: 3.50e-02, grad_scale: 32.0
2024-09-16 21:37:19,531 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-16 21:37:36,951 INFO [train.py:1230] (1/2) Epoch 3, validation: loss=0.217, simple_loss=0.3087, pruned_loss=0.05208, ctc_loss=0.1057, cr_loss=1.694e-14, over 944034.00 frames.
2024-09-16 21:37:36,952 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-16 21:37:48,728 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:37:55,490 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-16 21:37:59,617 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0
2024-09-16 21:38:09,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=12.0
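The "Maximum memory allocated" lines printed after each validation pass come from CUDA's peak-memory counter, the same statistic used by the earlier OOM sanity check. A sketch using the real PyTorch API (the device choice mirrors the (1/2) rank of this log):

    import torch

    def log_peak_memory(device=torch.device("cuda:1")):
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {peak_mb}MB")

    # torch.cuda.reset_peak_memory_stats(device) would restart the counter;
    # leaving it alone yields the monotonically non-decreasing values here.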
limit=12.0 2024-09-16 21:38:10,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=36274.0, ans=0.2 2024-09-16 21:38:26,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=36320.666666666664, ans=0.07 2024-09-16 21:38:46,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=36367.333333333336, ans=0.0 2024-09-16 21:38:46,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=36367.333333333336, ans=0.09899494936611666 2024-09-16 21:39:01,091 INFO [train.py:1198] (1/2) Epoch 3, batch 50, loss[loss=0.3267, simple_loss=0.3265, pruned_loss=0.1309, ctc_loss=0.2442, cr_loss=0.4027, over 34491.00 frames. ], tot_loss[loss=0.3791, simple_loss=0.3751, pruned_loss=0.1534, ctc_loss=0.2863, cr_loss=0.4804, over 1481229.52 frames. ], batch size: 82, lr: 3.49e-02, grad_scale: 32.0 2024-09-16 21:39:01,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.39 vs. limit=15.0 2024-09-16 21:39:09,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=36414.0, ans=0.0 2024-09-16 21:39:25,902 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:39:32,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=36507.333333333336, ans=0.5 2024-09-16 21:39:37,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=36507.333333333336, ans=0.07 2024-09-16 21:39:38,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.554e+02 3.376e+02 4.062e+02 5.142e+02 9.817e+02, threshold=8.124e+02, percent-clipped=13.0 2024-09-16 21:39:47,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=36507.333333333336, ans=0.125 2024-09-16 21:40:23,228 INFO [train.py:1198] (1/2) Epoch 3, batch 100, loss[loss=0.333, simple_loss=0.3393, pruned_loss=0.1299, ctc_loss=0.2495, cr_loss=0.4271, over 34597.00 frames. ], tot_loss[loss=0.3773, simple_loss=0.3744, pruned_loss=0.1519, ctc_loss=0.2847, cr_loss=0.4835, over 2628380.33 frames. ], batch size: 89, lr: 3.48e-02, grad_scale: 32.0 2024-09-16 21:41:03,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.78 vs. limit=10.0 2024-09-16 21:41:38,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.48 vs. limit=15.0 2024-09-16 21:41:48,788 INFO [train.py:1198] (1/2) Epoch 3, batch 150, loss[loss=0.3238, simple_loss=0.3319, pruned_loss=0.1252, ctc_loss=0.2405, cr_loss=0.4313, over 34497.00 frames. ], tot_loss[loss=0.3701, simple_loss=0.3693, pruned_loss=0.1481, ctc_loss=0.2781, cr_loss=0.4809, over 3557502.82 frames. 
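A note on reading these entries: `loss[...]` is the current batch, while `tot_loss[...]` is a frame-weighted aggregate over recent batches; its `over N frames.` count is large and occasionally shrinks (6785661 → 6678080 → 6250604 above), suggesting older batches are gradually discounted rather than accumulated forever. A minimal sketch of that bookkeeping — the decay factor is an invented placeholder, not icefall's exact tracker:

```python
# Hypothetical sketch of frame-weighted loss aggregation with a discount
# on older batches, mirroring the loss[...] vs. tot_loss[...] fields above.
from collections import defaultdict

class FrameWeightedTracker:
    def __init__(self, decay: float = 0.999):
        self.decay = decay                 # discount applied to older batches
        self.sums = defaultdict(float)     # frame-weighted loss sums per field
        self.frames = 0.0

    def update(self, batch_losses: dict, num_frames: float) -> None:
        for k in self.sums:
            self.sums[k] *= self.decay
        self.frames *= self.decay
        for name, value in batch_losses.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        return {k: v / self.frames for k, v in self.sums.items()}

tracker = FrameWeightedTracker()
tracker.update({"loss": 0.3791, "ctc_loss": 0.2864}, num_frames=34348.0)
tracker.update({"loss": 0.3583, "ctc_loss": 0.2747}, num_frames=34471.0)
print(tracker.averages())   # frame-weighted means, like tot_loss[...]
```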
], batch size: 82, lr: 3.48e-02, grad_scale: 32.0 2024-09-16 21:41:50,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=36880.666666666664, ans=0.025 2024-09-16 21:42:26,321 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.388e+02 3.191e+02 3.740e+02 4.674e+02 6.890e+02, threshold=7.481e+02, percent-clipped=0.0 2024-09-16 21:42:28,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=36974.0, ans=0.125 2024-09-16 21:42:33,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=36974.0, ans=0.07 2024-09-16 21:42:40,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2024-09-16 21:42:49,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=37020.666666666664, ans=0.035 2024-09-16 21:43:10,560 INFO [train.py:1198] (1/2) Epoch 3, batch 200, loss[loss=0.3793, simple_loss=0.3776, pruned_loss=0.1516, ctc_loss=0.2858, cr_loss=0.5174, over 31895.00 frames. ], tot_loss[loss=0.3666, simple_loss=0.3665, pruned_loss=0.1463, ctc_loss=0.2751, cr_loss=0.4796, over 4272660.36 frames. ], batch size: 145, lr: 3.47e-02, grad_scale: 32.0 2024-09-16 21:43:35,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37160.666666666664, ans=0.1 2024-09-16 21:43:40,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=37160.666666666664, ans=0.1 2024-09-16 21:43:43,812 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:43:57,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0 2024-09-16 21:44:03,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=37254.0, ans=0.125 2024-09-16 21:44:22,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.92 vs. limit=15.0 2024-09-16 21:44:39,049 INFO [train.py:1198] (1/2) Epoch 3, batch 250, loss[loss=0.3709, simple_loss=0.3737, pruned_loss=0.1465, ctc_loss=0.2807, cr_loss=0.4706, over 34228.00 frames. ], tot_loss[loss=0.3653, simple_loss=0.3652, pruned_loss=0.1457, ctc_loss=0.2742, cr_loss=0.4785, over 4834865.32 frames. ], batch size: 117, lr: 3.47e-02, grad_scale: 32.0 2024-09-16 21:44:44,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=37347.333333333336, ans=0.025 2024-09-16 21:44:47,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=37347.333333333336, ans=0.1 2024-09-16 21:44:48,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.38 vs. 
limit=15.0 2024-09-16 21:45:03,373 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:45:22,449 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 2.909e+02 3.293e+02 4.102e+02 7.429e+02, threshold=6.586e+02, percent-clipped=0.0 2024-09-16 21:45:26,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=37440.666666666664, ans=0.2 2024-09-16 21:45:31,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=37487.333333333336, ans=0.125 2024-09-16 21:45:31,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=37487.333333333336, ans=0.0 2024-09-16 21:45:37,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=37487.333333333336, ans=0.125 2024-09-16 21:45:52,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=37534.0, ans=0.2 2024-09-16 21:45:56,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.67 vs. limit=22.5 2024-09-16 21:46:05,122 INFO [train.py:1198] (1/2) Epoch 3, batch 300, loss[loss=0.3855, simple_loss=0.3814, pruned_loss=0.1559, ctc_loss=0.2879, cr_loss=0.5049, over 34350.00 frames. ], tot_loss[loss=0.364, simple_loss=0.3641, pruned_loss=0.1451, ctc_loss=0.2727, cr_loss=0.4775, over 5263198.93 frames. ], batch size: 107, lr: 3.46e-02, grad_scale: 16.0 2024-09-16 21:46:07,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=37580.666666666664, ans=0.1 2024-09-16 21:46:23,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=37627.333333333336, ans=0.0 2024-09-16 21:46:29,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37627.333333333336, ans=0.1 2024-09-16 21:47:27,133 INFO [train.py:1198] (1/2) Epoch 3, batch 350, loss[loss=0.3212, simple_loss=0.3286, pruned_loss=0.1249, ctc_loss=0.233, cr_loss=0.435, over 34272.00 frames. ], tot_loss[loss=0.3639, simple_loss=0.3642, pruned_loss=0.145, ctc_loss=0.2725, cr_loss=0.4781, over 5596714.30 frames. ], batch size: 83, lr: 3.46e-02, grad_scale: 16.0 2024-09-16 21:47:41,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.65 vs. limit=22.5 2024-09-16 21:47:46,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2024-09-16 21:48:06,463 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.247e+02 3.102e+02 3.610e+02 5.266e+02 1.094e+03, threshold=7.220e+02, percent-clipped=8.0 2024-09-16 21:48:17,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.48 vs. 
limit=15.0 2024-09-16 21:48:27,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=37954.0, ans=15.0 2024-09-16 21:48:53,055 INFO [train.py:1198] (1/2) Epoch 3, batch 400, loss[loss=0.3698, simple_loss=0.3775, pruned_loss=0.145, ctc_loss=0.2672, cr_loss=0.4646, over 34411.00 frames. ], tot_loss[loss=0.3632, simple_loss=0.3637, pruned_loss=0.1446, ctc_loss=0.2717, cr_loss=0.4775, over 5864251.54 frames. ], batch size: 95, lr: 3.45e-02, grad_scale: 32.0 2024-09-16 21:49:31,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=38140.666666666664, ans=0.2 2024-09-16 21:49:37,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=38140.666666666664, ans=0.125 2024-09-16 21:50:12,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=38234.0, ans=0.0 2024-09-16 21:50:15,736 INFO [train.py:1198] (1/2) Epoch 3, batch 450, loss[loss=0.3815, simple_loss=0.3823, pruned_loss=0.1522, ctc_loss=0.2788, cr_loss=0.514, over 34698.00 frames. ], tot_loss[loss=0.3637, simple_loss=0.364, pruned_loss=0.1449, ctc_loss=0.2718, cr_loss=0.4783, over 6053654.08 frames. ], batch size: 97, lr: 3.44e-02, grad_scale: 32.0 2024-09-16 21:50:19,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.56 vs. limit=15.0 2024-09-16 21:50:25,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=38280.666666666664, ans=0.125 2024-09-16 21:50:47,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=38374.0, ans=0.025 2024-09-16 21:50:47,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-09-16 21:50:55,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.340e+02 2.882e+02 3.709e+02 4.758e+02 1.162e+03, threshold=7.419e+02, percent-clipped=7.0 2024-09-16 21:51:08,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=38420.666666666664, ans=0.04949747468305833 2024-09-16 21:51:22,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=38467.333333333336, ans=0.125 2024-09-16 21:51:38,117 INFO [train.py:1198] (1/2) Epoch 3, batch 500, loss[loss=0.3854, simple_loss=0.3825, pruned_loss=0.1555, ctc_loss=0.2873, cr_loss=0.497, over 34427.00 frames. ], tot_loss[loss=0.361, simple_loss=0.3621, pruned_loss=0.1435, ctc_loss=0.2695, cr_loss=0.4768, over 6220404.91 frames. 
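On the `Clipping_scale=2.0, grad-norm quartiles ...` warnings: the five numbers read as min/25%/median/75%/max of recently observed gradient norms, and in every such entry the reported threshold is exactly 2.0 × the median (e.g. 2 × 3.709e+02 ≈ 7.419e+02 just above), so the clip threshold appears to be `clipping_scale` times the running median. A toy reconstruction under that assumption, not the actual optimizer code:

```python
# Toy reconstruction of the grad-norm summary in the warnings above,
# assuming threshold = clipping_scale * median of recent gradient norms.
import torch

def clip_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                  # 2.0 x median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped

# Feeding the five summary values back in as a stand-in for the raw norms:
norms = torch.tensor([234.0, 288.2, 370.9, 475.8, 1162.0])
q, thr, pct = clip_summary(norms)
print([round(v, 1) for v in q.tolist()], round(float(thr), 1), float(pct))
# -> [234.0, 288.2, 370.9, 475.8, 1162.0] 741.8 20.0
```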
], batch size: 110, lr: 3.44e-02, grad_scale: 32.0 2024-09-16 21:51:38,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=38514.0, ans=0.125 2024-09-16 21:51:46,693 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:51:48,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=38514.0, ans=0.07 2024-09-16 21:52:04,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=38560.666666666664, ans=0.2 2024-09-16 21:52:36,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38654.0, ans=0.1 2024-09-16 21:53:02,586 INFO [train.py:1198] (1/2) Epoch 3, batch 550, loss[loss=0.3818, simple_loss=0.3845, pruned_loss=0.1505, ctc_loss=0.2877, cr_loss=0.5142, over 33811.00 frames. ], tot_loss[loss=0.3606, simple_loss=0.3619, pruned_loss=0.1432, ctc_loss=0.2692, cr_loss=0.4767, over 6329903.52 frames. ], batch size: 122, lr: 3.43e-02, grad_scale: 16.0 2024-09-16 21:53:43,318 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.367e+02 3.199e+02 3.903e+02 4.944e+02 1.085e+03, threshold=7.806e+02, percent-clipped=4.0 2024-09-16 21:53:56,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=38887.333333333336, ans=0.1 2024-09-16 21:53:58,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=38887.333333333336, ans=0.025 2024-09-16 21:54:23,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=38980.666666666664, ans=0.0 2024-09-16 21:54:24,430 INFO [train.py:1198] (1/2) Epoch 3, batch 600, loss[loss=0.4084, simple_loss=0.3985, pruned_loss=0.1674, ctc_loss=0.3151, cr_loss=0.514, over 34195.00 frames. ], tot_loss[loss=0.3605, simple_loss=0.362, pruned_loss=0.1431, ctc_loss=0.2689, cr_loss=0.4761, over 6432626.06 frames. ], batch size: 117, lr: 3.43e-02, grad_scale: 16.0 2024-09-16 21:54:26,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=38980.666666666664, ans=0.125 2024-09-16 21:54:37,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=38980.666666666664, ans=0.125 2024-09-16 21:54:41,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.53 vs. limit=15.0 2024-09-16 21:54:50,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=39027.333333333336, ans=0.0 2024-09-16 21:55:40,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=39167.333333333336, ans=0.125 2024-09-16 21:55:46,586 INFO [train.py:1198] (1/2) Epoch 3, batch 650, loss[loss=0.362, simple_loss=0.3632, pruned_loss=0.143, ctc_loss=0.2757, cr_loss=0.4917, over 34560.00 frames. ], tot_loss[loss=0.3584, simple_loss=0.3605, pruned_loss=0.1419, ctc_loss=0.2672, cr_loss=0.4746, over 6523789.82 frames. 
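The `ScheduledFloat` lines report the current value (`ans=`) of hyperparameters such as dropout probabilities and skip rates that vary with `batch_count`; by this point in training most of the logged skip rates have decayed to 0.0. A simplified stand-in for such a schedule — the schedule points below are invented for illustration:

```python
# Simplified stand-in for the ScheduledFloat objects being logged: a value
# that is piecewise-linear in batch_count, e.g. a skip rate that decays to 0.
import bisect

class ScheduledFloat:
    def __init__(self, *points):             # points: (batch_count, value)
        self.counts, self.values = zip(*sorted(points))

    def value_at(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.counts, batch_count)
        if i == 0:
            return self.values[0]
        if i == len(self.counts):
            return self.values[-1]
        c0, c1 = self.counts[i - 1], self.counts[i]
        v0, v1 = self.values[i - 1], self.values[i]
        return v0 + (v1 - v0) * (batch_count - c0) / (c1 - c0)

skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value_at(35961.33))   # -> 0.0, like the ans=... fields above
```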
], batch size: 94, lr: 3.42e-02, grad_scale: 16.0 2024-09-16 21:55:49,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.76 vs. limit=10.0 2024-09-16 21:56:04,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.48 vs. limit=6.0 2024-09-16 21:56:18,639 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 21:56:31,704 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.283e+02 2.932e+02 3.405e+02 3.966e+02 9.288e+02, threshold=6.810e+02, percent-clipped=2.0 2024-09-16 21:56:33,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=39307.333333333336, ans=0.0 2024-09-16 21:56:48,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=39354.0, ans=0.125 2024-09-16 21:57:09,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=39400.666666666664, ans=0.125 2024-09-16 21:57:10,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.88 vs. limit=10.0 2024-09-16 21:57:12,622 INFO [train.py:1198] (1/2) Epoch 3, batch 700, loss[loss=0.3528, simple_loss=0.353, pruned_loss=0.1407, ctc_loss=0.2627, cr_loss=0.4657, over 34603.00 frames. ], tot_loss[loss=0.359, simple_loss=0.3611, pruned_loss=0.1422, ctc_loss=0.2675, cr_loss=0.4758, over 6581843.13 frames. ], batch size: 89, lr: 3.42e-02, grad_scale: 16.0 2024-09-16 21:57:17,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=39447.333333333336, ans=0.0 2024-09-16 21:57:42,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=39494.0, ans=0.125 2024-09-16 21:57:52,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=39540.666666666664, ans=0.125 2024-09-16 21:58:19,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=39634.0, ans=0.125 2024-09-16 21:58:25,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=39634.0, ans=0.125 2024-09-16 21:58:35,001 INFO [train.py:1198] (1/2) Epoch 3, batch 750, loss[loss=0.3605, simple_loss=0.3677, pruned_loss=0.1411, ctc_loss=0.2617, cr_loss=0.4708, over 34428.00 frames. ], tot_loss[loss=0.3581, simple_loss=0.3603, pruned_loss=0.1417, ctc_loss=0.2668, cr_loss=0.4749, over 6624726.12 frames. 
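The `Whitening:` lines compare a `metric` against a `limit` (the limit itself is scheduled, per the `...out_whiten.whitening_limit ... ans=15.0` entry earlier). One plausible form of such a metric — hedged, since it may differ from the exact formula in scaling.py — equals 1.0 for a perfectly isotropic ("white") channel covariance and approaches `num_channels` when the activations collapse into one direction, consistent with the logged values sitting below limits like 15.0 or 22.5:

```python
# A plausible whitening metric: 1.0 for isotropic channel covariance,
# up to num_channels when all variance lies in a single direction.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]                  # channel covariance
    n = cov.shape[0]
    return n * (cov ** 2).sum() / (cov.trace() ** 2)

white = torch.randn(10000, 512)
collapsed = torch.randn(10000, 1) * torch.randn(1, 512)
print(float(whitening_metric(white)))       # ~1  -> well under the limit
print(float(whitening_metric(collapsed)))   # ~512 -> far over the limit
```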
], batch size: 95, lr: 3.41e-02, grad_scale: 16.0 2024-09-16 21:58:43,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=39680.666666666664, ans=0.1 2024-09-16 21:59:07,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=39774.0, ans=0.125 2024-09-16 21:59:09,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=39774.0, ans=0.125 2024-09-16 21:59:15,676 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.207e+02 3.136e+02 4.244e+02 5.252e+02 8.104e+02, threshold=8.488e+02, percent-clipped=5.0 2024-09-16 21:59:34,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=39820.666666666664, ans=0.125 2024-09-16 21:59:41,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=39867.333333333336, ans=0.125 2024-09-16 21:59:43,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.38 vs. limit=15.0 2024-09-16 21:59:58,859 INFO [train.py:1198] (1/2) Epoch 3, batch 800, loss[loss=0.3202, simple_loss=0.3278, pruned_loss=0.1235, ctc_loss=0.2372, cr_loss=0.4541, over 34451.00 frames. ], tot_loss[loss=0.3575, simple_loss=0.3601, pruned_loss=0.1414, ctc_loss=0.2662, cr_loss=0.4751, over 6660069.74 frames. ], batch size: 85, lr: 3.40e-02, grad_scale: 32.0 2024-09-16 21:59:59,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=39914.0, ans=0.1 2024-09-16 22:00:05,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=39914.0, ans=0.0021926086956521743 2024-09-16 22:00:53,867 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:01:03,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.37 vs. limit=10.0 2024-09-16 22:01:23,122 INFO [train.py:1198] (1/2) Epoch 3, batch 850, loss[loss=0.3583, simple_loss=0.3674, pruned_loss=0.1393, ctc_loss=0.262, cr_loss=0.4529, over 34377.00 frames. ], tot_loss[loss=0.3565, simple_loss=0.3593, pruned_loss=0.1408, ctc_loss=0.2651, cr_loss=0.4741, over 6691390.81 frames. 
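The headline `loss=` in each entry is numerically consistent with a fixed weighting of the other four fields: for batch 800 just above, 0.5·0.3278 + 0.1235 + 0.1·0.2372 + 0.02·0.4541 ≈ 0.3202. A sketch under those scales, which are inferred from the logged numbers rather than read out of any config:

```python
# Combined multi-task loss with scales inferred from the log entries above:
# loss = 0.5 * simple_loss + pruned_loss + 0.1 * ctc_loss + 0.02 * cr_loss
def combined_loss(simple_loss, pruned_loss, ctc_loss, cr_loss,
                  simple_scale=0.5, ctc_scale=0.1, cr_scale=0.02):
    return (simple_scale * simple_loss + pruned_loss
            + ctc_scale * ctc_loss + cr_scale * cr_loss)

# Reproduces the batch-800 entry: loss=0.3202
print(combined_loss(0.3278, 0.1235, 0.2372, 0.4541))   # -> ~0.3202
```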
], batch size: 103, lr: 3.40e-02, grad_scale: 32.0 2024-09-16 22:01:31,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=40147.333333333336, ans=0.125 2024-09-16 22:01:39,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=40194.0, ans=0.125 2024-09-16 22:01:39,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=40194.0, ans=0.125 2024-09-16 22:02:03,707 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.335e+02 2.992e+02 3.416e+02 4.131e+02 6.485e+02, threshold=6.831e+02, percent-clipped=0.0 2024-09-16 22:02:18,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=40287.333333333336, ans=0.2 2024-09-16 22:02:21,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=40287.333333333336, ans=22.5 2024-09-16 22:02:31,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=40334.0, ans=15.0 2024-09-16 22:02:32,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=40334.0, ans=0.0021013043478260857 2024-09-16 22:02:34,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=40334.0, ans=0.1 2024-09-16 22:02:45,276 INFO [train.py:1198] (1/2) Epoch 3, batch 900, loss[loss=0.3175, simple_loss=0.3259, pruned_loss=0.1219, ctc_loss=0.2364, cr_loss=0.4488, over 34521.00 frames. ], tot_loss[loss=0.3576, simple_loss=0.3601, pruned_loss=0.1414, ctc_loss=0.2663, cr_loss=0.4752, over 6698301.19 frames. ], batch size: 85, lr: 3.39e-02, grad_scale: 32.0 2024-09-16 22:02:45,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=40380.666666666664, ans=0.1 2024-09-16 22:02:47,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=40380.666666666664, ans=0.125 2024-09-16 22:02:47,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2024-09-16 22:03:03,492 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:03:46,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.30 vs. limit=15.0 2024-09-16 22:03:56,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=40567.333333333336, ans=0.125 2024-09-16 22:03:59,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=40567.333333333336, ans=0.0 2024-09-16 22:04:00,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.96 vs. 
limit=15.0 2024-09-16 22:04:10,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=40614.0, ans=15.0 2024-09-16 22:04:10,807 INFO [train.py:1198] (1/2) Epoch 3, batch 950, loss[loss=0.3219, simple_loss=0.3349, pruned_loss=0.122, ctc_loss=0.2385, cr_loss=0.4262, over 34720.00 frames. ], tot_loss[loss=0.3577, simple_loss=0.3601, pruned_loss=0.1415, ctc_loss=0.2664, cr_loss=0.475, over 6700476.99 frames. ], batch size: 87, lr: 3.39e-02, grad_scale: 16.0 2024-09-16 22:04:19,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=40614.0, ans=0.0 2024-09-16 22:04:21,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=40614.0, ans=0.2 2024-09-16 22:04:22,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=40614.0, ans=0.125 2024-09-16 22:04:52,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=40707.333333333336, ans=0.1 2024-09-16 22:04:52,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.93 vs. limit=15.0 2024-09-16 22:04:53,424 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.284e+02 3.263e+02 3.915e+02 4.807e+02 7.457e+02, threshold=7.829e+02, percent-clipped=3.0 2024-09-16 22:05:05,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. limit=6.0 2024-09-16 22:05:09,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2024-09-16 22:05:20,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-09-16 22:05:32,654 INFO [train.py:1198] (1/2) Epoch 3, batch 1000, loss[loss=0.3343, simple_loss=0.3401, pruned_loss=0.1305, ctc_loss=0.2481, cr_loss=0.4457, over 34462.00 frames. ], tot_loss[loss=0.3585, simple_loss=0.3605, pruned_loss=0.142, ctc_loss=0.2672, cr_loss=0.4759, over 6693909.01 frames. ], batch size: 90, lr: 3.38e-02, grad_scale: 16.0 2024-09-16 22:05:38,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.38 vs. limit=15.0 2024-09-16 22:05:44,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40847.333333333336, ans=0.1 2024-09-16 22:05:47,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40894.0, ans=0.1 2024-09-16 22:05:51,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=7.76 vs. 
limit=15.0 2024-09-16 22:05:54,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=40894.0, ans=0.1 2024-09-16 22:05:55,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=40894.0, ans=0.0 2024-09-16 22:05:56,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=40894.0, ans=0.125 2024-09-16 22:06:22,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=40987.333333333336, ans=0.0 2024-09-16 22:06:22,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=40987.333333333336, ans=0.2 2024-09-16 22:06:27,440 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:06:34,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.79 vs. limit=15.0 2024-09-16 22:06:55,064 INFO [train.py:1198] (1/2) Epoch 3, batch 1050, loss[loss=0.3691, simple_loss=0.3769, pruned_loss=0.1438, ctc_loss=0.272, cr_loss=0.484, over 34589.00 frames. ], tot_loss[loss=0.3572, simple_loss=0.3594, pruned_loss=0.1414, ctc_loss=0.2662, cr_loss=0.4755, over 6704291.92 frames. ], batch size: 99, lr: 3.38e-02, grad_scale: 16.0 2024-09-16 22:07:38,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=41174.0, ans=0.0 2024-09-16 22:07:39,666 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.339e+02 2.879e+02 3.397e+02 4.243e+02 6.848e+02, threshold=6.794e+02, percent-clipped=0.0 2024-09-16 22:07:45,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=41174.0, ans=0.07 2024-09-16 22:08:18,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2024-09-16 22:08:20,967 INFO [train.py:1198] (1/2) Epoch 3, batch 1100, loss[loss=0.3498, simple_loss=0.358, pruned_loss=0.1351, ctc_loss=0.2598, cr_loss=0.4827, over 34344.00 frames. ], tot_loss[loss=0.3565, simple_loss=0.359, pruned_loss=0.1409, ctc_loss=0.2658, cr_loss=0.4753, over 6717545.71 frames. 
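The `grad_scale:` field bounces between 16.0 and 32.0 across these entries, which is standard dynamic loss scaling for fp16 training: the scale doubles after a long run of overflow-free steps and halves whenever gradients overflow. Minimal torch.cuda.amp usage showing the mechanism (the model, data, and scaler settings are stand-ins; requires a CUDA device):

```python
# Dynamic loss scaling with torch.cuda.amp, the mechanism behind the
# grad_scale: 16.0 / 32.0 values in the entries above.
import torch

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

x = torch.randn(8, 10, device="cuda")
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(x).square().mean()
scaler.scale(loss).backward()   # backward on the scaled loss
scaler.step(optimizer)          # skips the step if grads overflowed
scaler.update()                 # grows or shrinks the scale accordingly
print(scaler.get_scale())
```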
], batch size: 91, lr: 3.37e-02, grad_scale: 16.0 2024-09-16 22:08:26,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=41314.0, ans=0.1 2024-09-16 22:08:34,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=41314.0, ans=0.025 2024-09-16 22:08:36,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=41360.666666666664, ans=0.0 2024-09-16 22:08:40,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=41360.666666666664, ans=0.025 2024-09-16 22:09:28,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=41500.666666666664, ans=0.02 2024-09-16 22:09:42,902 INFO [train.py:1198] (1/2) Epoch 3, batch 1150, loss[loss=0.3627, simple_loss=0.3633, pruned_loss=0.1449, ctc_loss=0.2694, cr_loss=0.4626, over 34387.00 frames. ], tot_loss[loss=0.3565, simple_loss=0.3589, pruned_loss=0.141, ctc_loss=0.2656, cr_loss=0.4751, over 6716637.15 frames. ], batch size: 91, lr: 3.37e-02, grad_scale: 16.0 2024-09-16 22:10:16,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.90 vs. limit=15.0 2024-09-16 22:10:17,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=41640.666666666664, ans=0.0 2024-09-16 22:10:25,891 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.324e+02 2.934e+02 3.585e+02 4.773e+02 1.453e+03, threshold=7.170e+02, percent-clipped=7.0 2024-09-16 22:10:49,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=41734.0, ans=0.2 2024-09-16 22:10:53,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=41734.0, ans=0.0 2024-09-16 22:11:05,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=41780.666666666664, ans=0.0 2024-09-16 22:11:06,821 INFO [train.py:1198] (1/2) Epoch 3, batch 1200, loss[loss=0.3531, simple_loss=0.3673, pruned_loss=0.1348, ctc_loss=0.2551, cr_loss=0.4555, over 34541.00 frames. ], tot_loss[loss=0.3572, simple_loss=0.3596, pruned_loss=0.1413, ctc_loss=0.2663, cr_loss=0.4754, over 6708602.70 frames. ], batch size: 99, lr: 3.36e-02, grad_scale: 32.0 2024-09-16 22:11:39,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=41827.333333333336, ans=0.125 2024-09-16 22:11:45,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=41874.0, ans=0.07 2024-09-16 22:12:10,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.03 vs. limit=15.0 2024-09-16 22:12:19,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=41967.333333333336, ans=0.125 2024-09-16 22:12:31,335 INFO [train.py:1198] (1/2) Epoch 3, batch 1250, loss[loss=0.3864, simple_loss=0.3864, pruned_loss=0.1547, ctc_loss=0.2879, cr_loss=0.4825, over 34353.00 frames. 
], tot_loss[loss=0.3574, simple_loss=0.3599, pruned_loss=0.1413, ctc_loss=0.2661, cr_loss=0.4763, over 6742155.87 frames. ], batch size: 107, lr: 3.35e-02, grad_scale: 32.0 2024-09-16 22:12:34,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=42014.0, ans=0.0 2024-09-16 22:12:41,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5 2024-09-16 22:12:46,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0 2024-09-16 22:12:59,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=42060.666666666664, ans=0.2 2024-09-16 22:13:01,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.59 vs. limit=15.0 2024-09-16 22:13:14,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.294e+02 2.958e+02 3.395e+02 4.063e+02 7.526e+02, threshold=6.790e+02, percent-clipped=1.0 2024-09-16 22:13:20,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.92 vs. limit=15.0 2024-09-16 22:13:22,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=42154.0, ans=0.0 2024-09-16 22:13:26,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.16 vs. limit=22.5 2024-09-16 22:13:37,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2024-09-16 22:13:44,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=42200.666666666664, ans=0.0 2024-09-16 22:13:54,035 INFO [train.py:1198] (1/2) Epoch 3, batch 1300, loss[loss=0.3558, simple_loss=0.3628, pruned_loss=0.1397, ctc_loss=0.2562, cr_loss=0.4536, over 33106.00 frames. ], tot_loss[loss=0.3566, simple_loss=0.3594, pruned_loss=0.1409, ctc_loss=0.2653, cr_loss=0.4766, over 6743849.23 frames. ], batch size: 130, lr: 3.35e-02, grad_scale: 32.0 2024-09-16 22:14:04,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=42247.333333333336, ans=0.125 2024-09-16 22:14:56,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=42387.333333333336, ans=10.0 2024-09-16 22:15:00,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.91 vs. 
limit=10.0 2024-09-16 22:15:03,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=42434.0, ans=0.0 2024-09-16 22:15:03,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=42434.0, ans=0.125 2024-09-16 22:15:17,845 INFO [train.py:1198] (1/2) Epoch 3, batch 1350, loss[loss=0.3409, simple_loss=0.3524, pruned_loss=0.1309, ctc_loss=0.2459, cr_loss=0.4597, over 34523.00 frames. ], tot_loss[loss=0.3553, simple_loss=0.3586, pruned_loss=0.14, ctc_loss=0.264, cr_loss=0.4762, over 6763665.37 frames. ], batch size: 94, lr: 3.34e-02, grad_scale: 32.0 2024-09-16 22:15:20,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.28 vs. limit=22.5 2024-09-16 22:15:26,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=42480.666666666664, ans=0.0 2024-09-16 22:15:33,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=42480.666666666664, ans=0.125 2024-09-16 22:15:33,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=42480.666666666664, ans=0.125 2024-09-16 22:16:00,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=42574.0, ans=0.125 2024-09-16 22:16:03,502 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 2.967e+02 3.283e+02 4.017e+02 6.362e+02, threshold=6.566e+02, percent-clipped=0.0 2024-09-16 22:16:25,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42667.333333333336, ans=0.1 2024-09-16 22:16:35,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=42667.333333333336, ans=0.0015940579710144922 2024-09-16 22:16:41,495 INFO [train.py:1198] (1/2) Epoch 3, batch 1400, loss[loss=0.3067, simple_loss=0.3172, pruned_loss=0.1166, ctc_loss=0.2227, cr_loss=0.4632, over 34309.00 frames. ], tot_loss[loss=0.3551, simple_loss=0.3584, pruned_loss=0.14, ctc_loss=0.2637, cr_loss=0.4773, over 6775831.59 frames. ], batch size: 80, lr: 3.34e-02, grad_scale: 16.0 2024-09-16 22:16:45,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=42714.0, ans=0.125 2024-09-16 22:16:46,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=42714.0, ans=0.05 2024-09-16 22:16:54,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=42714.0, ans=0.025 2024-09-16 22:17:24,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.15 vs. limit=15.0 2024-09-16 22:17:41,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.97 vs. 
limit=12.0 2024-09-16 22:17:47,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=42900.666666666664, ans=0.07 2024-09-16 22:18:03,650 INFO [train.py:1198] (1/2) Epoch 3, batch 1450, loss[loss=0.3674, simple_loss=0.3663, pruned_loss=0.1459, ctc_loss=0.2789, cr_loss=0.5234, over 34450.00 frames. ], tot_loss[loss=0.3556, simple_loss=0.359, pruned_loss=0.1402, ctc_loss=0.2641, cr_loss=0.4776, over 6774343.03 frames. ], batch size: 110, lr: 3.33e-02, grad_scale: 16.0 2024-09-16 22:18:07,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0 2024-09-16 22:18:33,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=42994.0, ans=0.125 2024-09-16 22:18:35,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=42994.0, ans=0.025 2024-09-16 22:18:38,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=43040.666666666664, ans=0.125 2024-09-16 22:18:47,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.08 vs. limit=15.0 2024-09-16 22:18:49,581 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.218e+02 3.015e+02 3.443e+02 4.123e+02 7.255e+02, threshold=6.886e+02, percent-clipped=1.0 2024-09-16 22:18:51,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=43040.666666666664, ans=0.125 2024-09-16 22:19:12,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=43134.0, ans=0.0 2024-09-16 22:19:18,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43134.0, ans=0.1 2024-09-16 22:19:29,192 INFO [train.py:1198] (1/2) Epoch 3, batch 1500, loss[loss=0.3662, simple_loss=0.3759, pruned_loss=0.1412, ctc_loss=0.2729, cr_loss=0.4904, over 34464.00 frames. ], tot_loss[loss=0.3558, simple_loss=0.3593, pruned_loss=0.1401, ctc_loss=0.2643, cr_loss=0.4778, over 6774293.65 frames. ], batch size: 100, lr: 3.33e-02, grad_scale: 16.0 2024-09-16 22:19:46,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=43227.333333333336, ans=0.1 2024-09-16 22:20:02,637 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:20:34,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=43367.333333333336, ans=0.04949747468305833 2024-09-16 22:20:51,704 INFO [train.py:1198] (1/2) Epoch 3, batch 1550, loss[loss=0.3607, simple_loss=0.3663, pruned_loss=0.1409, ctc_loss=0.2675, cr_loss=0.4941, over 34442.00 frames. ], tot_loss[loss=0.356, simple_loss=0.3592, pruned_loss=0.1404, ctc_loss=0.2646, cr_loss=0.4777, over 6747744.23 frames. ], batch size: 105, lr: 3.32e-02, grad_scale: 16.0 2024-09-16 22:21:32,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.91 vs. 
limit=15.0 2024-09-16 22:21:35,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.374e+02 2.879e+02 3.622e+02 4.748e+02 8.019e+02, threshold=7.243e+02, percent-clipped=4.0 2024-09-16 22:21:55,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2024-09-16 22:22:12,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=43600.666666666664, ans=0.025 2024-09-16 22:22:15,380 INFO [train.py:1198] (1/2) Epoch 3, batch 1600, loss[loss=0.3635, simple_loss=0.368, pruned_loss=0.1427, ctc_loss=0.271, cr_loss=0.4847, over 34582.00 frames. ], tot_loss[loss=0.3568, simple_loss=0.3596, pruned_loss=0.1409, ctc_loss=0.2656, cr_loss=0.4776, over 6725563.56 frames. ], batch size: 99, lr: 3.32e-02, grad_scale: 32.0 2024-09-16 22:22:37,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=43694.0, ans=0.0 2024-09-16 22:23:22,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0 2024-09-16 22:23:22,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0 2024-09-16 22:23:39,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=43880.666666666664, ans=0.2 2024-09-16 22:23:40,286 INFO [train.py:1198] (1/2) Epoch 3, batch 1650, loss[loss=0.3597, simple_loss=0.3703, pruned_loss=0.1387, ctc_loss=0.2636, cr_loss=0.4744, over 34402.00 frames. ], tot_loss[loss=0.3561, simple_loss=0.3592, pruned_loss=0.1405, ctc_loss=0.2647, cr_loss=0.4763, over 6719823.29 frames. ], batch size: 103, lr: 3.31e-02, grad_scale: 32.0 2024-09-16 22:23:42,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=43880.666666666664, ans=0.125 2024-09-16 22:23:55,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=43927.333333333336, ans=0.125 2024-09-16 22:23:56,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=43927.333333333336, ans=0.0 2024-09-16 22:24:22,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2024-09-16 22:24:25,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.38 vs. 
limit=15.0 2024-09-16 22:24:25,957 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.446e+02 3.071e+02 3.669e+02 4.438e+02 8.428e+02, threshold=7.338e+02, percent-clipped=2.0 2024-09-16 22:24:36,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=44020.666666666664, ans=0.125 2024-09-16 22:24:44,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=44067.333333333336, ans=0.04949747468305833 2024-09-16 22:24:45,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44067.333333333336, ans=0.1 2024-09-16 22:24:46,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.89 vs. limit=22.5 2024-09-16 22:25:01,718 INFO [train.py:1198] (1/2) Epoch 3, batch 1700, loss[loss=0.3074, simple_loss=0.3144, pruned_loss=0.1193, ctc_loss=0.226, cr_loss=0.415, over 34303.00 frames. ], tot_loss[loss=0.3549, simple_loss=0.3586, pruned_loss=0.1397, ctc_loss=0.2636, cr_loss=0.4766, over 6745049.04 frames. ], batch size: 80, lr: 3.31e-02, grad_scale: 16.0 2024-09-16 22:25:07,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.02 vs. limit=10.0 2024-09-16 22:25:09,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.36 vs. limit=15.0 2024-09-16 22:25:10,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=44114.0, ans=0.0012795652173913045 2024-09-16 22:26:25,791 INFO [train.py:1198] (1/2) Epoch 3, batch 1750, loss[loss=0.3434, simple_loss=0.3397, pruned_loss=0.1381, ctc_loss=0.2579, cr_loss=0.485, over 34133.00 frames. ], tot_loss[loss=0.3546, simple_loss=0.3583, pruned_loss=0.1396, ctc_loss=0.263, cr_loss=0.4763, over 6755288.42 frames. ], batch size: 78, lr: 3.30e-02, grad_scale: 16.0 2024-09-16 22:26:50,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=44394.0, ans=0.025 2024-09-16 22:27:01,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=44440.666666666664, ans=0.2 2024-09-16 22:27:01,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=44440.666666666664, ans=0.1 2024-09-16 22:27:01,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. 
limit=15.0 2024-09-16 22:27:13,777 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.309e+02 3.186e+02 3.758e+02 4.613e+02 7.822e+02, threshold=7.516e+02, percent-clipped=1.0 2024-09-16 22:27:15,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=44487.333333333336, ans=0.125 2024-09-16 22:27:41,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=44534.0, ans=0.125 2024-09-16 22:27:49,661 INFO [train.py:1198] (1/2) Epoch 3, batch 1800, loss[loss=0.3783, simple_loss=0.3763, pruned_loss=0.1511, ctc_loss=0.2788, cr_loss=0.5572, over 34696.00 frames. ], tot_loss[loss=0.3546, simple_loss=0.3584, pruned_loss=0.1395, ctc_loss=0.2629, cr_loss=0.4772, over 6758213.52 frames. ], batch size: 97, lr: 3.29e-02, grad_scale: 16.0 2024-09-16 22:28:16,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=44627.333333333336, ans=0.2 2024-09-16 22:28:49,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=44720.666666666664, ans=0.0 2024-09-16 22:28:56,749 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.42 vs. limit=15.0 2024-09-16 22:28:59,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=44767.333333333336, ans=0.1 2024-09-16 22:29:11,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.96 vs. limit=6.0 2024-09-16 22:29:12,471 INFO [train.py:1198] (1/2) Epoch 3, batch 1850, loss[loss=0.3476, simple_loss=0.3601, pruned_loss=0.1329, ctc_loss=0.2516, cr_loss=0.475, over 34452.00 frames. ], tot_loss[loss=0.3539, simple_loss=0.3578, pruned_loss=0.1392, ctc_loss=0.2623, cr_loss=0.4765, over 6763985.25 frames. ], batch size: 100, lr: 3.29e-02, grad_scale: 16.0 2024-09-16 22:29:39,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=44860.666666666664, ans=0.1 2024-09-16 22:29:45,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=44907.333333333336, ans=0.125 2024-09-16 22:29:54,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.67 vs. limit=22.5 2024-09-16 22:30:00,380 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.251e+02 3.056e+02 3.710e+02 4.326e+02 9.039e+02, threshold=7.419e+02, percent-clipped=2.0 2024-09-16 22:30:10,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=44954.0, ans=0.2 2024-09-16 22:30:10,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.76 vs. limit=15.0 2024-09-16 22:30:17,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.55 vs. 
limit=12.0 2024-09-16 22:30:18,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=45000.666666666664, ans=0.0 2024-09-16 22:30:34,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.49 vs. limit=12.0 2024-09-16 22:30:36,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=45047.333333333336, ans=0.125 2024-09-16 22:30:36,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=45047.333333333336, ans=0.0 2024-09-16 22:30:37,950 INFO [train.py:1198] (1/2) Epoch 3, batch 1900, loss[loss=0.3783, simple_loss=0.3816, pruned_loss=0.1491, ctc_loss=0.2802, cr_loss=0.5182, over 34395.00 frames. ], tot_loss[loss=0.3542, simple_loss=0.3584, pruned_loss=0.1392, ctc_loss=0.2624, cr_loss=0.477, over 6773600.04 frames. ], batch size: 103, lr: 3.28e-02, grad_scale: 16.0 2024-09-16 22:30:43,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.74 vs. limit=12.0 2024-09-16 22:30:44,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=45047.333333333336, ans=0.2 2024-09-16 22:30:48,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5 2024-09-16 22:30:54,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=45094.0, ans=0.125 2024-09-16 22:30:55,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2024-09-16 22:31:09,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=45140.666666666664, ans=0.125 2024-09-16 22:31:28,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=45187.333333333336, ans=0.125 2024-09-16 22:32:00,372 INFO [train.py:1198] (1/2) Epoch 3, batch 1950, loss[loss=0.3381, simple_loss=0.35, pruned_loss=0.1292, ctc_loss=0.2453, cr_loss=0.472, over 34331.00 frames. ], tot_loss[loss=0.3552, simple_loss=0.3596, pruned_loss=0.1396, ctc_loss=0.2629, cr_loss=0.4785, over 6790172.61 frames. 
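On `cr_loss`: it behaves like a consistency term between two forward passes of each batch under different time-masking, which would explain why it is substantial during training yet collapses to ~0 in the validation entries (e.g. cr_loss=1.694e-14 at the start of epoch 3), where no masking is applied and the two views coincide. A hedged sketch of one common formulation — symmetric KL between the per-frame posteriors — which may differ in detail from the actual CR-CTC code:

```python
# Consistency regularization between two augmented views of the same batch:
# symmetric KL divergence between their per-frame output distributions.
import torch
import torch.nn.functional as F

def consistency_loss(logp_a: torch.Tensor, logp_b: torch.Tensor) -> torch.Tensor:
    # logp_*: (batch, time, vocab) log-posteriors from two masked views
    kl_ab = F.kl_div(logp_a, logp_b, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(logp_b, logp_a, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

logits_a = torch.randn(4, 100, 500)
logits_b = logits_a + 0.1 * torch.randn_like(logits_a)   # a perturbed "view"
loss = consistency_loss(logits_a.log_softmax(-1), logits_b.log_softmax(-1))
print(float(loss))   # identical views give ~0, as in the validation lines
```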
], batch size: 91, lr: 3.28e-02, grad_scale: 16.0 2024-09-16 22:32:04,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=45280.666666666664, ans=0.0010259420289855069 2024-09-16 22:32:20,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45327.333333333336, ans=0.1 2024-09-16 22:32:22,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=45327.333333333336, ans=0.1 2024-09-16 22:32:35,344 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:32:40,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=45374.0, ans=0.125 2024-09-16 22:32:46,465 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.252e+02 3.264e+02 3.716e+02 4.428e+02 7.286e+02, threshold=7.433e+02, percent-clipped=1.0 2024-09-16 22:32:52,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.64 vs. limit=22.5 2024-09-16 22:32:56,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=45420.666666666664, ans=0.0 2024-09-16 22:33:24,256 INFO [train.py:1198] (1/2) Epoch 3, batch 2000, loss[loss=0.3111, simple_loss=0.3214, pruned_loss=0.1189, ctc_loss=0.2237, cr_loss=0.4546, over 34138.00 frames. ], tot_loss[loss=0.3569, simple_loss=0.3607, pruned_loss=0.1405, ctc_loss=0.2645, cr_loss=0.4796, over 6765214.24 frames. ], batch size: 78, lr: 3.27e-02, grad_scale: 32.0 2024-09-16 22:33:33,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2024-09-16 22:34:09,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=45607.333333333336, ans=0.1 2024-09-16 22:34:25,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=45654.0, ans=0.2 2024-09-16 22:34:31,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=45700.666666666664, ans=0.125 2024-09-16 22:34:49,239 INFO [train.py:1198] (1/2) Epoch 3, batch 2050, loss[loss=0.314, simple_loss=0.3217, pruned_loss=0.1216, ctc_loss=0.2318, cr_loss=0.4188, over 34497.00 frames. ], tot_loss[loss=0.3557, simple_loss=0.3595, pruned_loss=0.14, ctc_loss=0.2637, cr_loss=0.4778, over 6755291.08 frames. 
], batch size: 82, lr: 3.27e-02, grad_scale: 16.0 2024-09-16 22:35:11,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=45794.0, ans=0.125 2024-09-16 22:35:16,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=45794.0, ans=0.2 2024-09-16 22:35:36,990 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 3.339e+02 3.844e+02 4.946e+02 8.709e+02, threshold=7.688e+02, percent-clipped=3.0 2024-09-16 22:35:40,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=45887.333333333336, ans=0.125 2024-09-16 22:35:58,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=45934.0, ans=0.0 2024-09-16 22:36:01,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=45934.0, ans=0.125 2024-09-16 22:36:03,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=45934.0, ans=0.1 2024-09-16 22:36:05,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.81 vs. limit=15.0 2024-09-16 22:36:06,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=45934.0, ans=0.2 2024-09-16 22:36:08,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=45934.0, ans=0.000883913043478261 2024-09-16 22:36:11,343 INFO [train.py:1198] (1/2) Epoch 3, batch 2100, loss[loss=0.3521, simple_loss=0.3567, pruned_loss=0.1386, ctc_loss=0.2563, cr_loss=0.4771, over 34530.00 frames. ], tot_loss[loss=0.3539, simple_loss=0.3581, pruned_loss=0.1391, ctc_loss=0.262, cr_loss=0.4768, over 6768572.28 frames. ], batch size: 94, lr: 3.26e-02, grad_scale: 16.0 2024-09-16 22:36:20,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.54 vs. limit=6.0 2024-09-16 22:36:27,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=46027.333333333336, ans=0.125 2024-09-16 22:36:37,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=46027.333333333336, ans=0.125 2024-09-16 22:37:05,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=46120.666666666664, ans=0.125 2024-09-16 22:37:05,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2024-09-16 22:37:34,402 INFO [train.py:1198] (1/2) Epoch 3, batch 2150, loss[loss=0.334, simple_loss=0.3448, pruned_loss=0.1283, ctc_loss=0.2388, cr_loss=0.4662, over 34335.00 frames. ], tot_loss[loss=0.3518, simple_loss=0.3566, pruned_loss=0.138, ctc_loss=0.26, cr_loss=0.4764, over 6786420.86 frames. 
], batch size: 91, lr: 3.26e-02, grad_scale: 16.0 2024-09-16 22:37:36,513 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:37:41,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=46214.0, ans=0.125 2024-09-16 22:38:06,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=46260.666666666664, ans=0.015 2024-09-16 22:38:18,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-09-16 22:38:24,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.260e+02 2.785e+02 3.162e+02 4.130e+02 9.190e+02, threshold=6.325e+02, percent-clipped=2.0 2024-09-16 22:38:29,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=46354.0, ans=0.0 2024-09-16 22:38:39,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=46354.0, ans=0.2 2024-09-16 22:38:45,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=46400.666666666664, ans=0.0 2024-09-16 22:38:49,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.40 vs. limit=22.5 2024-09-16 22:38:50,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=46400.666666666664, ans=0.125 2024-09-16 22:38:58,558 INFO [train.py:1198] (1/2) Epoch 3, batch 2200, loss[loss=0.3662, simple_loss=0.371, pruned_loss=0.1444, ctc_loss=0.2684, cr_loss=0.4725, over 34435.00 frames. ], tot_loss[loss=0.3518, simple_loss=0.3566, pruned_loss=0.138, ctc_loss=0.2601, cr_loss=0.4769, over 6782336.69 frames. ], batch size: 100, lr: 3.25e-02, grad_scale: 16.0 2024-09-16 22:39:18,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=46494.0, ans=0.125 2024-09-16 22:39:28,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=6.0 2024-09-16 22:39:40,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=46540.666666666664, ans=0.125 2024-09-16 22:40:21,195 INFO [train.py:1198] (1/2) Epoch 3, batch 2250, loss[loss=0.3571, simple_loss=0.3636, pruned_loss=0.1405, ctc_loss=0.2527, cr_loss=0.4767, over 34400.00 frames. ], tot_loss[loss=0.352, simple_loss=0.3568, pruned_loss=0.138, ctc_loss=0.2604, cr_loss=0.4767, over 6779735.57 frames. 
], batch size: 95, lr: 3.25e-02, grad_scale: 16.0 2024-09-16 22:40:44,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=46727.333333333336, ans=0.0 2024-09-16 22:40:46,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=46727.333333333336, ans=0.125 2024-09-16 22:40:48,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2024-09-16 22:41:04,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=46774.0, ans=0.0 2024-09-16 22:41:08,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=46774.0, ans=0.1 2024-09-16 22:41:10,218 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.361e+02 2.982e+02 3.389e+02 4.259e+02 6.908e+02, threshold=6.778e+02, percent-clipped=6.0 2024-09-16 22:41:40,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. limit=22.5 2024-09-16 22:41:44,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0 2024-09-16 22:41:44,815 INFO [train.py:1198] (1/2) Epoch 3, batch 2300, loss[loss=0.3212, simple_loss=0.3304, pruned_loss=0.123, ctc_loss=0.2386, cr_loss=0.4591, over 34245.00 frames. ], tot_loss[loss=0.3498, simple_loss=0.3548, pruned_loss=0.137, ctc_loss=0.2588, cr_loss=0.475, over 6767699.96 frames. ], batch size: 83, lr: 3.24e-02, grad_scale: 16.0 2024-09-16 22:41:46,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=46914.0, ans=0.1 2024-09-16 22:42:01,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=46960.666666666664, ans=0.125 2024-09-16 22:42:16,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=46960.666666666664, ans=0.125 2024-09-16 22:42:18,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2024-09-16 22:42:49,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=47054.0, ans=0.1 2024-09-16 22:43:08,916 INFO [train.py:1198] (1/2) Epoch 3, batch 2350, loss[loss=0.369, simple_loss=0.3728, pruned_loss=0.1455, ctc_loss=0.2721, cr_loss=0.4935, over 34712.00 frames. ], tot_loss[loss=0.3499, simple_loss=0.355, pruned_loss=0.137, ctc_loss=0.2586, cr_loss=0.4759, over 6772883.49 frames. 
], batch size: 97, lr: 3.24e-02, grad_scale: 16.0 2024-09-16 22:43:15,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=47147.333333333336, ans=0.0006201449275362314 2024-09-16 22:43:32,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=47194.0, ans=0.0006099999999999994 2024-09-16 22:43:56,596 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.270e+02 3.043e+02 3.666e+02 4.629e+02 1.557e+03, threshold=7.332e+02, percent-clipped=3.0 2024-09-16 22:44:10,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=47287.333333333336, ans=0.1 2024-09-16 22:44:23,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=47334.0, ans=0.125 2024-09-16 22:44:29,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=47334.0, ans=0.0 2024-09-16 22:44:31,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=47380.666666666664, ans=0.05 2024-09-16 22:44:32,771 INFO [train.py:1198] (1/2) Epoch 3, batch 2400, loss[loss=0.328, simple_loss=0.3354, pruned_loss=0.1274, ctc_loss=0.2393, cr_loss=0.447, over 34592.00 frames. ], tot_loss[loss=0.3499, simple_loss=0.3551, pruned_loss=0.137, ctc_loss=0.2585, cr_loss=0.4761, over 6777421.37 frames. ], batch size: 89, lr: 3.23e-02, grad_scale: 32.0 2024-09-16 22:44:43,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=47380.666666666664, ans=0.0 2024-09-16 22:44:52,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=47427.333333333336, ans=0.125 2024-09-16 22:45:32,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=47520.666666666664, ans=0.125 2024-09-16 22:45:57,498 INFO [train.py:1198] (1/2) Epoch 3, batch 2450, loss[loss=0.3576, simple_loss=0.3677, pruned_loss=0.138, ctc_loss=0.2602, cr_loss=0.4895, over 34406.00 frames. ], tot_loss[loss=0.3521, simple_loss=0.3566, pruned_loss=0.1381, ctc_loss=0.2605, cr_loss=0.4777, over 6752702.33 frames. 
], batch size: 95, lr: 3.23e-02, grad_scale: 16.0 2024-09-16 22:46:22,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=47660.666666666664, ans=0.125 2024-09-16 22:46:25,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=47660.666666666664, ans=0.07 2024-09-16 22:46:38,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=47707.333333333336, ans=0.125 2024-09-16 22:46:38,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=47707.333333333336, ans=0.125 2024-09-16 22:46:46,791 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.351e+02 3.059e+02 3.354e+02 4.219e+02 5.885e+02, threshold=6.707e+02, percent-clipped=0.0 2024-09-16 22:47:07,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=15.0 2024-09-16 22:47:20,014 INFO [train.py:1198] (1/2) Epoch 3, batch 2500, loss[loss=0.3709, simple_loss=0.3726, pruned_loss=0.1471, ctc_loss=0.2746, cr_loss=0.4999, over 34447.00 frames. ], tot_loss[loss=0.3513, simple_loss=0.3562, pruned_loss=0.1377, ctc_loss=0.2596, cr_loss=0.4772, over 6764463.10 frames. ], batch size: 100, lr: 3.22e-02, grad_scale: 16.0 2024-09-16 22:47:31,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=47847.333333333336, ans=0.125 2024-09-16 22:48:04,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=47940.666666666664, ans=0.125 2024-09-16 22:48:36,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=48034.0, ans=0.125 2024-09-16 22:48:44,145 INFO [train.py:1198] (1/2) Epoch 3, batch 2550, loss[loss=0.3123, simple_loss=0.3245, pruned_loss=0.1184, ctc_loss=0.226, cr_loss=0.4516, over 34131.00 frames. ], tot_loss[loss=0.3507, simple_loss=0.3557, pruned_loss=0.1373, ctc_loss=0.2589, cr_loss=0.4771, over 6767280.95 frames. ], batch size: 78, lr: 3.22e-02, grad_scale: 16.0 2024-09-16 22:49:10,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=48127.333333333336, ans=0.2 2024-09-16 22:49:24,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=48174.0, ans=0.07 2024-09-16 22:49:35,416 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.496e+02 3.130e+02 3.782e+02 4.274e+02 8.911e+02, threshold=7.564e+02, percent-clipped=4.0 2024-09-16 22:50:03,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=48267.333333333336, ans=0.0003766666666666658 2024-09-16 22:50:08,242 INFO [train.py:1198] (1/2) Epoch 3, batch 2600, loss[loss=0.3373, simple_loss=0.3428, pruned_loss=0.1319, ctc_loss=0.2469, cr_loss=0.4664, over 34377.00 frames. ], tot_loss[loss=0.3519, simple_loss=0.3567, pruned_loss=0.138, ctc_loss=0.2601, cr_loss=0.4783, over 6762643.49 frames. 
], batch size: 91, lr: 3.21e-02, grad_scale: 16.0 2024-09-16 22:50:15,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=48314.0, ans=0.125 2024-09-16 22:50:21,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=48314.0, ans=0.125 2024-09-16 22:50:28,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=48360.666666666664, ans=0.0 2024-09-16 22:50:34,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.65 vs. limit=12.0 2024-09-16 22:50:55,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=48454.0, ans=0.0003360869565217396 2024-09-16 22:51:02,735 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 22:51:05,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=48454.0, ans=0.125 2024-09-16 22:51:20,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=48500.666666666664, ans=0.00032594202898550764 2024-09-16 22:51:22,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=48500.666666666664, ans=0.0 2024-09-16 22:51:30,439 INFO [train.py:1198] (1/2) Epoch 3, batch 2650, loss[loss=0.3477, simple_loss=0.3621, pruned_loss=0.1318, ctc_loss=0.2565, cr_loss=0.4556, over 34261.00 frames. ], tot_loss[loss=0.3517, simple_loss=0.3567, pruned_loss=0.1378, ctc_loss=0.2597, cr_loss=0.4781, over 6770291.45 frames. ], batch size: 117, lr: 3.21e-02, grad_scale: 16.0 2024-09-16 22:51:58,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=48594.0, ans=0.0 2024-09-16 22:52:17,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.85 vs. limit=15.0 2024-09-16 22:52:20,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.304e+02 2.994e+02 3.371e+02 4.494e+02 7.537e+02, threshold=6.741e+02, percent-clipped=0.0 2024-09-16 22:52:36,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=48734.0, ans=0.025 2024-09-16 22:52:46,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=48734.0, ans=0.0 2024-09-16 22:52:54,078 INFO [train.py:1198] (1/2) Epoch 3, batch 2700, loss[loss=0.3418, simple_loss=0.3544, pruned_loss=0.1304, ctc_loss=0.2505, cr_loss=0.4578, over 34618.00 frames. ], tot_loss[loss=0.3518, simple_loss=0.357, pruned_loss=0.1377, ctc_loss=0.2596, cr_loss=0.4782, over 6764206.80 frames. ], batch size: 102, lr: 3.20e-02, grad_scale: 16.0 2024-09-16 22:52:59,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=48780.666666666664, ans=0.125 2024-09-16 22:53:28,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.16 vs. 
limit=22.5 2024-09-16 22:53:32,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=48874.0, ans=0.1 2024-09-16 22:53:32,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=48874.0, ans=0.125 2024-09-16 22:53:54,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.95 vs. limit=15.0 2024-09-16 22:53:57,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=48920.666666666664, ans=0.1 2024-09-16 22:54:18,708 INFO [train.py:1198] (1/2) Epoch 3, batch 2750, loss[loss=0.337, simple_loss=0.3396, pruned_loss=0.1333, ctc_loss=0.2491, cr_loss=0.4463, over 34639.00 frames. ], tot_loss[loss=0.35, simple_loss=0.3553, pruned_loss=0.137, ctc_loss=0.2584, cr_loss=0.476, over 6762887.72 frames. ], batch size: 88, lr: 3.20e-02, grad_scale: 16.0 2024-09-16 22:54:35,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=49060.666666666664, ans=0.125 2024-09-16 22:54:40,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49060.666666666664, ans=0.1 2024-09-16 22:54:51,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-09-16 22:54:51,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=49107.333333333336, ans=0.125 2024-09-16 22:55:08,122 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.457e+02 3.123e+02 3.715e+02 5.079e+02 8.589e+02, threshold=7.429e+02, percent-clipped=6.0 2024-09-16 22:55:16,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=49154.0, ans=0.0 2024-09-16 22:55:21,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=49154.0, ans=0.0 2024-09-16 22:55:21,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=49154.0, ans=0.125 2024-09-16 22:55:43,026 INFO [train.py:1198] (1/2) Epoch 3, batch 2800, loss[loss=0.4112, simple_loss=0.3908, pruned_loss=0.1737, ctc_loss=0.3275, cr_loss=0.469, over 23181.00 frames. ], tot_loss[loss=0.3504, simple_loss=0.3555, pruned_loss=0.1373, ctc_loss=0.2588, cr_loss=0.4759, over 6741406.08 frames. ], batch size: 245, lr: 3.19e-02, grad_scale: 32.0 2024-09-16 22:55:51,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=49247.333333333336, ans=0.125 2024-09-16 22:55:55,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.64 vs. limit=15.0 2024-09-16 22:56:22,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=49340.666666666664, ans=0.0 2024-09-16 22:57:07,530 INFO [train.py:1198] (1/2) Epoch 3, batch 2850, loss[loss=0.3507, simple_loss=0.3502, pruned_loss=0.1401, ctc_loss=0.255, cr_loss=0.4996, over 34505.00 frames. 
], tot_loss[loss=0.3523, simple_loss=0.3568, pruned_loss=0.1383, ctc_loss=0.2605, cr_loss=0.477, over 6724003.71 frames. ], batch size: 90, lr: 3.19e-02, grad_scale: 16.0 2024-09-16 22:57:09,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=49480.666666666664, ans=0.125 2024-09-16 22:57:19,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=49480.666666666664, ans=0.125 2024-09-16 22:57:19,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=49480.666666666664, ans=0.125 2024-09-16 22:57:30,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=49527.333333333336, ans=0.0 2024-09-16 22:57:46,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2024-09-16 22:57:48,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=49574.0, ans=0.1 2024-09-16 22:57:58,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.208e+02 3.030e+02 3.531e+02 4.850e+02 7.377e+02, threshold=7.062e+02, percent-clipped=0.0 2024-09-16 22:58:17,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.68 vs. limit=22.5 2024-09-16 22:58:29,585 INFO [train.py:1198] (1/2) Epoch 3, batch 2900, loss[loss=0.3272, simple_loss=0.3411, pruned_loss=0.1246, ctc_loss=0.2388, cr_loss=0.4108, over 34544.00 frames. ], tot_loss[loss=0.3529, simple_loss=0.3577, pruned_loss=0.1384, ctc_loss=0.2606, cr_loss=0.4784, over 6755158.75 frames. ], batch size: 94, lr: 3.18e-02, grad_scale: 16.0 2024-09-16 22:58:54,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=49760.666666666664, ans=5.202898550724783e-05 2024-09-16 22:59:21,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.71 vs. limit=15.0 2024-09-16 22:59:24,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0 2024-09-16 22:59:53,639 INFO [train.py:1198] (1/2) Epoch 3, batch 2950, loss[loss=0.3301, simple_loss=0.3427, pruned_loss=0.1251, ctc_loss=0.2409, cr_loss=0.4807, over 34617.00 frames. ], tot_loss[loss=0.35, simple_loss=0.3554, pruned_loss=0.137, ctc_loss=0.2585, cr_loss=0.4754, over 6749246.63 frames. ], batch size: 88, lr: 3.18e-02, grad_scale: 8.0 2024-09-16 23:00:05,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=49947.333333333336, ans=0.125 2024-09-16 23:00:11,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. 
limit=15.0 2024-09-16 23:00:11,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=49994.0, ans=1.3043478260879593e-06 2024-09-16 23:00:14,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=16.45 vs. limit=15.0 2024-09-16 23:00:41,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-09-16 23:00:45,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.03 vs. limit=15.0 2024-09-16 23:00:47,754 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.464e+02 3.153e+02 3.852e+02 5.148e+02 8.569e+02, threshold=7.703e+02, percent-clipped=5.0 2024-09-16 23:00:56,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=50087.333333333336, ans=0.125 2024-09-16 23:01:07,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=50134.0, ans=0.0 2024-09-16 23:01:17,540 INFO [train.py:1198] (1/2) Epoch 3, batch 3000, loss[loss=0.3212, simple_loss=0.3332, pruned_loss=0.1223, ctc_loss=0.2302, cr_loss=0.4654, over 34544.00 frames. ], tot_loss[loss=0.349, simple_loss=0.3546, pruned_loss=0.1365, ctc_loss=0.2576, cr_loss=0.4751, over 6747845.62 frames. ], batch size: 94, lr: 3.17e-02, grad_scale: 8.0 2024-09-16 23:01:17,540 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 23:01:34,432 INFO [train.py:1230] (1/2) Epoch 3, validation: loss=0.1956, simple_loss=0.2897, pruned_loss=0.04188, ctc_loss=0.08893, cr_loss=1.446e-14, over 944034.00 frames. 2024-09-16 23:01:34,433 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-16 23:01:38,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=50180.666666666664, ans=0.5 2024-09-16 23:01:56,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=50227.333333333336, ans=0.2 2024-09-16 23:02:04,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=50227.333333333336, ans=0.125 2024-09-16 23:02:13,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=50274.0, ans=0.125 2024-09-16 23:02:18,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=50274.0, ans=0.2 2024-09-16 23:02:21,920 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:02:26,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=50320.666666666664, ans=0.125 2024-09-16 23:02:57,279 INFO [train.py:1198] (1/2) Epoch 3, batch 3050, loss[loss=0.3238, simple_loss=0.3334, pruned_loss=0.1241, ctc_loss=0.239, cr_loss=0.4542, over 34565.00 frames. ], tot_loss[loss=0.3505, simple_loss=0.3558, pruned_loss=0.1372, ctc_loss=0.259, cr_loss=0.4767, over 6740837.38 frames. 
], batch size: 89, lr: 3.17e-02, grad_scale: 4.0 2024-09-16 23:03:04,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=50414.0, ans=0.125 2024-09-16 23:03:24,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.57 vs. limit=22.5 2024-09-16 23:03:33,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.81 vs. limit=10.0 2024-09-16 23:03:34,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=50507.333333333336, ans=0.05 2024-09-16 23:03:41,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=50507.333333333336, ans=0.0 2024-09-16 23:03:43,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.85 vs. limit=15.0 2024-09-16 23:03:44,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.63 vs. limit=5.0 2024-09-16 23:03:50,717 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.240e+02 3.049e+02 3.636e+02 4.264e+02 1.099e+03, threshold=7.272e+02, percent-clipped=1.0 2024-09-16 23:03:52,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=50554.0, ans=0.07 2024-09-16 23:03:58,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=50554.0, ans=0.95 2024-09-16 23:04:16,960 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:04:18,127 INFO [train.py:1198] (1/2) Epoch 3, batch 3100, loss[loss=0.3912, simple_loss=0.3898, pruned_loss=0.1559, ctc_loss=0.2955, cr_loss=0.5437, over 34174.00 frames. ], tot_loss[loss=0.3502, simple_loss=0.3555, pruned_loss=0.137, ctc_loss=0.2586, cr_loss=0.4764, over 6741102.40 frames. ], batch size: 117, lr: 3.16e-02, grad_scale: 8.0 2024-09-16 23:05:02,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2024-09-16 23:05:21,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50834.0, ans=0.1 2024-09-16 23:05:28,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2024-09-16 23:05:33,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.67 vs. limit=15.0 2024-09-16 23:05:38,742 INFO [train.py:1198] (1/2) Epoch 3, batch 3150, loss[loss=0.3702, simple_loss=0.3781, pruned_loss=0.1447, ctc_loss=0.2653, cr_loss=0.4943, over 33916.00 frames. ], tot_loss[loss=0.3501, simple_loss=0.3556, pruned_loss=0.1369, ctc_loss=0.2583, cr_loss=0.4768, over 6748507.85 frames. 
], batch size: 122, lr: 3.16e-02, grad_scale: 8.0 2024-09-16 23:05:44,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=50880.666666666664, ans=0.125 2024-09-16 23:05:58,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=50927.333333333336, ans=0.07 2024-09-16 23:06:01,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=50927.333333333336, ans=0.0 2024-09-16 23:06:22,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.94 vs. limit=10.0 2024-09-16 23:06:27,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=51020.666666666664, ans=0.0 2024-09-16 23:06:33,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.914e+02 3.392e+02 4.456e+02 7.434e+02, threshold=6.784e+02, percent-clipped=2.0 2024-09-16 23:06:53,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.77 vs. limit=10.0 2024-09-16 23:06:54,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=51067.333333333336, ans=0.125 2024-09-16 23:06:54,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=51067.333333333336, ans=0.2 2024-09-16 23:07:00,978 INFO [train.py:1198] (1/2) Epoch 3, batch 3200, loss[loss=0.3459, simple_loss=0.359, pruned_loss=0.132, ctc_loss=0.2508, cr_loss=0.4662, over 34550.00 frames. ], tot_loss[loss=0.3489, simple_loss=0.3547, pruned_loss=0.1363, ctc_loss=0.2572, cr_loss=0.4766, over 6762742.81 frames. ], batch size: 94, lr: 3.15e-02, grad_scale: 16.0 2024-09-16 23:07:11,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. limit=15.0 2024-09-16 23:08:01,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.28 vs. limit=15.0 2024-09-16 23:08:05,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=51300.666666666664, ans=0.125 2024-09-16 23:08:08,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=51300.666666666664, ans=0.125 2024-09-16 23:08:21,671 INFO [train.py:1198] (1/2) Epoch 3, batch 3250, loss[loss=0.3691, simple_loss=0.3732, pruned_loss=0.1462, ctc_loss=0.2659, cr_loss=0.485, over 34649.00 frames. ], tot_loss[loss=0.349, simple_loss=0.3549, pruned_loss=0.1363, ctc_loss=0.257, cr_loss=0.4772, over 6771681.82 frames. 
], batch size: 98, lr: 3.15e-02, grad_scale: 16.0 2024-09-16 23:08:40,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=51394.0, ans=0.125 2024-09-16 23:08:45,848 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.289e-02 2024-09-16 23:08:45,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=51394.0, ans=0.0 2024-09-16 23:08:50,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=51394.0, ans=0.125 2024-09-16 23:09:13,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=51487.333333333336, ans=0.125 2024-09-16 23:09:14,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=51487.333333333336, ans=0.95 2024-09-16 23:09:16,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.325e+02 3.092e+02 3.713e+02 4.332e+02 8.210e+02, threshold=7.426e+02, percent-clipped=1.0 2024-09-16 23:09:18,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2024-09-16 23:09:32,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=51534.0, ans=0.1 2024-09-16 23:09:35,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=51534.0, ans=0.125 2024-09-16 23:09:41,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. limit=6.0 2024-09-16 23:09:43,335 INFO [train.py:1198] (1/2) Epoch 3, batch 3300, loss[loss=0.3632, simple_loss=0.3705, pruned_loss=0.141, ctc_loss=0.267, cr_loss=0.5103, over 33114.00 frames. ], tot_loss[loss=0.3473, simple_loss=0.3534, pruned_loss=0.1355, ctc_loss=0.2558, cr_loss=0.4757, over 6769799.00 frames. 
], batch size: 130, lr: 3.14e-02, grad_scale: 16.0 2024-09-16 23:09:43,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=51580.666666666664, ans=0.125 2024-09-16 23:09:51,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=51580.666666666664, ans=0.2 2024-09-16 23:10:11,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51627.333333333336, ans=0.1 2024-09-16 23:10:27,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=51674.0, ans=0.0 2024-09-16 23:10:39,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=51720.666666666664, ans=0.07 2024-09-16 23:10:43,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=51720.666666666664, ans=0.125 2024-09-16 23:10:49,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=51767.333333333336, ans=0.025 2024-09-16 23:10:59,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=51767.333333333336, ans=0.125 2024-09-16 23:11:00,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=51767.333333333336, ans=0.0 2024-09-16 23:11:03,638 INFO [train.py:1198] (1/2) Epoch 3, batch 3350, loss[loss=0.3698, simple_loss=0.3779, pruned_loss=0.1445, ctc_loss=0.2672, cr_loss=0.4803, over 33847.00 frames. ], tot_loss[loss=0.3492, simple_loss=0.3549, pruned_loss=0.1365, ctc_loss=0.2574, cr_loss=0.4762, over 6743461.27 frames. ], batch size: 122, lr: 3.14e-02, grad_scale: 16.0 2024-09-16 23:11:11,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=51814.0, ans=0.125 2024-09-16 23:11:16,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=51814.0, ans=0.05 2024-09-16 23:11:22,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=51860.666666666664, ans=0.125 2024-09-16 23:11:50,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=51907.333333333336, ans=0.125 2024-09-16 23:11:55,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.97 vs. limit=15.0 2024-09-16 23:11:57,989 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.293e+02 2.966e+02 3.324e+02 3.966e+02 5.811e+02, threshold=6.649e+02, percent-clipped=0.0 2024-09-16 23:12:02,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2024-09-16 23:12:11,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.29 vs. 
limit=15.0 2024-09-16 23:12:23,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=52047.333333333336, ans=0.0 2024-09-16 23:12:25,165 INFO [train.py:1198] (1/2) Epoch 3, batch 3400, loss[loss=0.3096, simple_loss=0.317, pruned_loss=0.1199, ctc_loss=0.2225, cr_loss=0.4468, over 34177.00 frames. ], tot_loss[loss=0.3487, simple_loss=0.3544, pruned_loss=0.1363, ctc_loss=0.2569, cr_loss=0.4752, over 6732615.99 frames. ], batch size: 78, lr: 3.13e-02, grad_scale: 16.0 2024-09-16 23:12:26,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.20 vs. limit=22.5 2024-09-16 23:12:35,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=52047.333333333336, ans=0.125 2024-09-16 23:12:46,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=52094.0, ans=0.125 2024-09-16 23:13:03,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=52140.666666666664, ans=0.07 2024-09-16 23:13:05,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=52140.666666666664, ans=0.0 2024-09-16 23:13:11,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=52187.333333333336, ans=0.0 2024-09-16 23:13:18,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=52187.333333333336, ans=0.125 2024-09-16 23:13:26,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=52187.333333333336, ans=0.95 2024-09-16 23:13:29,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=52234.0, ans=15.0 2024-09-16 23:13:31,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=52234.0, ans=0.2 2024-09-16 23:13:46,486 INFO [train.py:1198] (1/2) Epoch 3, batch 3450, loss[loss=0.3704, simple_loss=0.3769, pruned_loss=0.1441, ctc_loss=0.2795, cr_loss=0.4957, over 33120.00 frames. ], tot_loss[loss=0.348, simple_loss=0.3543, pruned_loss=0.1357, ctc_loss=0.2561, cr_loss=0.475, over 6744988.08 frames. 
], batch size: 130, lr: 3.13e-02, grad_scale: 16.0 2024-09-16 23:13:50,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=52280.666666666664, ans=0.125 2024-09-16 23:14:23,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=52374.0, ans=0.1 2024-09-16 23:14:38,915 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.295e+02 3.018e+02 3.437e+02 4.075e+02 7.177e+02, threshold=6.874e+02, percent-clipped=1.0 2024-09-16 23:14:48,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=52467.333333333336, ans=0.125 2024-09-16 23:14:53,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=52467.333333333336, ans=0.125 2024-09-16 23:15:03,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=52467.333333333336, ans=0.0 2024-09-16 23:15:06,027 INFO [train.py:1198] (1/2) Epoch 3, batch 3500, loss[loss=0.3051, simple_loss=0.319, pruned_loss=0.1152, ctc_loss=0.2187, cr_loss=0.4258, over 34477.00 frames. ], tot_loss[loss=0.3478, simple_loss=0.3538, pruned_loss=0.1358, ctc_loss=0.2562, cr_loss=0.4752, over 6747509.81 frames. ], batch size: 85, lr: 3.12e-02, grad_scale: 16.0 2024-09-16 23:15:06,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=52514.0, ans=0.0 2024-09-16 23:15:41,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=52607.333333333336, ans=0.125 2024-09-16 23:16:27,134 INFO [train.py:1198] (1/2) Epoch 3, batch 3550, loss[loss=0.3543, simple_loss=0.3644, pruned_loss=0.1365, ctc_loss=0.2616, cr_loss=0.4742, over 34397.00 frames. ], tot_loss[loss=0.3472, simple_loss=0.3536, pruned_loss=0.1353, ctc_loss=0.2556, cr_loss=0.4747, over 6756805.57 frames. ], batch size: 103, lr: 3.12e-02, grad_scale: 16.0 2024-09-16 23:16:38,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=52747.333333333336, ans=0.2 2024-09-16 23:16:47,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.39 vs. limit=15.0 2024-09-16 23:17:20,188 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.211e+02 3.101e+02 3.651e+02 4.643e+02 1.800e+03, threshold=7.301e+02, percent-clipped=7.0 2024-09-16 23:17:48,054 INFO [train.py:1198] (1/2) Epoch 3, batch 3600, loss[loss=0.3228, simple_loss=0.3322, pruned_loss=0.1242, ctc_loss=0.2311, cr_loss=0.4696, over 34468.00 frames. ], tot_loss[loss=0.348, simple_loss=0.3542, pruned_loss=0.1358, ctc_loss=0.2561, cr_loss=0.4759, over 6766213.22 frames. ], batch size: 90, lr: 3.11e-02, grad_scale: 32.0 2024-09-16 23:17:53,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=52980.666666666664, ans=0.05 2024-09-16 23:18:22,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.36 vs. 
limit=15.0 2024-09-16 23:18:28,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53074.0, ans=0.1 2024-09-16 23:18:37,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=53120.666666666664, ans=0.125 2024-09-16 23:18:56,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=53167.333333333336, ans=0.125 2024-09-16 23:19:08,320 INFO [train.py:1198] (1/2) Epoch 3, batch 3650, loss[loss=0.3613, simple_loss=0.3659, pruned_loss=0.1418, ctc_loss=0.2636, cr_loss=0.5087, over 34413.00 frames. ], tot_loss[loss=0.3464, simple_loss=0.3528, pruned_loss=0.135, ctc_loss=0.2549, cr_loss=0.474, over 6769097.78 frames. ], batch size: 110, lr: 3.11e-02, grad_scale: 16.0 2024-09-16 23:19:48,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=53307.333333333336, ans=0.0 2024-09-16 23:20:03,454 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.418e+02 3.210e+02 3.848e+02 4.734e+02 1.435e+03, threshold=7.695e+02, percent-clipped=3.0 2024-09-16 23:20:11,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=53400.666666666664, ans=0.0 2024-09-16 23:20:22,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=53400.666666666664, ans=0.1 2024-09-16 23:20:28,971 INFO [train.py:1198] (1/2) Epoch 3, batch 3700, loss[loss=0.3487, simple_loss=0.3615, pruned_loss=0.1329, ctc_loss=0.2582, cr_loss=0.4647, over 34640.00 frames. ], tot_loss[loss=0.3458, simple_loss=0.3528, pruned_loss=0.1346, ctc_loss=0.2539, cr_loss=0.4733, over 6784473.51 frames. ], batch size: 102, lr: 3.10e-02, grad_scale: 16.0 2024-09-16 23:20:31,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.59 vs. limit=15.0 2024-09-16 23:20:42,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=53447.333333333336, ans=0.0 2024-09-16 23:20:50,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=53494.0, ans=0.125 2024-09-16 23:21:06,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=53540.666666666664, ans=0.2 2024-09-16 23:21:18,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.16 vs. limit=15.0 2024-09-16 23:21:36,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=12.0 2024-09-16 23:21:49,817 INFO [train.py:1198] (1/2) Epoch 3, batch 3750, loss[loss=0.3632, simple_loss=0.3728, pruned_loss=0.1408, ctc_loss=0.2613, cr_loss=0.4968, over 34350.00 frames. ], tot_loss[loss=0.3491, simple_loss=0.356, pruned_loss=0.136, ctc_loss=0.2564, cr_loss=0.4773, over 6786493.96 frames. ], batch size: 113, lr: 3.10e-02, grad_scale: 16.0 2024-09-16 23:22:10,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.88 vs. 
limit=22.5 2024-09-16 23:22:25,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=53774.0, ans=0.125 2024-09-16 23:22:35,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=53774.0, ans=0.125 2024-09-16 23:22:44,996 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.476e+02 2.890e+02 3.425e+02 4.505e+02 7.853e+02, threshold=6.850e+02, percent-clipped=1.0 2024-09-16 23:22:45,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=53820.666666666664, ans=0.025 2024-09-16 23:22:53,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.65 vs. limit=22.5 2024-09-16 23:23:01,387 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:23:10,643 INFO [train.py:1198] (1/2) Epoch 3, batch 3800, loss[loss=0.3979, simple_loss=0.3858, pruned_loss=0.1647, ctc_loss=0.3015, cr_loss=0.5072, over 30012.00 frames. ], tot_loss[loss=0.3543, simple_loss=0.3595, pruned_loss=0.1388, ctc_loss=0.2617, cr_loss=0.4807, over 6675611.37 frames. ], batch size: 175, lr: 3.09e-02, grad_scale: 16.0 2024-09-16 23:23:12,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=53914.0, ans=0.0 2024-09-16 23:23:35,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.97 vs. limit=22.5 2024-09-16 23:23:37,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=53960.666666666664, ans=0.2 2024-09-16 23:23:46,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=54007.333333333336, ans=0.125 2024-09-16 23:23:47,819 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:23:59,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=54054.0, ans=0.125 2024-09-16 23:24:09,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=54054.0, ans=0.125 2024-09-16 23:24:12,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=54054.0, ans=0.125 2024-09-16 23:24:17,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54100.666666666664, ans=0.1 2024-09-16 23:24:32,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=54147.333333333336, ans=10.0 2024-09-16 23:24:34,025 INFO [train.py:1198] (1/2) Epoch 3, batch 3850, loss[loss=0.4141, simple_loss=0.3973, pruned_loss=0.1731, ctc_loss=0.3259, cr_loss=0.4865, over 22443.00 frames. ], tot_loss[loss=0.3636, simple_loss=0.3646, pruned_loss=0.1444, ctc_loss=0.2724, cr_loss=0.4826, over 6249881.11 frames. 
], batch size: 245, lr: 3.09e-02, grad_scale: 16.0 2024-09-16 23:24:40,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=54147.333333333336, ans=0.125 2024-09-16 23:25:12,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=54240.666666666664, ans=0.125 2024-09-16 23:26:07,753 INFO [train.py:1198] (1/2) Epoch 4, batch 0, loss[loss=0.3396, simple_loss=0.3419, pruned_loss=0.1338, ctc_loss=0.2523, cr_loss=0.4824, over 34423.00 frames. ], tot_loss[loss=0.3396, simple_loss=0.3419, pruned_loss=0.1338, ctc_loss=0.2523, cr_loss=0.4824, over 34423.00 frames. ], batch size: 85, lr: 2.89e-02, grad_scale: 32.0 2024-09-16 23:26:07,753 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-16 23:26:24,518 INFO [train.py:1230] (1/2) Epoch 4, validation: loss=0.2009, simple_loss=0.2952, pruned_loss=0.04419, ctc_loss=0.09061, cr_loss=1.573e-14, over 944034.00 frames. 2024-09-16 23:26:24,518 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-16 23:26:37,555 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.433e+02 2.924e+02 3.197e+02 3.707e+02 7.404e+02, threshold=6.394e+02, percent-clipped=1.0 2024-09-16 23:26:43,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.72 vs. limit=15.0 2024-09-16 23:26:44,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=54315.333333333336, ans=0.125 2024-09-16 23:26:59,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=54362.0, ans=0.125 2024-09-16 23:27:45,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=54502.0, ans=0.125 2024-09-16 23:27:46,689 INFO [train.py:1198] (1/2) Epoch 4, batch 50, loss[loss=0.3003, simple_loss=0.3137, pruned_loss=0.1129, ctc_loss=0.2216, cr_loss=0.4214, over 34508.00 frames. ], tot_loss[loss=0.3509, simple_loss=0.3565, pruned_loss=0.1372, ctc_loss=0.2588, cr_loss=0.4756, over 1482182.68 frames. ], batch size: 82, lr: 2.88e-02, grad_scale: 32.0 2024-09-16 23:28:11,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=54548.666666666664, ans=0.2 2024-09-16 23:28:22,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=54595.333333333336, ans=0.0 2024-09-16 23:28:30,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=54595.333333333336, ans=0.0 2024-09-16 23:28:39,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.67 vs. limit=12.0 2024-09-16 23:28:51,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54642.0, ans=0.1 2024-09-16 23:29:12,621 INFO [train.py:1198] (1/2) Epoch 4, batch 100, loss[loss=0.3321, simple_loss=0.3379, pruned_loss=0.1301, ctc_loss=0.2385, cr_loss=0.4602, over 34597.00 frames. ], tot_loss[loss=0.3505, simple_loss=0.3568, pruned_loss=0.1367, ctc_loss=0.2579, cr_loss=0.479, over 2631683.35 frames. 
], batch size: 89, lr: 2.88e-02, grad_scale: 32.0 2024-09-16 23:29:25,646 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.272e+02 2.977e+02 3.569e+02 4.252e+02 6.666e+02, threshold=7.137e+02, percent-clipped=3.0 2024-09-16 23:29:27,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=54782.0, ans=0.2 2024-09-16 23:29:28,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.53 vs. limit=6.0 2024-09-16 23:29:45,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=54828.666666666664, ans=0.125 2024-09-16 23:29:55,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=54828.666666666664, ans=0.125 2024-09-16 23:29:58,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=54828.666666666664, ans=0.125 2024-09-16 23:29:59,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=54875.333333333336, ans=0.0 2024-09-16 23:30:12,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=54875.333333333336, ans=0.1 2024-09-16 23:30:33,374 INFO [train.py:1198] (1/2) Epoch 4, batch 150, loss[loss=0.3029, simple_loss=0.3153, pruned_loss=0.1144, ctc_loss=0.2244, cr_loss=0.4239, over 34467.00 frames. ], tot_loss[loss=0.3449, simple_loss=0.3528, pruned_loss=0.1337, ctc_loss=0.2527, cr_loss=0.4748, over 3559449.25 frames. ], batch size: 82, lr: 2.87e-02, grad_scale: 32.0 2024-09-16 23:31:02,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.83 vs. limit=10.0 2024-09-16 23:31:10,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=55062.0, ans=0.025 2024-09-16 23:31:43,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=55155.333333333336, ans=0.125 2024-09-16 23:31:55,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.98 vs. limit=15.0 2024-09-16 23:31:57,415 INFO [train.py:1198] (1/2) Epoch 4, batch 200, loss[loss=0.3566, simple_loss=0.3661, pruned_loss=0.1378, ctc_loss=0.262, cr_loss=0.4776, over 31794.00 frames. ], tot_loss[loss=0.3419, simple_loss=0.3503, pruned_loss=0.1323, ctc_loss=0.2498, cr_loss=0.4738, over 4273167.61 frames. ], batch size: 145, lr: 2.87e-02, grad_scale: 32.0 2024-09-16 23:32:09,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=55202.0, ans=0.125 2024-09-16 23:32:12,435 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.264e+02 2.793e+02 3.152e+02 3.844e+02 5.852e+02, threshold=6.304e+02, percent-clipped=0.0 2024-09-16 23:32:13,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.39 vs. 
limit=15.0 2024-09-16 23:32:22,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=55248.666666666664, ans=0.125 2024-09-16 23:32:32,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=55295.333333333336, ans=0.125 2024-09-16 23:32:33,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=55295.333333333336, ans=0.0 2024-09-16 23:32:33,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=55295.333333333336, ans=0.125 2024-09-16 23:32:42,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=55295.333333333336, ans=0.125 2024-09-16 23:32:43,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=55295.333333333336, ans=0.0 2024-09-16 23:32:57,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.95 vs. limit=22.5 2024-09-16 23:33:14,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=55388.666666666664, ans=0.07 2024-09-16 23:33:21,070 INFO [train.py:1198] (1/2) Epoch 4, batch 250, loss[loss=0.3607, simple_loss=0.3715, pruned_loss=0.1381, ctc_loss=0.2689, cr_loss=0.4998, over 34254.00 frames. ], tot_loss[loss=0.3405, simple_loss=0.3495, pruned_loss=0.1315, ctc_loss=0.2483, cr_loss=0.472, over 4834626.32 frames. ], batch size: 117, lr: 2.86e-02, grad_scale: 32.0 2024-09-16 23:34:12,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=55575.333333333336, ans=0.0 2024-09-16 23:34:27,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.99 vs. limit=15.0 2024-09-16 23:34:38,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=55622.0, ans=0.125 2024-09-16 23:34:43,085 INFO [train.py:1198] (1/2) Epoch 4, batch 300, loss[loss=0.3549, simple_loss=0.3615, pruned_loss=0.1383, ctc_loss=0.2591, cr_loss=0.496, over 34330.00 frames. ], tot_loss[loss=0.3397, simple_loss=0.3489, pruned_loss=0.1311, ctc_loss=0.2477, cr_loss=0.4715, over 5262669.78 frames. ], batch size: 107, lr: 2.86e-02, grad_scale: 32.0 2024-09-16 23:34:56,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.262e+02 2.941e+02 3.679e+02 4.621e+02 6.608e+02, threshold=7.357e+02, percent-clipped=1.0 2024-09-16 23:35:03,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2024-09-16 23:35:14,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.46 vs. limit=15.0 2024-09-16 23:35:27,219 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:36:08,284 INFO [train.py:1198] (1/2) Epoch 4, batch 350, loss[loss=0.2947, simple_loss=0.3131, pruned_loss=0.1084, ctc_loss=0.2078, cr_loss=0.4509, over 34295.00 frames. 
], tot_loss[loss=0.3406, simple_loss=0.3497, pruned_loss=0.1314, ctc_loss=0.2484, cr_loss=0.473, over 5597880.62 frames. ], batch size: 83, lr: 2.86e-02, grad_scale: 16.0 2024-09-16 23:36:10,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=55902.0, ans=0.125 2024-09-16 23:36:20,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.87 vs. limit=22.5 2024-09-16 23:36:24,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.76 vs. limit=15.0 2024-09-16 23:36:24,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=55948.666666666664, ans=0.125 2024-09-16 23:36:30,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.50 vs. limit=15.0 2024-09-16 23:36:35,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2024-09-16 23:37:12,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.90 vs. limit=22.5 2024-09-16 23:37:15,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=56042.0, ans=0.0 2024-09-16 23:37:17,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=56042.0, ans=0.125 2024-09-16 23:37:27,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.93 vs. limit=15.0 2024-09-16 23:37:30,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=56088.666666666664, ans=0.125 2024-09-16 23:37:36,310 INFO [train.py:1198] (1/2) Epoch 4, batch 400, loss[loss=0.3274, simple_loss=0.3431, pruned_loss=0.1225, ctc_loss=0.2383, cr_loss=0.4739, over 34416.00 frames. ], tot_loss[loss=0.339, simple_loss=0.3485, pruned_loss=0.1306, ctc_loss=0.2471, cr_loss=0.4729, over 5864932.98 frames. ], batch size: 95, lr: 2.85e-02, grad_scale: 32.0 2024-09-16 23:37:44,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=56135.333333333336, ans=0.2 2024-09-16 23:37:50,991 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.488e+02 3.091e+02 3.669e+02 4.695e+02 1.808e+03, threshold=7.339e+02, percent-clipped=1.0 2024-09-16 23:37:56,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=56182.0, ans=0.0 2024-09-16 23:37:56,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. 
limit=22.5 2024-09-16 23:38:22,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=56228.666666666664, ans=0.125 2024-09-16 23:38:22,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56228.666666666664, ans=0.1 2024-09-16 23:38:53,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=56322.0, ans=0.1 2024-09-16 23:38:58,140 INFO [train.py:1198] (1/2) Epoch 4, batch 450, loss[loss=0.35, simple_loss=0.357, pruned_loss=0.1356, ctc_loss=0.2558, cr_loss=0.518, over 34685.00 frames. ], tot_loss[loss=0.3392, simple_loss=0.3486, pruned_loss=0.1307, ctc_loss=0.2473, cr_loss=0.473, over 6054269.73 frames. ], batch size: 97, lr: 2.85e-02, grad_scale: 32.0 2024-09-16 23:39:01,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=56368.666666666664, ans=0.2 2024-09-16 23:39:14,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=56415.333333333336, ans=0.0 2024-09-16 23:39:16,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=56415.333333333336, ans=0.125 2024-09-16 23:39:38,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=56462.0, ans=0.125 2024-09-16 23:39:43,436 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:39:58,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=56508.666666666664, ans=0.125 2024-09-16 23:39:59,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=56508.666666666664, ans=0.125 2024-09-16 23:40:00,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=56508.666666666664, ans=12.0 2024-09-16 23:40:01,541 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:40:17,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=56555.333333333336, ans=0.125 2024-09-16 23:40:23,987 INFO [train.py:1198] (1/2) Epoch 4, batch 500, loss[loss=0.3604, simple_loss=0.3693, pruned_loss=0.1392, ctc_loss=0.2628, cr_loss=0.5179, over 34468.00 frames. ], tot_loss[loss=0.3375, simple_loss=0.3472, pruned_loss=0.1299, ctc_loss=0.2458, cr_loss=0.4728, over 6221377.01 frames. 
], batch size: 110, lr: 2.84e-02, grad_scale: 32.0 2024-09-16 23:40:37,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=56602.0, ans=0.125 2024-09-16 23:40:38,875 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.236e+02 2.767e+02 3.242e+02 4.053e+02 6.418e+02, threshold=6.485e+02, percent-clipped=0.0 2024-09-16 23:40:39,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=56648.666666666664, ans=0.125 2024-09-16 23:41:07,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.34 vs. limit=15.0 2024-09-16 23:41:20,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=56742.0, ans=0.125 2024-09-16 23:41:23,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=56742.0, ans=0.125 2024-09-16 23:41:31,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=56788.666666666664, ans=0.125 2024-09-16 23:41:33,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-09-16 23:41:44,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=56835.333333333336, ans=0.07 2024-09-16 23:41:45,997 INFO [train.py:1198] (1/2) Epoch 4, batch 550, loss[loss=0.3623, simple_loss=0.3721, pruned_loss=0.14, ctc_loss=0.2654, cr_loss=0.4853, over 33842.00 frames. ], tot_loss[loss=0.3374, simple_loss=0.347, pruned_loss=0.1299, ctc_loss=0.2459, cr_loss=0.4719, over 6330307.25 frames. ], batch size: 122, lr: 2.84e-02, grad_scale: 32.0 2024-09-16 23:42:15,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=56882.0, ans=0.125 2024-09-16 23:42:16,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.37 vs. limit=22.5 2024-09-16 23:42:32,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=56928.666666666664, ans=0.125 2024-09-16 23:42:58,484 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:42:58,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=57022.0, ans=0.0 2024-09-16 23:42:58,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=57022.0, ans=0.0 2024-09-16 23:42:58,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=57022.0, ans=0.0 2024-09-16 23:43:00,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=57022.0, ans=0.125 2024-09-16 23:43:08,122 INFO [train.py:1198] (1/2) Epoch 4, batch 600, loss[loss=0.3511, simple_loss=0.3651, pruned_loss=0.1328, ctc_loss=0.2552, cr_loss=0.5069, over 34263.00 frames. 
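
Each train.py:1198 record pairs the current batch's losses ("over 33842.00 frames") with running totals ("tot_loss ... over 6330307.25 frames"). The running numbers behave like frame-weighted averages with geometric forgetting: components are accumulated as value x frames, old batches are decayed each step, and the tracker restarts at each epoch (which is why the epoch-4, batch-0 record's tot_loss covers only one batch). The decay explains why the frame total plateaus around 6.7M here instead of growing without bound. A minimal sketch of that bookkeeping follows; the decay value is inferred from the plateau, not read from the code.

class RunningLoss(dict):
    """Frame-weighted running averages behind the tot_loss[...] fields (sketch)."""

    def __init__(self, decay: float = 0.995):  # inferred, not from train.py
        super().__init__()
        self.decay = decay

    def accumulate(self, components: dict, num_frames: float) -> None:
        for key in list(self.keys()):
            self[key] *= self.decay  # geometrically forget old batches
        self["frames"] = self.get("frames", 0.0) + num_frames
        for name, value in components.items():
            # store value * frames so long batches weigh more in the average
            self[name] = self.get(name, 0.0) + value * num_frames

    def averages(self) -> dict:
        n = self["frames"]
        return {k: v / n for k, v in self.items() if k != "frames"}

tracker = RunningLoss()
# Per-batch numbers taken from the batch-500 and batch-550 records above:
tracker.accumulate({"loss": 0.3604, "ctc_loss": 0.2628}, num_frames=34468.0)
tracker.accumulate({"loss": 0.3623, "ctc_loss": 0.2654}, num_frames=33842.0)
print(tracker.averages(), "over", tracker["frames"], "frames")

With steady batches of roughly 33,500 frames this plateaus near 33,500 / (1 - 0.995) = 6.7e6 frames, the same order as the "over 6,7xx,xxx frames" totals logged in this section.
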
], tot_loss[loss=0.3373, simple_loss=0.3472, pruned_loss=0.1297, ctc_loss=0.2456, cr_loss=0.4717, over 6432107.89 frames. ], batch size: 117, lr: 2.84e-02, grad_scale: 32.0 2024-09-16 23:43:08,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=57068.666666666664, ans=0.0 2024-09-16 23:43:09,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.66 vs. limit=12.0 2024-09-16 23:43:26,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.298e+02 3.120e+02 3.952e+02 5.064e+02 7.013e+02, threshold=7.904e+02, percent-clipped=4.0 2024-09-16 23:43:33,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=57115.333333333336, ans=0.125 2024-09-16 23:44:12,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=57208.666666666664, ans=0.0 2024-09-16 23:44:12,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=57208.666666666664, ans=0.0 2024-09-16 23:44:33,051 INFO [train.py:1198] (1/2) Epoch 4, batch 650, loss[loss=0.3266, simple_loss=0.3372, pruned_loss=0.1242, ctc_loss=0.2392, cr_loss=0.4924, over 34537.00 frames. ], tot_loss[loss=0.3354, simple_loss=0.3459, pruned_loss=0.1287, ctc_loss=0.2439, cr_loss=0.4705, over 6522571.33 frames. ], batch size: 94, lr: 2.83e-02, grad_scale: 32.0 2024-09-16 23:44:38,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=57302.0, ans=0.125 2024-09-16 23:44:43,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=57302.0, ans=0.125 2024-09-16 23:45:32,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.04 vs. limit=22.5 2024-09-16 23:45:38,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=57488.666666666664, ans=0.1 2024-09-16 23:45:54,759 INFO [train.py:1198] (1/2) Epoch 4, batch 700, loss[loss=0.3211, simple_loss=0.3278, pruned_loss=0.1244, ctc_loss=0.2328, cr_loss=0.4733, over 34589.00 frames. ], tot_loss[loss=0.336, simple_loss=0.3463, pruned_loss=0.1289, ctc_loss=0.2445, cr_loss=0.4716, over 6580906.63 frames. ], batch size: 89, lr: 2.83e-02, grad_scale: 32.0 2024-09-16 23:46:09,428 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.243e+02 3.002e+02 3.825e+02 4.465e+02 6.340e+02, threshold=7.650e+02, percent-clipped=0.0 2024-09-16 23:46:23,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.88 vs. limit=15.0 2024-09-16 23:46:31,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=57628.666666666664, ans=0.0 2024-09-16 23:46:40,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.57 vs. 
limit=22.5 2024-09-16 23:46:44,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=57675.333333333336, ans=0.0 2024-09-16 23:46:44,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.51 vs. limit=15.0 2024-09-16 23:46:57,703 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.25 vs. limit=12.0 2024-09-16 23:47:18,673 INFO [train.py:1198] (1/2) Epoch 4, batch 750, loss[loss=0.3374, simple_loss=0.3486, pruned_loss=0.1287, ctc_loss=0.2451, cr_loss=0.4941, over 34430.00 frames. ], tot_loss[loss=0.3345, simple_loss=0.3453, pruned_loss=0.1282, ctc_loss=0.2432, cr_loss=0.4706, over 6621357.02 frames. ], batch size: 95, lr: 2.82e-02, grad_scale: 16.0 2024-09-16 23:47:20,639 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:47:23,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=57768.666666666664, ans=0.125 2024-09-16 23:47:23,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=57768.666666666664, ans=0.1 2024-09-16 23:47:35,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.36 vs. limit=12.0 2024-09-16 23:47:38,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=57815.333333333336, ans=0.0 2024-09-16 23:47:51,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=57862.0, ans=0.0 2024-09-16 23:48:24,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=57955.333333333336, ans=0.125 2024-09-16 23:48:29,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=57955.333333333336, ans=0.2 2024-09-16 23:48:32,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=57955.333333333336, ans=0.2 2024-09-16 23:48:40,599 INFO [train.py:1198] (1/2) Epoch 4, batch 800, loss[loss=0.2809, simple_loss=0.3044, pruned_loss=0.101, ctc_loss=0.1987, cr_loss=0.3913, over 34448.00 frames. ], tot_loss[loss=0.3334, simple_loss=0.3445, pruned_loss=0.1276, ctc_loss=0.2422, cr_loss=0.4698, over 6658074.07 frames. ], batch size: 85, lr: 2.82e-02, grad_scale: 32.0 2024-09-16 23:48:42,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2024-09-16 23:48:56,863 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 3.018e+02 3.503e+02 4.227e+02 1.301e+03, threshold=7.007e+02, percent-clipped=2.0 2024-09-16 23:49:10,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.34 vs. 
limit=15.0 2024-09-16 23:49:30,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=58142.0, ans=0.0 2024-09-16 23:50:02,316 INFO [train.py:1198] (1/2) Epoch 4, batch 850, loss[loss=0.3265, simple_loss=0.3483, pruned_loss=0.1203, ctc_loss=0.2293, cr_loss=0.4544, over 34363.00 frames. ], tot_loss[loss=0.333, simple_loss=0.3442, pruned_loss=0.1273, ctc_loss=0.2416, cr_loss=0.4699, over 6691902.37 frames. ], batch size: 103, lr: 2.81e-02, grad_scale: 32.0 2024-09-16 23:50:04,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=58235.333333333336, ans=0.1 2024-09-16 23:50:20,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=58282.0, ans=0.0 2024-09-16 23:50:30,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=58282.0, ans=0.125 2024-09-16 23:50:52,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=58375.333333333336, ans=0.09899494936611666 2024-09-16 23:50:52,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=58375.333333333336, ans=0.0 2024-09-16 23:51:02,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=58375.333333333336, ans=0.1 2024-09-16 23:51:07,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=58375.333333333336, ans=0.125 2024-09-16 23:51:20,603 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:51:28,378 INFO [train.py:1198] (1/2) Epoch 4, batch 900, loss[loss=0.295, simple_loss=0.311, pruned_loss=0.1096, ctc_loss=0.2085, cr_loss=0.4566, over 34466.00 frames. ], tot_loss[loss=0.3339, simple_loss=0.3448, pruned_loss=0.1279, ctc_loss=0.2425, cr_loss=0.4704, over 6697508.30 frames. ], batch size: 85, lr: 2.81e-02, grad_scale: 32.0 2024-09-16 23:51:44,607 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.334e+02 2.894e+02 3.245e+02 4.058e+02 8.066e+02, threshold=6.490e+02, percent-clipped=1.0 2024-09-16 23:52:03,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=58562.0, ans=0.125 2024-09-16 23:52:44,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=58655.333333333336, ans=0.0 2024-09-16 23:52:50,308 INFO [train.py:1198] (1/2) Epoch 4, batch 950, loss[loss=0.2927, simple_loss=0.3116, pruned_loss=0.1071, ctc_loss=0.2103, cr_loss=0.4356, over 34691.00 frames. ], tot_loss[loss=0.3346, simple_loss=0.3452, pruned_loss=0.1282, ctc_loss=0.2433, cr_loss=0.4711, over 6701107.19 frames. 
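
The grad_scale field in these records is a dynamic loss scale for float16 mixed-precision training: the scale doubles after a run of overflow-free optimizer steps and is multiplied by 0.5 whenever inf/nan gradients are found, which is why it only ever moves between powers of two. It sat at 32.0 through batch 900 above, and the batch-950 record that resumes just below carries grad_scale: 8.0, i.e. two overflow-triggered halvings within about fifty batches. A sketch of the update rule, restating the documented torch.cuda.amp.GradScaler behaviour (GradScaler's default growth_interval is 2000; the faster growth visible in this log implies a shorter interval in this run):

class DynamicLossScale:
    """Update rule behind the grad_scale column (GradScaler-like, sketch)."""

    def __init__(self, scale: float = 1.0, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 2000):
        self.scale = scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale *= self.backoff_factor  # halve: 32.0 -> 16.0 -> 8.0
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor  # double after a clean run
                self._good_steps = 0

In the real training loop this is handled by torch.cuda.amp.GradScaler itself; the class above only makes the logged power-of-two movement explicit.
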
], batch size: 87, lr: 2.81e-02, grad_scale: 8.0 2024-09-16 23:52:56,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=58702.0, ans=0.125 2024-09-16 23:52:58,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=58702.0, ans=0.125 2024-09-16 23:53:59,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=58888.666666666664, ans=0.125 2024-09-16 23:54:03,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=58888.666666666664, ans=0.1 2024-09-16 23:54:04,694 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-16 23:54:12,372 INFO [train.py:1198] (1/2) Epoch 4, batch 1000, loss[loss=0.3358, simple_loss=0.3467, pruned_loss=0.1286, ctc_loss=0.2441, cr_loss=0.4697, over 34486.00 frames. ], tot_loss[loss=0.3355, simple_loss=0.3458, pruned_loss=0.1288, ctc_loss=0.2442, cr_loss=0.4719, over 6694159.02 frames. ], batch size: 90, lr: 2.80e-02, grad_scale: 8.0 2024-09-16 23:54:33,868 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.449e+02 2.984e+02 3.585e+02 4.260e+02 8.252e+02, threshold=7.170e+02, percent-clipped=5.0 2024-09-16 23:54:34,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=58982.0, ans=0.125 2024-09-16 23:54:48,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2024-09-16 23:55:00,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.67 vs. limit=6.0 2024-09-16 23:55:14,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59075.333333333336, ans=0.1 2024-09-16 23:55:38,158 INFO [train.py:1198] (1/2) Epoch 4, batch 1050, loss[loss=0.3328, simple_loss=0.3522, pruned_loss=0.1243, ctc_loss=0.2323, cr_loss=0.4587, over 34561.00 frames. ], tot_loss[loss=0.3345, simple_loss=0.3451, pruned_loss=0.1282, ctc_loss=0.2431, cr_loss=0.4711, over 6702876.94 frames. ], batch size: 99, lr: 2.80e-02, grad_scale: 8.0 2024-09-16 23:55:43,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0 2024-09-16 23:56:11,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=59262.0, ans=0.125 2024-09-16 23:56:33,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5 2024-09-16 23:57:00,414 INFO [train.py:1198] (1/2) Epoch 4, batch 1100, loss[loss=0.3391, simple_loss=0.346, pruned_loss=0.1311, ctc_loss=0.2507, cr_loss=0.4943, over 34321.00 frames. ], tot_loss[loss=0.3349, simple_loss=0.3454, pruned_loss=0.1284, ctc_loss=0.2433, cr_loss=0.4722, over 6716521.57 frames. ], batch size: 91, lr: 2.79e-02, grad_scale: 8.0 2024-09-16 23:57:19,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.88 vs. 
limit=6.0 2024-09-16 23:57:19,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.311e+02 2.987e+02 3.543e+02 4.297e+02 9.177e+02, threshold=7.085e+02, percent-clipped=1.0 2024-09-16 23:57:33,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=59495.333333333336, ans=0.125 2024-09-16 23:57:34,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=59495.333333333336, ans=0.0 2024-09-16 23:57:38,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=5.59 vs. limit=12.0 2024-09-16 23:57:41,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=59495.333333333336, ans=0.0 2024-09-16 23:58:03,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=59542.0, ans=0.1 2024-09-16 23:58:26,236 INFO [train.py:1198] (1/2) Epoch 4, batch 1150, loss[loss=0.3269, simple_loss=0.3396, pruned_loss=0.1242, ctc_loss=0.2389, cr_loss=0.4513, over 34738.00 frames. ], tot_loss[loss=0.3352, simple_loss=0.3457, pruned_loss=0.1286, ctc_loss=0.2436, cr_loss=0.4718, over 6716423.97 frames. ], batch size: 92, lr: 2.79e-02, grad_scale: 8.0 2024-09-16 23:58:33,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=59635.333333333336, ans=0.0 2024-09-16 23:58:39,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2024-09-16 23:59:09,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=59728.666666666664, ans=0.125 2024-09-16 23:59:48,479 INFO [train.py:1198] (1/2) Epoch 4, batch 1200, loss[loss=0.3283, simple_loss=0.3503, pruned_loss=0.1207, ctc_loss=0.2309, cr_loss=0.4678, over 34563.00 frames. ], tot_loss[loss=0.3361, simple_loss=0.3466, pruned_loss=0.129, ctc_loss=0.2443, cr_loss=0.4728, over 6709925.15 frames. ], batch size: 99, lr: 2.79e-02, grad_scale: 16.0 2024-09-17 00:00:08,040 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.373e+02 2.789e+02 3.366e+02 3.721e+02 1.088e+03, threshold=6.731e+02, percent-clipped=2.0 2024-09-17 00:00:08,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=59915.333333333336, ans=0.125 2024-09-17 00:00:33,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=59962.0, ans=0.125 2024-09-17 00:00:56,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0 2024-09-17 00:01:10,438 INFO [train.py:1198] (1/2) Epoch 4, batch 1250, loss[loss=0.3363, simple_loss=0.3552, pruned_loss=0.1252, ctc_loss=0.2388, cr_loss=0.4777, over 34336.00 frames. ], tot_loss[loss=0.336, simple_loss=0.3468, pruned_loss=0.1287, ctc_loss=0.2437, cr_loss=0.4731, over 6742909.08 frames. 
], batch size: 107, lr: 2.78e-02, grad_scale: 16.0 2024-09-17 00:01:17,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=60102.0, ans=0.09899494936611666 2024-09-17 00:01:27,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=60148.666666666664, ans=0.0 2024-09-17 00:01:33,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=60148.666666666664, ans=0.0 2024-09-17 00:02:13,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=60242.0, ans=0.025 2024-09-17 00:02:36,252 INFO [train.py:1198] (1/2) Epoch 4, batch 1300, loss[loss=0.3512, simple_loss=0.3645, pruned_loss=0.1338, ctc_loss=0.2554, cr_loss=0.4814, over 33041.00 frames. ], tot_loss[loss=0.3345, simple_loss=0.3456, pruned_loss=0.128, ctc_loss=0.2425, cr_loss=0.4723, over 6746009.29 frames. ], batch size: 130, lr: 2.78e-02, grad_scale: 16.0 2024-09-17 00:02:47,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.13 vs. limit=15.0 2024-09-17 00:02:56,013 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.299e+02 2.793e+02 3.079e+02 3.599e+02 6.675e+02, threshold=6.158e+02, percent-clipped=0.0 2024-09-17 00:02:56,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=60382.0, ans=0.125 2024-09-17 00:03:06,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=60382.0, ans=0.125 2024-09-17 00:03:25,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=60475.333333333336, ans=0.125 2024-09-17 00:03:32,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=60475.333333333336, ans=0.025 2024-09-17 00:03:58,256 INFO [train.py:1198] (1/2) Epoch 4, batch 1350, loss[loss=0.3337, simple_loss=0.3504, pruned_loss=0.1253, ctc_loss=0.2402, cr_loss=0.4584, over 34537.00 frames. ], tot_loss[loss=0.3335, simple_loss=0.345, pruned_loss=0.1274, ctc_loss=0.2415, cr_loss=0.4716, over 6765944.15 frames. ], batch size: 94, lr: 2.77e-02, grad_scale: 16.0 2024-09-17 00:04:34,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=60662.0, ans=0.125 2024-09-17 00:04:41,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=60662.0, ans=0.0 2024-09-17 00:04:42,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=60662.0, ans=0.125 2024-09-17 00:04:56,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.79 vs. limit=15.0 2024-09-17 00:05:08,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=22.5 2024-09-17 00:05:20,237 INFO [train.py:1198] (1/2) Epoch 4, batch 1400, loss[loss=0.3055, simple_loss=0.3185, pruned_loss=0.1154, ctc_loss=0.2146, cr_loss=0.4687, over 34282.00 frames. 
], tot_loss[loss=0.3332, simple_loss=0.3446, pruned_loss=0.1273, ctc_loss=0.2414, cr_loss=0.4722, over 6777867.39 frames. ], batch size: 80, lr: 2.77e-02, grad_scale: 16.0 2024-09-17 00:05:38,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=60848.666666666664, ans=0.015 2024-09-17 00:05:40,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.85 vs. limit=22.5 2024-09-17 00:05:41,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.205e+02 3.019e+02 3.682e+02 4.323e+02 9.362e+02, threshold=7.364e+02, percent-clipped=4.0 2024-09-17 00:06:20,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.11 vs. limit=10.0 2024-09-17 00:06:22,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=60942.0, ans=0.0 2024-09-17 00:06:24,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=60942.0, ans=0.125 2024-09-17 00:06:36,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=60988.666666666664, ans=0.1 2024-09-17 00:06:36,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.59 vs. limit=15.0 2024-09-17 00:06:38,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5 2024-09-17 00:06:45,763 INFO [train.py:1198] (1/2) Epoch 4, batch 1450, loss[loss=0.3618, simple_loss=0.3699, pruned_loss=0.14, ctc_loss=0.2644, cr_loss=0.5241, over 34464.00 frames. ], tot_loss[loss=0.3337, simple_loss=0.3451, pruned_loss=0.1275, ctc_loss=0.2419, cr_loss=0.4725, over 6775311.35 frames. ], batch size: 110, lr: 2.77e-02, grad_scale: 16.0 2024-09-17 00:06:47,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=61035.333333333336, ans=0.025 2024-09-17 00:07:00,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=61082.0, ans=0.125 2024-09-17 00:07:26,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=61128.666666666664, ans=0.125 2024-09-17 00:08:00,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=61222.0, ans=0.125 2024-09-17 00:08:02,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=61222.0, ans=0.125 2024-09-17 00:08:07,208 INFO [train.py:1198] (1/2) Epoch 4, batch 1500, loss[loss=0.3496, simple_loss=0.3606, pruned_loss=0.1346, ctc_loss=0.2495, cr_loss=0.4854, over 34443.00 frames. ], tot_loss[loss=0.3335, simple_loss=0.3451, pruned_loss=0.1273, ctc_loss=0.2416, cr_loss=0.4721, over 6775016.26 frames. 
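
The "Whitening: name=..., metric=X vs. limit=Y" lines periodically sample an isotropy statistic of a module's activations against that module's limit; when the metric exceeds the limit (e.g. metric=22.85 vs. limit=22.5 above) a corrective whitening gradient is applied, while samples under the limit are merely reported. One metric with the right properties is sketched below: it equals 1.0 exactly when the feature covariance is a multiple of the identity and grows as channels become correlated or unevenly scaled. This mirrors the idea behind the scaling.py entries but is a reconstruction, not a copy of that code.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Isotropy of x (num_frames, num_channels); 1.0 iff cov = c * I per group."""
    n, c = x.shape
    d = c // num_groups
    x = x.reshape(n, num_groups, d).transpose(0, 1)  # (groups, frames, d)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                  # per-group covariance
    num = (cov ** 2).mean()                          # mean squared entry
    den = cov.diagonal(dim1=1, dim2=2).mean() ** 2   # squared mean variance
    return (d * num / den).item()  # Cauchy-Schwarz gives >= 1

white = torch.randn(10000, 64)
print(whitening_metric(white))                                 # close to 1.0
print(whitening_metric(white * torch.linspace(0.1, 2.0, 64)))  # well above 1
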
], batch size: 100, lr: 2.76e-02, grad_scale: 16.0 2024-09-17 00:08:19,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=61268.666666666664, ans=0.125 2024-09-17 00:08:27,065 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.312e+02 2.905e+02 3.361e+02 4.444e+02 8.655e+02, threshold=6.723e+02, percent-clipped=1.0 2024-09-17 00:08:29,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=61315.333333333336, ans=0.125 2024-09-17 00:08:32,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=61315.333333333336, ans=0.1 2024-09-17 00:08:43,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=22.5 2024-09-17 00:08:53,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=61362.0, ans=0.1 2024-09-17 00:09:00,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=61408.666666666664, ans=0.0 2024-09-17 00:09:04,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.00 vs. limit=15.0 2024-09-17 00:09:16,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=61455.333333333336, ans=0.125 2024-09-17 00:09:29,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=61502.0, ans=0.125 2024-09-17 00:09:31,098 INFO [train.py:1198] (1/2) Epoch 4, batch 1550, loss[loss=0.3403, simple_loss=0.3576, pruned_loss=0.1268, ctc_loss=0.242, cr_loss=0.5238, over 34426.00 frames. ], tot_loss[loss=0.3338, simple_loss=0.345, pruned_loss=0.1277, ctc_loss=0.2421, cr_loss=0.4725, over 6745178.70 frames. 
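
The many "ScheduledFloat: name=..., batch_count=..., ans=..." lines sample scalar hyperparameters (dropout rates, skip rates, balancer probabilities, whitening limits) that are scheduled against batch_count rather than fixed; by this stage of the run most have settled at their final values (ans=0.1, ans=0.125, ans=0.0 in the records above). A minimal piecewise-linear sketch is below; the breakpoints in the example are invented for illustration and are not the schedules actually configured here.

class ScheduledFloat:
    """A float that interpolates linearly between (batch_count, value) points."""

    def __init__(self, *points: tuple):
        self.points = sorted(points)  # e.g. ((0.0, 0.3), (20000.0, 0.1))

    def value_at(self, batch_count: float) -> float:
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0  # past the last breakpoint, hold the final value

# An invented schedule consistent with the "dropout_p ... ans=0.1" line above:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value_at(61362.0))  # 0.1
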
], batch size: 105, lr: 2.76e-02, grad_scale: 16.0 2024-09-17 00:09:33,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=61502.0, ans=0.0 2024-09-17 00:09:43,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=61502.0, ans=0.0 2024-09-17 00:09:48,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=61548.666666666664, ans=0.125 2024-09-17 00:09:49,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=61548.666666666664, ans=0.0 2024-09-17 00:09:51,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=61548.666666666664, ans=0.1 2024-09-17 00:09:59,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=61548.666666666664, ans=0.0 2024-09-17 00:10:02,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=61548.666666666664, ans=0.125 2024-09-17 00:10:29,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=61642.0, ans=0.0 2024-09-17 00:10:44,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=61688.666666666664, ans=0.0 2024-09-17 00:10:44,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=61688.666666666664, ans=0.5 2024-09-17 00:10:50,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=61688.666666666664, ans=0.09899494936611666 2024-09-17 00:10:53,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2024-09-17 00:10:55,124 INFO [train.py:1198] (1/2) Epoch 4, batch 1600, loss[loss=0.3435, simple_loss=0.3527, pruned_loss=0.1324, ctc_loss=0.2526, cr_loss=0.4746, over 34552.00 frames. ], tot_loss[loss=0.3339, simple_loss=0.345, pruned_loss=0.1277, ctc_loss=0.2424, cr_loss=0.4718, over 6724665.54 frames. ], batch size: 99, lr: 2.75e-02, grad_scale: 32.0 2024-09-17 00:11:05,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=61735.333333333336, ans=0.125 2024-09-17 00:11:06,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=61735.333333333336, ans=0.0 2024-09-17 00:11:14,646 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 3.329e+02 4.231e+02 5.421e+02 9.728e+02, threshold=8.462e+02, percent-clipped=9.0 2024-09-17 00:11:18,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2024-09-17 00:11:21,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=61782.0, ans=0.2 2024-09-17 00:11:34,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. 
limit=6.0 2024-09-17 00:11:36,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=61828.666666666664, ans=0.5 2024-09-17 00:11:51,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=61875.333333333336, ans=0.125 2024-09-17 00:11:51,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=61875.333333333336, ans=0.0 2024-09-17 00:12:05,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=15.16 vs. limit=15.0 2024-09-17 00:12:07,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=61922.0, ans=0.0 2024-09-17 00:12:17,133 INFO [train.py:1198] (1/2) Epoch 4, batch 1650, loss[loss=0.3571, simple_loss=0.3682, pruned_loss=0.1368, ctc_loss=0.2656, cr_loss=0.4809, over 34411.00 frames. ], tot_loss[loss=0.3347, simple_loss=0.3456, pruned_loss=0.1282, ctc_loss=0.2431, cr_loss=0.472, over 6718406.96 frames. ], batch size: 103, lr: 2.75e-02, grad_scale: 32.0 2024-09-17 00:12:43,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=62015.333333333336, ans=0.1 2024-09-17 00:12:53,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=62062.0, ans=0.04949747468305833 2024-09-17 00:13:03,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=62062.0, ans=0.1 2024-09-17 00:13:42,752 INFO [train.py:1198] (1/2) Epoch 4, batch 1700, loss[loss=0.29, simple_loss=0.3111, pruned_loss=0.1061, ctc_loss=0.2023, cr_loss=0.4074, over 34307.00 frames. ], tot_loss[loss=0.3341, simple_loss=0.3454, pruned_loss=0.1277, ctc_loss=0.2424, cr_loss=0.4717, over 6744242.64 frames. ], batch size: 80, lr: 2.75e-02, grad_scale: 32.0 2024-09-17 00:13:49,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=62202.0, ans=0.0 2024-09-17 00:14:02,233 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.239e+02 2.953e+02 3.549e+02 4.445e+02 7.212e+02, threshold=7.098e+02, percent-clipped=0.0 2024-09-17 00:14:28,895 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:14:56,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=62388.666666666664, ans=0.5 2024-09-17 00:15:04,654 INFO [train.py:1198] (1/2) Epoch 4, batch 1750, loss[loss=0.3089, simple_loss=0.3166, pruned_loss=0.1188, ctc_loss=0.2237, cr_loss=0.4698, over 34176.00 frames. ], tot_loss[loss=0.3333, simple_loss=0.3447, pruned_loss=0.1273, ctc_loss=0.2417, cr_loss=0.4708, over 6753263.75 frames. 
], batch size: 78, lr: 2.74e-02, grad_scale: 32.0 2024-09-17 00:15:55,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=62575.333333333336, ans=0.0 2024-09-17 00:15:57,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=62575.333333333336, ans=0.125 2024-09-17 00:16:21,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=62622.0, ans=0.125 2024-09-17 00:16:26,116 INFO [train.py:1198] (1/2) Epoch 4, batch 1800, loss[loss=0.34, simple_loss=0.3502, pruned_loss=0.13, ctc_loss=0.2491, cr_loss=0.4954, over 34709.00 frames. ], tot_loss[loss=0.3334, simple_loss=0.3447, pruned_loss=0.1274, ctc_loss=0.2418, cr_loss=0.4713, over 6755131.00 frames. ], batch size: 97, lr: 2.74e-02, grad_scale: 32.0 2024-09-17 00:16:29,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=62668.666666666664, ans=0.125 2024-09-17 00:16:30,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.17 vs. limit=22.5 2024-09-17 00:16:47,444 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.353e+02 3.155e+02 3.770e+02 4.662e+02 6.874e+02, threshold=7.539e+02, percent-clipped=0.0 2024-09-17 00:16:59,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=62762.0, ans=0.0 2024-09-17 00:17:05,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=62762.0, ans=0.125 2024-09-17 00:17:06,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.64 vs. limit=15.0 2024-09-17 00:17:31,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.49 vs. limit=15.0 2024-09-17 00:17:52,285 INFO [train.py:1198] (1/2) Epoch 4, batch 1850, loss[loss=0.3427, simple_loss=0.3588, pruned_loss=0.1287, ctc_loss=0.246, cr_loss=0.5027, over 34435.00 frames. ], tot_loss[loss=0.3325, simple_loss=0.344, pruned_loss=0.1269, ctc_loss=0.2409, cr_loss=0.471, over 6761735.51 frames. ], batch size: 100, lr: 2.73e-02, grad_scale: 32.0 2024-09-17 00:18:30,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=62995.333333333336, ans=0.125 2024-09-17 00:19:13,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0 2024-09-17 00:19:13,837 INFO [train.py:1198] (1/2) Epoch 4, batch 1900, loss[loss=0.3447, simple_loss=0.356, pruned_loss=0.1316, ctc_loss=0.2496, cr_loss=0.5059, over 34366.00 frames. ], tot_loss[loss=0.3326, simple_loss=0.3445, pruned_loss=0.1269, ctc_loss=0.2409, cr_loss=0.4714, over 6772345.31 frames. 
], batch size: 103, lr: 2.73e-02, grad_scale: 32.0 2024-09-17 00:19:19,315 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:19:21,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=63135.333333333336, ans=0.125 2024-09-17 00:19:33,615 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.287e+02 2.960e+02 3.354e+02 4.315e+02 8.990e+02, threshold=6.708e+02, percent-clipped=2.0 2024-09-17 00:20:23,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=63322.0, ans=0.2 2024-09-17 00:20:37,577 INFO [train.py:1198] (1/2) Epoch 4, batch 1950, loss[loss=0.3223, simple_loss=0.3393, pruned_loss=0.1209, ctc_loss=0.2236, cr_loss=0.4704, over 34378.00 frames. ], tot_loss[loss=0.3339, simple_loss=0.3458, pruned_loss=0.1274, ctc_loss=0.2417, cr_loss=0.4726, over 6789143.56 frames. ], batch size: 91, lr: 2.73e-02, grad_scale: 32.0 2024-09-17 00:20:52,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=63415.333333333336, ans=0.0 2024-09-17 00:20:54,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63415.333333333336, ans=0.1 2024-09-17 00:20:56,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=63415.333333333336, ans=0.125 2024-09-17 00:21:07,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.35 vs. limit=15.0 2024-09-17 00:21:12,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=63462.0, ans=0.2 2024-09-17 00:21:26,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.32 vs. limit=15.0 2024-09-17 00:21:32,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=63508.666666666664, ans=0.125 2024-09-17 00:22:02,052 INFO [train.py:1198] (1/2) Epoch 4, batch 2000, loss[loss=0.2884, simple_loss=0.3033, pruned_loss=0.1072, ctc_loss=0.2064, cr_loss=0.4442, over 34152.00 frames. ], tot_loss[loss=0.3343, simple_loss=0.3462, pruned_loss=0.1275, ctc_loss=0.242, cr_loss=0.4731, over 6764314.40 frames. 
], batch size: 78, lr: 2.72e-02, grad_scale: 32.0 2024-09-17 00:22:07,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=63602.0, ans=0.0 2024-09-17 00:22:21,907 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.280e+02 2.939e+02 3.604e+02 4.426e+02 8.549e+02, threshold=7.209e+02, percent-clipped=1.0 2024-09-17 00:22:22,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=63648.666666666664, ans=0.025 2024-09-17 00:22:32,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=63648.666666666664, ans=0.2 2024-09-17 00:22:53,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=63742.0, ans=15.0 2024-09-17 00:23:23,917 INFO [train.py:1198] (1/2) Epoch 4, batch 2050, loss[loss=0.3039, simple_loss=0.3187, pruned_loss=0.1142, ctc_loss=0.2166, cr_loss=0.4344, over 34485.00 frames. ], tot_loss[loss=0.3333, simple_loss=0.345, pruned_loss=0.1273, ctc_loss=0.2412, cr_loss=0.4725, over 6754459.22 frames. ], batch size: 82, lr: 2.72e-02, grad_scale: 32.0 2024-09-17 00:23:26,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-17 00:23:34,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=63835.333333333336, ans=0.05 2024-09-17 00:23:39,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=63882.0, ans=0.125 2024-09-17 00:23:54,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2024-09-17 00:24:18,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=63975.333333333336, ans=0.025 2024-09-17 00:24:26,910 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:24:28,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=63975.333333333336, ans=0.125 2024-09-17 00:24:41,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=64022.0, ans=0.125 2024-09-17 00:24:47,829 INFO [train.py:1198] (1/2) Epoch 4, batch 2100, loss[loss=0.3201, simple_loss=0.3403, pruned_loss=0.1178, ctc_loss=0.2269, cr_loss=0.475, over 34558.00 frames. ], tot_loss[loss=0.3312, simple_loss=0.3433, pruned_loss=0.1261, ctc_loss=0.2393, cr_loss=0.4714, over 6769602.95 frames. ], batch size: 94, lr: 2.71e-02, grad_scale: 32.0 2024-09-17 00:24:51,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.95 vs. 
limit=15.0 2024-09-17 00:25:03,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=64068.666666666664, ans=0.0 2024-09-17 00:25:09,243 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 2.993e+02 3.582e+02 4.594e+02 7.214e+02, threshold=7.164e+02, percent-clipped=2.0 2024-09-17 00:25:10,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.38 vs. limit=15.0 2024-09-17 00:25:11,619 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=12.0 2024-09-17 00:25:19,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=64115.333333333336, ans=0.09899494936611666 2024-09-17 00:25:27,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=64162.0, ans=0.125 2024-09-17 00:25:40,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=64208.666666666664, ans=0.0 2024-09-17 00:26:11,301 INFO [train.py:1198] (1/2) Epoch 4, batch 2150, loss[loss=0.3418, simple_loss=0.3524, pruned_loss=0.1311, ctc_loss=0.2487, cr_loss=0.4798, over 34363.00 frames. ], tot_loss[loss=0.3297, simple_loss=0.3423, pruned_loss=0.1254, ctc_loss=0.2381, cr_loss=0.4705, over 6788286.49 frames. ], batch size: 91, lr: 2.71e-02, grad_scale: 32.0 2024-09-17 00:26:18,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=64302.0, ans=0.125 2024-09-17 00:26:25,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=64302.0, ans=0.2 2024-09-17 00:26:48,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=5.81 vs. limit=12.0 2024-09-17 00:27:07,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=64442.0, ans=0.1 2024-09-17 00:27:16,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=64488.666666666664, ans=0.5 2024-09-17 00:27:24,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=64488.666666666664, ans=0.0 2024-09-17 00:27:33,752 INFO [train.py:1198] (1/2) Epoch 4, batch 2200, loss[loss=0.349, simple_loss=0.3592, pruned_loss=0.1345, ctc_loss=0.2519, cr_loss=0.4837, over 34443.00 frames. ], tot_loss[loss=0.3301, simple_loss=0.3425, pruned_loss=0.1256, ctc_loss=0.2385, cr_loss=0.4711, over 6783253.07 frames. ], batch size: 100, lr: 2.71e-02, grad_scale: 16.0 2024-09-17 00:27:47,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.21 vs. 
limit=10.0 2024-09-17 00:27:56,478 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.564e+02 3.258e+02 4.175e+02 5.439e+02 1.042e+03, threshold=8.351e+02, percent-clipped=7.0 2024-09-17 00:27:58,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=64582.0, ans=0.125 2024-09-17 00:28:00,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=64582.0, ans=0.07 2024-09-17 00:28:11,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=64628.666666666664, ans=0.0 2024-09-17 00:28:26,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=64675.333333333336, ans=0.2 2024-09-17 00:28:32,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=64675.333333333336, ans=0.2 2024-09-17 00:28:59,229 INFO [train.py:1198] (1/2) Epoch 4, batch 2250, loss[loss=0.3609, simple_loss=0.3644, pruned_loss=0.1428, ctc_loss=0.2623, cr_loss=0.4815, over 34432.00 frames. ], tot_loss[loss=0.3301, simple_loss=0.3426, pruned_loss=0.1255, ctc_loss=0.2385, cr_loss=0.4707, over 6780067.03 frames. ], batch size: 95, lr: 2.70e-02, grad_scale: 16.0 2024-09-17 00:29:19,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=64815.333333333336, ans=0.0 2024-09-17 00:29:23,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-09-17 00:29:38,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64862.0, ans=0.1 2024-09-17 00:29:41,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=64862.0, ans=0.04949747468305833 2024-09-17 00:30:09,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=64955.333333333336, ans=0.0 2024-09-17 00:30:11,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=64955.333333333336, ans=0.0 2024-09-17 00:30:20,725 INFO [train.py:1198] (1/2) Epoch 4, batch 2300, loss[loss=0.2915, simple_loss=0.3082, pruned_loss=0.108, ctc_loss=0.2042, cr_loss=0.4441, over 34288.00 frames. ], tot_loss[loss=0.3284, simple_loss=0.3411, pruned_loss=0.1248, ctc_loss=0.237, cr_loss=0.4678, over 6764928.56 frames. 
], batch size: 83, lr: 2.70e-02, grad_scale: 16.0 2024-09-17 00:30:40,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65048.666666666664, ans=0.1 2024-09-17 00:30:41,733 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.200e+02 3.136e+02 4.255e+02 5.955e+02 1.224e+03, threshold=8.510e+02, percent-clipped=3.0 2024-09-17 00:30:45,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=65048.666666666664, ans=0.1 2024-09-17 00:30:53,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=65095.333333333336, ans=0.125 2024-09-17 00:31:08,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=65142.0, ans=0.0 2024-09-17 00:31:14,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=65142.0, ans=0.125 2024-09-17 00:31:19,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65142.0, ans=0.1 2024-09-17 00:31:34,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=65188.666666666664, ans=0.125 2024-09-17 00:31:39,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.74 vs. limit=15.0 2024-09-17 00:31:43,548 INFO [train.py:1198] (1/2) Epoch 4, batch 2350, loss[loss=0.3459, simple_loss=0.3539, pruned_loss=0.1338, ctc_loss=0.257, cr_loss=0.47, over 34689.00 frames. ], tot_loss[loss=0.3279, simple_loss=0.341, pruned_loss=0.1244, ctc_loss=0.2364, cr_loss=0.4674, over 6771924.51 frames. ], batch size: 97, lr: 2.70e-02, grad_scale: 16.0 2024-09-17 00:31:46,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65235.333333333336, ans=0.1 2024-09-17 00:31:50,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=65235.333333333336, ans=0.125 2024-09-17 00:32:04,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=65282.0, ans=0.0 2024-09-17 00:32:17,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.64 vs. limit=15.0 2024-09-17 00:32:38,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65375.333333333336, ans=0.1 2024-09-17 00:32:41,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=65375.333333333336, ans=0.125 2024-09-17 00:32:56,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=65422.0, ans=0.125 2024-09-17 00:33:07,341 INFO [train.py:1198] (1/2) Epoch 4, batch 2400, loss[loss=0.3172, simple_loss=0.3286, pruned_loss=0.1204, ctc_loss=0.2287, cr_loss=0.4825, over 34573.00 frames. ], tot_loss[loss=0.3288, simple_loss=0.3417, pruned_loss=0.1249, ctc_loss=0.2372, cr_loss=0.4688, over 6775660.61 frames. 
], batch size: 89, lr: 2.69e-02, grad_scale: 32.0 2024-09-17 00:33:21,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2024-09-17 00:33:28,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.394e+02 2.957e+02 3.678e+02 4.663e+02 1.113e+03, threshold=7.356e+02, percent-clipped=2.0 2024-09-17 00:33:31,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1.whitening_limit, batch_count=65515.333333333336, ans=10.0 2024-09-17 00:33:40,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=65562.0, ans=0.125 2024-09-17 00:33:53,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=65562.0, ans=0.0 2024-09-17 00:34:23,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=65655.33333333333, ans=0.0 2024-09-17 00:34:29,542 INFO [train.py:1198] (1/2) Epoch 4, batch 2450, loss[loss=0.3378, simple_loss=0.3473, pruned_loss=0.1299, ctc_loss=0.2454, cr_loss=0.4891, over 34414.00 frames. ], tot_loss[loss=0.3308, simple_loss=0.3432, pruned_loss=0.1259, ctc_loss=0.2391, cr_loss=0.4707, over 6750578.07 frames. ], batch size: 95, lr: 2.69e-02, grad_scale: 32.0 2024-09-17 00:34:31,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=65702.0, ans=0.2 2024-09-17 00:34:56,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=65748.66666666667, ans=0.125 2024-09-17 00:34:56,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=65748.66666666667, ans=0.125 2024-09-17 00:34:56,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.55 vs. limit=15.0 2024-09-17 00:35:13,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=65795.33333333333, ans=0.125 2024-09-17 00:35:15,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=65795.33333333333, ans=0.0 2024-09-17 00:35:40,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.95 vs. limit=15.0 2024-09-17 00:35:43,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2024-09-17 00:35:52,831 INFO [train.py:1198] (1/2) Epoch 4, batch 2500, loss[loss=0.3311, simple_loss=0.348, pruned_loss=0.1246, ctc_loss=0.23, cr_loss=0.4727, over 34460.00 frames. ], tot_loss[loss=0.3302, simple_loss=0.3429, pruned_loss=0.1255, ctc_loss=0.2385, cr_loss=0.4705, over 6762734.60 frames. ], batch size: 100, lr: 2.68e-02, grad_scale: 32.0 2024-09-17 00:35:55,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. 
limit=15.0 2024-09-17 00:36:16,107 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.316e+02 2.934e+02 3.346e+02 4.379e+02 8.345e+02, threshold=6.691e+02, percent-clipped=1.0 2024-09-17 00:36:43,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=66075.33333333333, ans=0.125 2024-09-17 00:36:54,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=66075.33333333333, ans=0.025 2024-09-17 00:37:01,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2024-09-17 00:37:05,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=12.0 2024-09-17 00:37:09,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=66122.0, ans=0.125 2024-09-17 00:37:17,371 INFO [train.py:1198] (1/2) Epoch 4, batch 2550, loss[loss=0.3051, simple_loss=0.3101, pruned_loss=0.1183, ctc_loss=0.224, cr_loss=0.468, over 34179.00 frames. ], tot_loss[loss=0.3298, simple_loss=0.3427, pruned_loss=0.1252, ctc_loss=0.238, cr_loss=0.4701, over 6766388.66 frames. ], batch size: 78, lr: 2.68e-02, grad_scale: 16.0 2024-09-17 00:37:37,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.92 vs. limit=15.0 2024-09-17 00:37:46,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=66215.33333333333, ans=0.125 2024-09-17 00:37:50,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=66262.0, ans=0.95 2024-09-17 00:38:07,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.36 vs. limit=6.0 2024-09-17 00:38:08,858 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:38:15,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=66308.66666666667, ans=0.125 2024-09-17 00:38:35,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=66355.33333333333, ans=0.125 2024-09-17 00:38:39,553 INFO [train.py:1198] (1/2) Epoch 4, batch 2600, loss[loss=0.3371, simple_loss=0.3524, pruned_loss=0.1276, ctc_loss=0.2359, cr_loss=0.4882, over 34358.00 frames. ], tot_loss[loss=0.3301, simple_loss=0.3431, pruned_loss=0.1253, ctc_loss=0.2383, cr_loss=0.4709, over 6762811.44 frames. 
], batch size: 91, lr: 2.68e-02, grad_scale: 16.0 2024-09-17 00:38:44,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=66402.0, ans=0.2 2024-09-17 00:38:49,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=66402.0, ans=0.0 2024-09-17 00:38:54,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=66402.0, ans=0.125 2024-09-17 00:38:59,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.26 vs. limit=15.0 2024-09-17 00:39:03,886 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.348e+02 2.861e+02 3.375e+02 4.444e+02 6.996e+02, threshold=6.751e+02, percent-clipped=1.0 2024-09-17 00:39:12,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=66495.33333333333, ans=0.2 2024-09-17 00:39:34,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2024-09-17 00:40:04,443 INFO [train.py:1198] (1/2) Epoch 4, batch 2650, loss[loss=0.3477, simple_loss=0.3595, pruned_loss=0.133, ctc_loss=0.2498, cr_loss=0.4962, over 34247.00 frames. ], tot_loss[loss=0.3302, simple_loss=0.3433, pruned_loss=0.1253, ctc_loss=0.2381, cr_loss=0.4713, over 6769928.92 frames. ], batch size: 117, lr: 2.67e-02, grad_scale: 16.0 2024-09-17 00:40:14,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=66635.33333333333, ans=0.2 2024-09-17 00:40:29,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=66682.0, ans=0.125 2024-09-17 00:40:33,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=66682.0, ans=0.125 2024-09-17 00:40:36,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.56 vs. limit=15.0 2024-09-17 00:40:37,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=66728.66666666667, ans=0.0 2024-09-17 00:41:00,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.97 vs. limit=22.5 2024-09-17 00:41:13,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=66822.0, ans=0.1 2024-09-17 00:41:19,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=66822.0, ans=0.125 2024-09-17 00:41:24,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=66868.66666666667, ans=0.2 2024-09-17 00:41:25,699 INFO [train.py:1198] (1/2) Epoch 4, batch 2700, loss[loss=0.3313, simple_loss=0.3458, pruned_loss=0.1258, ctc_loss=0.2347, cr_loss=0.46, over 34592.00 frames. ], tot_loss[loss=0.3308, simple_loss=0.3437, pruned_loss=0.1256, ctc_loss=0.2386, cr_loss=0.4718, over 6763769.25 frames. 
], batch size: 102, lr: 2.67e-02, grad_scale: 16.0 2024-09-17 00:41:31,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=66868.66666666667, ans=15.0 2024-09-17 00:41:47,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=66915.33333333333, ans=0.0 2024-09-17 00:41:48,663 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.463e+02 3.033e+02 3.723e+02 4.865e+02 8.007e+02, threshold=7.447e+02, percent-clipped=1.0 2024-09-17 00:41:52,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=66915.33333333333, ans=0.2 2024-09-17 00:42:00,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=66962.0, ans=0.025 2024-09-17 00:42:02,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=66962.0, ans=0.0 2024-09-17 00:42:17,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=67008.66666666667, ans=0.2 2024-09-17 00:42:39,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=67055.33333333333, ans=0.1 2024-09-17 00:42:48,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67102.0, ans=0.1 2024-09-17 00:42:49,509 INFO [train.py:1198] (1/2) Epoch 4, batch 2750, loss[loss=0.3121, simple_loss=0.323, pruned_loss=0.1186, ctc_loss=0.2302, cr_loss=0.4454, over 34637.00 frames. ], tot_loss[loss=0.3287, simple_loss=0.3418, pruned_loss=0.1247, ctc_loss=0.2371, cr_loss=0.4694, over 6761770.44 frames. ], batch size: 88, lr: 2.67e-02, grad_scale: 16.0 2024-09-17 00:42:53,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.33 vs. limit=15.0 2024-09-17 00:43:45,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=67242.0, ans=0.025 2024-09-17 00:44:04,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=67288.66666666667, ans=0.0 2024-09-17 00:44:07,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=67288.66666666667, ans=0.125 2024-09-17 00:44:09,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=67288.66666666667, ans=0.1 2024-09-17 00:44:11,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=67288.66666666667, ans=0.125 2024-09-17 00:44:13,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=67335.33333333333, ans=15.0 2024-09-17 00:44:13,964 INFO [train.py:1198] (1/2) Epoch 4, batch 2800, loss[loss=0.4104, simple_loss=0.3874, pruned_loss=0.1748, ctc_loss=0.321, cr_loss=0.4929, over 23369.00 frames. ], tot_loss[loss=0.3293, simple_loss=0.3421, pruned_loss=0.1251, ctc_loss=0.2377, cr_loss=0.4696, over 6738141.05 frames. 
], batch size: 244, lr: 2.66e-02, grad_scale: 32.0 2024-09-17 00:44:16,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=67335.33333333333, ans=0.125 2024-09-17 00:44:32,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=67382.0, ans=0.2 2024-09-17 00:44:36,802 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.407e+02 3.419e+02 3.901e+02 4.955e+02 1.016e+03, threshold=7.802e+02, percent-clipped=4.0 2024-09-17 00:44:45,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.71 vs. limit=15.0 2024-09-17 00:45:00,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=67428.66666666667, ans=0.0 2024-09-17 00:45:00,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=67428.66666666667, ans=0.0 2024-09-17 00:45:02,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.01 vs. limit=22.5 2024-09-17 00:45:14,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.98 vs. limit=15.0 2024-09-17 00:45:16,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=67475.33333333333, ans=0.0 2024-09-17 00:45:20,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=67522.0, ans=0.125 2024-09-17 00:45:31,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=67522.0, ans=0.125 2024-09-17 00:45:35,827 INFO [train.py:1198] (1/2) Epoch 4, batch 2850, loss[loss=0.3071, simple_loss=0.3229, pruned_loss=0.1148, ctc_loss=0.2163, cr_loss=0.4581, over 34478.00 frames. ], tot_loss[loss=0.3298, simple_loss=0.3424, pruned_loss=0.1254, ctc_loss=0.2382, cr_loss=0.4695, over 6722088.32 frames. ], batch size: 90, lr: 2.66e-02, grad_scale: 32.0 2024-09-17 00:46:13,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67662.0, ans=0.1 2024-09-17 00:46:46,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=67755.33333333333, ans=0.125 2024-09-17 00:46:59,091 INFO [train.py:1198] (1/2) Epoch 4, batch 2900, loss[loss=0.3148, simple_loss=0.3366, pruned_loss=0.1146, ctc_loss=0.2233, cr_loss=0.4769, over 34509.00 frames. ], tot_loss[loss=0.3308, simple_loss=0.3436, pruned_loss=0.1257, ctc_loss=0.2386, cr_loss=0.472, over 6753029.33 frames. 
], batch size: 94, lr: 2.66e-02, grad_scale: 32.0 2024-09-17 00:47:23,796 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.272e+02 2.839e+02 3.392e+02 4.181e+02 6.413e+02, threshold=6.783e+02, percent-clipped=0.0 2024-09-17 00:47:31,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=67848.66666666667, ans=0.1 2024-09-17 00:47:47,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=67895.33333333333, ans=0.0 2024-09-17 00:47:47,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=67895.33333333333, ans=0.0 2024-09-17 00:47:50,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=67942.0, ans=0.2 2024-09-17 00:48:20,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=67988.66666666667, ans=0.1 2024-09-17 00:48:23,096 INFO [train.py:1198] (1/2) Epoch 4, batch 2950, loss[loss=0.3125, simple_loss=0.3252, pruned_loss=0.1191, ctc_loss=0.2199, cr_loss=0.4446, over 34647.00 frames. ], tot_loss[loss=0.3284, simple_loss=0.3415, pruned_loss=0.1246, ctc_loss=0.2367, cr_loss=0.4693, over 6747409.87 frames. ], batch size: 88, lr: 2.65e-02, grad_scale: 16.0 2024-09-17 00:48:24,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=68035.33333333333, ans=0.0 2024-09-17 00:48:36,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=68035.33333333333, ans=0.125 2024-09-17 00:48:40,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.66 vs. limit=10.0 2024-09-17 00:48:41,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=68082.0, ans=0.125 2024-09-17 00:48:55,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=68128.66666666667, ans=0.125 2024-09-17 00:48:58,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2024-09-17 00:49:14,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=68175.33333333333, ans=0.09899494936611666 2024-09-17 00:49:24,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.27 vs. limit=15.0 2024-09-17 00:49:46,923 INFO [train.py:1198] (1/2) Epoch 4, batch 3000, loss[loss=0.3107, simple_loss=0.3316, pruned_loss=0.1133, ctc_loss=0.2204, cr_loss=0.4752, over 34556.00 frames. ], tot_loss[loss=0.3271, simple_loss=0.3406, pruned_loss=0.1239, ctc_loss=0.2358, cr_loss=0.4683, over 6748100.89 frames. 
], batch size: 94, lr: 2.65e-02, grad_scale: 16.0 2024-09-17 00:49:46,923 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 00:49:51,202 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.7259, 1.9754, 2.4274, 2.2695, 2.3515, 1.9633, 2.1752, 2.1136], device='cuda:1') 2024-09-17 00:50:03,654 INFO [train.py:1230] (1/2) Epoch 4, validation: loss=0.1849, simple_loss=0.2806, pruned_loss=0.03699, ctc_loss=0.07615, cr_loss=1.463e-14, over 944034.00 frames. 2024-09-17 00:50:03,654 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 00:50:28,325 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.195e+02 2.916e+02 3.568e+02 4.544e+02 1.430e+03, threshold=7.136e+02, percent-clipped=7.0 2024-09-17 00:50:28,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=68315.33333333333, ans=0.125 2024-09-17 00:50:44,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68362.0, ans=0.1 2024-09-17 00:51:13,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=68455.33333333333, ans=0.2 2024-09-17 00:51:24,753 INFO [train.py:1198] (1/2) Epoch 4, batch 3050, loss[loss=0.3094, simple_loss=0.3254, pruned_loss=0.1162, ctc_loss=0.2178, cr_loss=0.4371, over 34581.00 frames. ], tot_loss[loss=0.3281, simple_loss=0.3415, pruned_loss=0.1243, ctc_loss=0.2364, cr_loss=0.4687, over 6741469.07 frames. ], batch size: 89, lr: 2.64e-02, grad_scale: 16.0 2024-09-17 00:51:33,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=68502.0, ans=0.1 2024-09-17 00:51:41,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=68548.66666666667, ans=0.125 2024-09-17 00:51:43,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=68548.66666666667, ans=0.0 2024-09-17 00:51:48,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=68548.66666666667, ans=0.0 2024-09-17 00:52:01,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=68595.33333333333, ans=0.1 2024-09-17 00:52:07,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=68595.33333333333, ans=0.125 2024-09-17 00:52:14,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=68642.0, ans=0.0 2024-09-17 00:52:20,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=68642.0, ans=0.05 2024-09-17 00:52:26,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=68642.0, ans=0.125 2024-09-17 00:52:26,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=68642.0, ans=0.125 2024-09-17 00:52:28,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=68642.0, ans=0.025 
2024-09-17 00:52:41,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=68688.66666666667, ans=0.0 2024-09-17 00:52:41,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=68688.66666666667, ans=0.125 2024-09-17 00:52:47,534 INFO [train.py:1198] (1/2) Epoch 4, batch 3100, loss[loss=0.3523, simple_loss=0.3648, pruned_loss=0.1346, ctc_loss=0.2557, cr_loss=0.4862, over 34211.00 frames. ], tot_loss[loss=0.3278, simple_loss=0.3413, pruned_loss=0.1242, ctc_loss=0.2361, cr_loss=0.4685, over 6740405.82 frames. ], batch size: 117, lr: 2.64e-02, grad_scale: 16.0 2024-09-17 00:52:47,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=68735.33333333333, ans=0.1 2024-09-17 00:53:07,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=68782.0, ans=0.125 2024-09-17 00:53:12,515 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.374e+02 2.973e+02 3.730e+02 4.462e+02 1.042e+03, threshold=7.460e+02, percent-clipped=2.0 2024-09-17 00:53:27,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=68828.66666666667, ans=0.05 2024-09-17 00:53:45,128 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:53:56,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=68922.0, ans=0.1 2024-09-17 00:54:02,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=68922.0, ans=0.025 2024-09-17 00:54:06,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=68922.0, ans=0.125 2024-09-17 00:54:09,212 INFO [train.py:1198] (1/2) Epoch 4, batch 3150, loss[loss=0.3465, simple_loss=0.3621, pruned_loss=0.1309, ctc_loss=0.2506, cr_loss=0.4705, over 33914.00 frames. ], tot_loss[loss=0.3281, simple_loss=0.3414, pruned_loss=0.1244, ctc_loss=0.2364, cr_loss=0.4694, over 6745799.52 frames. ], batch size: 122, lr: 2.64e-02, grad_scale: 16.0 2024-09-17 00:54:36,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=69015.33333333333, ans=0.125 2024-09-17 00:55:07,779 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:55:13,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.46 vs. limit=22.5 2024-09-17 00:55:28,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=69155.33333333333, ans=0.125 2024-09-17 00:55:29,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=69202.0, ans=0.2 2024-09-17 00:55:31,237 INFO [train.py:1198] (1/2) Epoch 4, batch 3200, loss[loss=0.324, simple_loss=0.3333, pruned_loss=0.1236, ctc_loss=0.243, cr_loss=0.473, over 34526.00 frames. 
], tot_loss[loss=0.3275, simple_loss=0.3409, pruned_loss=0.1241, ctc_loss=0.236, cr_loss=0.4693, over 6758788.58 frames. ], batch size: 94, lr: 2.63e-02, grad_scale: 32.0 2024-09-17 00:55:33,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=69202.0, ans=0.95 2024-09-17 00:55:42,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.65 vs. limit=10.0 2024-09-17 00:55:46,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.31 vs. limit=15.0 2024-09-17 00:55:55,373 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.337e+02 2.920e+02 3.450e+02 4.226e+02 6.704e+02, threshold=6.899e+02, percent-clipped=0.0 2024-09-17 00:55:55,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=69248.66666666667, ans=0.125 2024-09-17 00:55:55,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=69248.66666666667, ans=0.125 2024-09-17 00:56:08,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=69295.33333333333, ans=0.1 2024-09-17 00:56:19,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=69342.0, ans=0.0 2024-09-17 00:56:43,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=69388.66666666667, ans=0.0 2024-09-17 00:56:51,591 INFO [train.py:1198] (1/2) Epoch 4, batch 3250, loss[loss=0.3553, simple_loss=0.3644, pruned_loss=0.1369, ctc_loss=0.2616, cr_loss=0.5011, over 34638.00 frames. ], tot_loss[loss=0.3276, simple_loss=0.3412, pruned_loss=0.124, ctc_loss=0.236, cr_loss=0.4705, over 6769193.70 frames. ], batch size: 98, lr: 2.63e-02, grad_scale: 32.0 2024-09-17 00:57:06,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=69482.0, ans=0.2 2024-09-17 00:57:06,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=69482.0, ans=0.0 2024-09-17 00:57:14,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69482.0, ans=0.125 2024-09-17 00:57:14,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69482.0, ans=0.125 2024-09-17 00:57:19,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=69482.0, ans=0.125 2024-09-17 00:57:28,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=69528.66666666667, ans=0.025 2024-09-17 00:57:41,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=69575.33333333333, ans=0.125 2024-09-17 00:57:42,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.09 vs. 
limit=15.0 2024-09-17 00:57:51,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=69575.33333333333, ans=0.0 2024-09-17 00:57:54,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=69575.33333333333, ans=0.2 2024-09-17 00:58:13,607 INFO [train.py:1198] (1/2) Epoch 4, batch 3300, loss[loss=0.3603, simple_loss=0.3697, pruned_loss=0.139, ctc_loss=0.2654, cr_loss=0.4942, over 32943.00 frames. ], tot_loss[loss=0.3263, simple_loss=0.3399, pruned_loss=0.1234, ctc_loss=0.235, cr_loss=0.469, over 6767909.76 frames. ], batch size: 130, lr: 2.63e-02, grad_scale: 16.0 2024-09-17 00:58:14,047 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 00:58:14,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=69668.66666666667, ans=0.125 2024-09-17 00:58:39,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.232e+02 2.964e+02 3.537e+02 4.406e+02 8.484e+02, threshold=7.075e+02, percent-clipped=3.0 2024-09-17 00:58:45,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.86 vs. limit=15.0 2024-09-17 00:59:16,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.29 vs. limit=15.0 2024-09-17 00:59:27,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=69855.33333333333, ans=0.05 2024-09-17 00:59:29,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten.whitening_limit, batch_count=69855.33333333333, ans=15.0 2024-09-17 00:59:32,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=69902.0, ans=0.125 2024-09-17 00:59:33,471 INFO [train.py:1198] (1/2) Epoch 4, batch 3350, loss[loss=0.3564, simple_loss=0.3647, pruned_loss=0.1378, ctc_loss=0.2576, cr_loss=0.522, over 33845.00 frames. ], tot_loss[loss=0.3274, simple_loss=0.3408, pruned_loss=0.124, ctc_loss=0.236, cr_loss=0.4699, over 6744159.50 frames. ], batch size: 122, lr: 2.62e-02, grad_scale: 16.0 2024-09-17 00:59:46,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=69902.0, ans=0.0 2024-09-17 00:59:46,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=69902.0, ans=0.125 2024-09-17 01:00:11,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=12.0 2024-09-17 01:00:18,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=69995.33333333333, ans=0.2 2024-09-17 01:00:26,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=70042.0, ans=0.125 2024-09-17 01:00:36,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.55 vs. 
limit=15.0 2024-09-17 01:00:54,973 INFO [train.py:1198] (1/2) Epoch 4, batch 3400, loss[loss=0.2747, simple_loss=0.2949, pruned_loss=0.09971, ctc_loss=0.1945, cr_loss=0.4021, over 34136.00 frames. ], tot_loss[loss=0.3273, simple_loss=0.3406, pruned_loss=0.124, ctc_loss=0.236, cr_loss=0.4691, over 6733639.89 frames. ], batch size: 78, lr: 2.62e-02, grad_scale: 16.0 2024-09-17 01:00:55,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=22.5 2024-09-17 01:01:04,789 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:01:09,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=70182.0, ans=0.0 2024-09-17 01:01:14,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=70182.0, ans=0.2 2024-09-17 01:01:20,349 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 3.086e+02 3.732e+02 4.553e+02 9.433e+02, threshold=7.465e+02, percent-clipped=3.0 2024-09-17 01:01:28,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70228.66666666667, ans=0.1 2024-09-17 01:01:40,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=70228.66666666667, ans=0.125 2024-09-17 01:01:53,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.94 vs. limit=22.5 2024-09-17 01:02:10,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=70322.0, ans=0.0 2024-09-17 01:02:14,664 INFO [train.py:1198] (1/2) Epoch 4, batch 3450, loss[loss=0.3405, simple_loss=0.3565, pruned_loss=0.1282, ctc_loss=0.246, cr_loss=0.4742, over 33149.00 frames. ], tot_loss[loss=0.3267, simple_loss=0.3404, pruned_loss=0.1236, ctc_loss=0.2353, cr_loss=0.469, over 6745911.40 frames. ], batch size: 130, lr: 2.62e-02, grad_scale: 16.0 2024-09-17 01:02:26,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=70368.66666666667, ans=0.125 2024-09-17 01:02:52,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=70462.0, ans=0.125 2024-09-17 01:03:01,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=70462.0, ans=0.1 2024-09-17 01:03:06,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.92 vs. limit=10.0 2024-09-17 01:03:20,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2024-09-17 01:03:36,053 INFO [train.py:1198] (1/2) Epoch 4, batch 3500, loss[loss=0.2848, simple_loss=0.3087, pruned_loss=0.1019, ctc_loss=0.2002, cr_loss=0.4281, over 34501.00 frames. ], tot_loss[loss=0.3252, simple_loss=0.3393, pruned_loss=0.1228, ctc_loss=0.2339, cr_loss=0.4678, over 6746401.64 frames. 
], batch size: 85, lr: 2.61e-02, grad_scale: 16.0 2024-09-17 01:03:52,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70648.66666666667, ans=0.1 2024-09-17 01:04:01,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.412e+02 2.950e+02 3.353e+02 4.376e+02 7.021e+02, threshold=6.706e+02, percent-clipped=0.0 2024-09-17 01:04:01,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=70648.66666666667, ans=0.125 2024-09-17 01:04:06,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=70695.33333333333, ans=0.125 2024-09-17 01:04:14,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=70695.33333333333, ans=0.125 2024-09-17 01:04:14,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=70695.33333333333, ans=0.125 2024-09-17 01:04:16,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=70695.33333333333, ans=0.125 2024-09-17 01:04:18,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.24 vs. limit=22.5 2024-09-17 01:04:22,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=70742.0, ans=0.125 2024-09-17 01:04:31,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.82 vs. limit=22.5 2024-09-17 01:04:31,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.10 vs. limit=10.0 2024-09-17 01:04:40,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70788.66666666667, ans=0.1 2024-09-17 01:04:56,324 INFO [train.py:1198] (1/2) Epoch 4, batch 3550, loss[loss=0.3482, simple_loss=0.3625, pruned_loss=0.1326, ctc_loss=0.246, cr_loss=0.4845, over 34358.00 frames. ], tot_loss[loss=0.3253, simple_loss=0.3395, pruned_loss=0.1228, ctc_loss=0.2336, cr_loss=0.468, over 6756160.85 frames. ], batch size: 103, lr: 2.61e-02, grad_scale: 16.0 2024-09-17 01:04:59,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.76 vs. limit=10.0 2024-09-17 01:05:06,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=70835.33333333333, ans=0.0 2024-09-17 01:05:22,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.59 vs. limit=15.0 2024-09-17 01:05:36,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=70928.66666666667, ans=0.125 2024-09-17 01:05:38,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=70928.66666666667, ans=0.0 2024-09-17 01:06:06,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.78 vs. 
limit=12.0 2024-09-17 01:06:16,598 INFO [train.py:1198] (1/2) Epoch 4, batch 3600, loss[loss=0.3098, simple_loss=0.3255, pruned_loss=0.1167, ctc_loss=0.22, cr_loss=0.4179, over 34453.00 frames. ], tot_loss[loss=0.3254, simple_loss=0.3396, pruned_loss=0.1229, ctc_loss=0.2335, cr_loss=0.4684, over 6765460.66 frames. ], batch size: 90, lr: 2.61e-02, grad_scale: 32.0 2024-09-17 01:06:44,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.498e+02 2.954e+02 3.473e+02 4.549e+02 1.165e+03, threshold=6.946e+02, percent-clipped=9.0 2024-09-17 01:06:49,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=71162.0, ans=0.125 2024-09-17 01:06:52,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=71162.0, ans=0.07 2024-09-17 01:07:08,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=71208.66666666667, ans=0.0 2024-09-17 01:07:19,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=71255.33333333333, ans=0.95 2024-09-17 01:07:24,563 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0 2024-09-17 01:07:36,803 INFO [train.py:1198] (1/2) Epoch 4, batch 3650, loss[loss=0.3194, simple_loss=0.344, pruned_loss=0.116, ctc_loss=0.2229, cr_loss=0.4531, over 34441.00 frames. ], tot_loss[loss=0.325, simple_loss=0.3391, pruned_loss=0.1228, ctc_loss=0.2334, cr_loss=0.4686, over 6768226.27 frames. ], batch size: 110, lr: 2.60e-02, grad_scale: 16.0 2024-09-17 01:08:04,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=71348.66666666667, ans=0.0 2024-09-17 01:08:07,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=71395.33333333333, ans=0.035 2024-09-17 01:08:08,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=71395.33333333333, ans=0.0 2024-09-17 01:08:14,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=71395.33333333333, ans=0.05 2024-09-17 01:08:14,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=12.0 2024-09-17 01:08:31,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=71442.0, ans=0.0 2024-09-17 01:08:35,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=71442.0, ans=0.1 2024-09-17 01:08:55,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=71535.33333333333, ans=0.1 2024-09-17 01:08:56,855 INFO [train.py:1198] (1/2) Epoch 4, batch 3700, loss[loss=0.3352, simple_loss=0.3517, pruned_loss=0.1259, ctc_loss=0.2411, cr_loss=0.465, over 34611.00 frames. ], tot_loss[loss=0.3248, simple_loss=0.3391, pruned_loss=0.1225, ctc_loss=0.2333, cr_loss=0.4688, over 6783479.10 frames. 
], batch size: 102, lr: 2.60e-02, grad_scale: 16.0 2024-09-17 01:08:58,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=71535.33333333333, ans=0.125 2024-09-17 01:09:19,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=71582.0, ans=0.0 2024-09-17 01:09:22,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=71582.0, ans=0.2 2024-09-17 01:09:24,168 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.323e+02 2.958e+02 3.715e+02 5.213e+02 8.868e+02, threshold=7.430e+02, percent-clipped=9.0 2024-09-17 01:09:42,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=71628.66666666667, ans=0.1 2024-09-17 01:09:45,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=71675.33333333333, ans=0.0 2024-09-17 01:10:12,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=71722.0, ans=0.2 2024-09-17 01:10:18,188 INFO [train.py:1198] (1/2) Epoch 4, batch 3750, loss[loss=0.341, simple_loss=0.3543, pruned_loss=0.1291, ctc_loss=0.2448, cr_loss=0.5161, over 34330.00 frames. ], tot_loss[loss=0.3288, simple_loss=0.3428, pruned_loss=0.1243, ctc_loss=0.2363, cr_loss=0.4736, over 6785456.24 frames. ], batch size: 113, lr: 2.60e-02, grad_scale: 16.0 2024-09-17 01:10:18,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2024-09-17 01:10:23,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.02 vs. limit=15.0 2024-09-17 01:10:29,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=71768.66666666667, ans=0.2 2024-09-17 01:10:34,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0 2024-09-17 01:10:42,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=71815.33333333333, ans=0.2 2024-09-17 01:11:20,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=71908.66666666667, ans=0.125 2024-09-17 01:11:24,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=71955.33333333333, ans=0.04949747468305833 2024-09-17 01:11:38,972 INFO [train.py:1198] (1/2) Epoch 4, batch 3800, loss[loss=0.3499, simple_loss=0.3522, pruned_loss=0.1378, ctc_loss=0.2608, cr_loss=0.4954, over 29719.00 frames. ], tot_loss[loss=0.3341, simple_loss=0.3466, pruned_loss=0.1272, ctc_loss=0.2414, cr_loss=0.4776, over 6675597.63 frames. 
], batch size: 175, lr: 2.59e-02, grad_scale: 16.0 2024-09-17 01:12:07,536 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.462e+02 2.882e+02 3.137e+02 3.640e+02 5.937e+02, threshold=6.273e+02, percent-clipped=0.0 2024-09-17 01:12:08,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=72048.66666666667, ans=0.125 2024-09-17 01:12:09,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=72048.66666666667, ans=0.1 2024-09-17 01:12:31,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=72142.0, ans=0.025 2024-09-17 01:12:36,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=72142.0, ans=0.125 2024-09-17 01:12:39,594 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.087e-02 2024-09-17 01:12:43,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs. limit=10.0 2024-09-17 01:12:46,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=72188.66666666667, ans=0.125 2024-09-17 01:12:54,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=72188.66666666667, ans=0.125 2024-09-17 01:13:02,217 INFO [train.py:1198] (1/2) Epoch 4, batch 3850, loss[loss=0.3841, simple_loss=0.3726, pruned_loss=0.1582, ctc_loss=0.2985, cr_loss=0.4851, over 23338.00 frames. ], tot_loss[loss=0.3427, simple_loss=0.3514, pruned_loss=0.1323, ctc_loss=0.2514, cr_loss=0.4786, over 6252417.73 frames. ], batch size: 244, lr: 2.59e-02, grad_scale: 16.0 2024-09-17 01:13:03,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2024-09-17 01:13:27,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=72282.0, ans=0.0 2024-09-17 01:13:32,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=72282.0, ans=0.0 2024-09-17 01:14:30,796 INFO [train.py:1198] (1/2) Epoch 5, batch 0, loss[loss=0.324, simple_loss=0.3322, pruned_loss=0.1249, ctc_loss=0.2353, cr_loss=0.4693, over 34443.00 frames. ], tot_loss[loss=0.324, simple_loss=0.3322, pruned_loss=0.1249, ctc_loss=0.2353, cr_loss=0.4693, over 34443.00 frames. ], batch size: 85, lr: 2.41e-02, grad_scale: 32.0 2024-09-17 01:14:30,797 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 01:14:47,452 INFO [train.py:1230] (1/2) Epoch 5, validation: loss=0.1895, simple_loss=0.2861, pruned_loss=0.03847, ctc_loss=0.08021, cr_loss=1.544e-14, over 944034.00 frames. 2024-09-17 01:14:47,453 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 01:14:59,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.78 vs. 
limit=15.0 2024-09-17 01:15:00,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=72356.66666666667, ans=0.2 2024-09-17 01:15:07,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=72403.33333333333, ans=0.125 2024-09-17 01:15:31,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=15.0 2024-09-17 01:15:37,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.28 vs. limit=22.5 2024-09-17 01:15:40,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=72496.66666666667, ans=0.0 2024-09-17 01:15:59,317 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.622e+02 3.077e+02 3.615e+02 4.305e+02 9.427e+02, threshold=7.230e+02, percent-clipped=8.0 2024-09-17 01:15:59,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72543.33333333333, ans=0.1 2024-09-17 01:16:11,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.44 vs. limit=15.0 2024-09-17 01:16:12,363 INFO [train.py:1198] (1/2) Epoch 5, batch 50, loss[loss=0.3064, simple_loss=0.32, pruned_loss=0.1146, ctc_loss=0.2243, cr_loss=0.4692, over 34471.00 frames. ], tot_loss[loss=0.3312, simple_loss=0.344, pruned_loss=0.1257, ctc_loss=0.2401, cr_loss=0.4762, over 1480970.97 frames. ], batch size: 82, lr: 2.41e-02, grad_scale: 16.0 2024-09-17 01:16:14,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=72590.0, ans=0.125 2024-09-17 01:16:15,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=72590.0, ans=0.125 2024-09-17 01:16:17,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=72590.0, ans=0.0 2024-09-17 01:16:24,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=72590.0, ans=0.1 2024-09-17 01:16:32,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=72636.66666666667, ans=0.0 2024-09-17 01:16:45,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=72683.33333333333, ans=0.2 2024-09-17 01:16:53,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.80 vs. limit=22.5 2024-09-17 01:17:01,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.60 vs. 
limit=15.0 2024-09-17 01:17:02,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=72730.0, ans=0.0 2024-09-17 01:17:30,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=72776.66666666667, ans=0.0 2024-09-17 01:17:34,686 INFO [train.py:1198] (1/2) Epoch 5, batch 100, loss[loss=0.3099, simple_loss=0.3249, pruned_loss=0.1161, ctc_loss=0.221, cr_loss=0.4648, over 34599.00 frames. ], tot_loss[loss=0.3308, simple_loss=0.3444, pruned_loss=0.1252, ctc_loss=0.2386, cr_loss=0.4775, over 2629381.52 frames. ], batch size: 89, lr: 2.40e-02, grad_scale: 16.0 2024-09-17 01:17:36,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=72823.33333333333, ans=0.2 2024-09-17 01:17:51,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=72870.0, ans=0.0 2024-09-17 01:18:20,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=72916.66666666667, ans=0.125 2024-09-17 01:18:20,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=72916.66666666667, ans=0.125 2024-09-17 01:18:28,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=72963.33333333333, ans=0.125 2024-09-17 01:18:28,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=72963.33333333333, ans=0.0 2024-09-17 01:18:41,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2024-09-17 01:18:45,064 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.373e+02 2.834e+02 3.545e+02 4.485e+02 1.080e+03, threshold=7.090e+02, percent-clipped=1.0 2024-09-17 01:18:59,778 INFO [train.py:1198] (1/2) Epoch 5, batch 150, loss[loss=0.2943, simple_loss=0.3096, pruned_loss=0.1093, ctc_loss=0.212, cr_loss=0.4517, over 34466.00 frames. ], tot_loss[loss=0.3252, simple_loss=0.34, pruned_loss=0.1224, ctc_loss=0.2335, cr_loss=0.4738, over 3556556.29 frames. ], batch size: 82, lr: 2.40e-02, grad_scale: 16.0 2024-09-17 01:19:55,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.16 vs. limit=10.0 2024-09-17 01:20:19,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=73290.0, ans=0.125 2024-09-17 01:20:21,111 INFO [train.py:1198] (1/2) Epoch 5, batch 200, loss[loss=0.3464, simple_loss=0.3575, pruned_loss=0.1335, ctc_loss=0.2462, cr_loss=0.4782, over 31915.00 frames. ], tot_loss[loss=0.3229, simple_loss=0.3378, pruned_loss=0.1214, ctc_loss=0.2319, cr_loss=0.471, over 4270259.35 frames. ], batch size: 146, lr: 2.40e-02, grad_scale: 16.0 2024-09-17 01:20:26,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.00 vs. 
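The "Whitening: ... metric=... vs. limit=..." entries track how far a module's output covariance is from isotropic; the metric is near 1.0 for perfectly "white" features and the penalty only activates once it exceeds the limit. A sketch of one such metric, the eigenvalue-spread ratio (tr(C²)/d) / (tr(C)/d)²; the exact formula used in scaling.py is an assumption here:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Eigenvalue-spread ratio of the per-group feature covariance:
    (mean of squared eigenvalues) / (mean eigenvalue)^2, computed via
    traces so no eigendecomposition is needed."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                  # (groups, d, d)
    d = cov.shape[-1]
    tr_c = cov.diagonal(dim1=-2, dim2=-1).sum(-1)    # trace(C)
    tr_c2 = (cov * cov).sum((-2, -1))                # trace(C @ C), C symmetric
    return ((tr_c2 / d) / (tr_c / d) ** 2).mean()

x = torch.randn(4000, 512)     # near-white features
print(whitening_metric(x))     # ~1.1, well under a limit of 15.0
```

Logged values such as metric=22.00 vs. limit=22.5 mean the activations are close to triggering the whitening penalty, while metric=6.60 vs. limit=15.0 means the module is comfortably inside its constraint.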
limit=22.5 2024-09-17 01:21:00,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=73383.33333333333, ans=0.0 2024-09-17 01:21:28,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=73476.66666666667, ans=0.0 2024-09-17 01:21:29,333 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.349e+02 2.893e+02 3.331e+02 4.122e+02 7.671e+02, threshold=6.662e+02, percent-clipped=1.0 2024-09-17 01:21:42,548 INFO [train.py:1198] (1/2) Epoch 5, batch 250, loss[loss=0.3337, simple_loss=0.3517, pruned_loss=0.1246, ctc_loss=0.2385, cr_loss=0.4729, over 34283.00 frames. ], tot_loss[loss=0.3222, simple_loss=0.3374, pruned_loss=0.121, ctc_loss=0.2313, cr_loss=0.4701, over 4832926.50 frames. ], batch size: 117, lr: 2.39e-02, grad_scale: 16.0 2024-09-17 01:22:02,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73570.0, ans=0.1 2024-09-17 01:22:44,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=73663.33333333333, ans=0.125 2024-09-17 01:22:55,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=73710.0, ans=0.125 2024-09-17 01:23:08,249 INFO [train.py:1198] (1/2) Epoch 5, batch 300, loss[loss=0.3642, simple_loss=0.373, pruned_loss=0.1411, ctc_loss=0.2613, cr_loss=0.5209, over 34374.00 frames. ], tot_loss[loss=0.3216, simple_loss=0.3368, pruned_loss=0.1207, ctc_loss=0.2308, cr_loss=0.4687, over 5260889.82 frames. ], batch size: 107, lr: 2.39e-02, grad_scale: 16.0 2024-09-17 01:23:11,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=73756.66666666667, ans=0.125 2024-09-17 01:23:23,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=73803.33333333333, ans=0.5 2024-09-17 01:23:26,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=73803.33333333333, ans=0.125 2024-09-17 01:23:33,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=73803.33333333333, ans=0.2 2024-09-17 01:23:34,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=73803.33333333333, ans=0.0 2024-09-17 01:23:52,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=73850.0, ans=0.125 2024-09-17 01:24:10,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=73896.66666666667, ans=0.1 2024-09-17 01:24:11,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. 
limit=15.0 2024-09-17 01:24:15,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73943.33333333333, ans=0.1 2024-09-17 01:24:16,701 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.305e+02 3.184e+02 4.361e+02 5.700e+02 9.507e+02, threshold=8.723e+02, percent-clipped=15.0 2024-09-17 01:24:29,860 INFO [train.py:1198] (1/2) Epoch 5, batch 350, loss[loss=0.29, simple_loss=0.312, pruned_loss=0.1054, ctc_loss=0.2051, cr_loss=0.4064, over 34313.00 frames. ], tot_loss[loss=0.3215, simple_loss=0.3372, pruned_loss=0.1205, ctc_loss=0.2302, cr_loss=0.4687, over 5596035.63 frames. ], batch size: 83, lr: 2.39e-02, grad_scale: 16.0 2024-09-17 01:24:38,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73990.0, ans=0.1 2024-09-17 01:24:45,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.81 vs. limit=22.5 2024-09-17 01:24:47,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=74036.66666666667, ans=0.0 2024-09-17 01:25:17,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=74130.0, ans=0.0 2024-09-17 01:25:31,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=74130.0, ans=0.0 2024-09-17 01:25:51,091 INFO [train.py:1198] (1/2) Epoch 5, batch 400, loss[loss=0.3368, simple_loss=0.3517, pruned_loss=0.1273, ctc_loss=0.2448, cr_loss=0.463, over 34425.00 frames. ], tot_loss[loss=0.3199, simple_loss=0.3358, pruned_loss=0.1198, ctc_loss=0.2289, cr_loss=0.4671, over 5863867.73 frames. ], batch size: 95, lr: 2.38e-02, grad_scale: 32.0 2024-09-17 01:26:04,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=74223.33333333333, ans=0.1 2024-09-17 01:26:09,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.03 vs. limit=15.0 2024-09-17 01:26:38,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=74316.66666666667, ans=0.125 2024-09-17 01:26:49,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=74363.33333333333, ans=0.125 2024-09-17 01:27:01,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=74410.0, ans=0.0 2024-09-17 01:27:04,696 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.294e+02 2.816e+02 3.595e+02 4.417e+02 7.887e+02, threshold=7.190e+02, percent-clipped=0.0 2024-09-17 01:27:15,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=74410.0, ans=0.0 2024-09-17 01:27:17,772 INFO [train.py:1198] (1/2) Epoch 5, batch 450, loss[loss=0.3259, simple_loss=0.3463, pruned_loss=0.1196, ctc_loss=0.228, cr_loss=0.5135, over 34697.00 frames. ], tot_loss[loss=0.3204, simple_loss=0.3362, pruned_loss=0.12, ctc_loss=0.2293, cr_loss=0.4683, over 6053948.67 frames. 
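The grad_scale field tracks AMP dynamic loss scaling: the scaler halves the scale whenever a step hits inf/nan gradients and doubles it after a run of clean steps, which is why the log moves between 16.0 and 32.0 (it reaches 32.0 again at epoch 5, batch 400 above). A minimal sketch with the standard PyTorch API; the toy model and growth interval are placeholders, and a CUDA device is required:

```python
import torch

model = torch.nn.Linear(8, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=100)

for step in range(300):
    x = torch.randn(4, 8, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()   # loss is scaled up before backward
    scaler.step(optimizer)          # unscales grads; skips step on inf/nan
    scaler.update()                 # doubles after clean runs, halves on overflow
# scaler.get_scale() is the "grad_scale" value printed in the log lines
```

Keeping the scale as large as possible without overflow preserves small float16 gradient values that would otherwise underflow to zero.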
], batch size: 97, lr: 2.38e-02, grad_scale: 32.0 2024-09-17 01:27:18,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=74456.66666666667, ans=0.0 2024-09-17 01:27:23,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.91 vs. limit=10.0 2024-09-17 01:27:39,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=74503.33333333333, ans=0.09899494936611666 2024-09-17 01:27:44,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=74503.33333333333, ans=0.0 2024-09-17 01:27:50,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=74550.0, ans=0.0 2024-09-17 01:27:52,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=74550.0, ans=0.0 2024-09-17 01:27:58,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=74550.0, ans=0.5 2024-09-17 01:28:30,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=74643.33333333333, ans=0.0 2024-09-17 01:28:39,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=74643.33333333333, ans=0.0 2024-09-17 01:28:45,673 INFO [train.py:1198] (1/2) Epoch 5, batch 500, loss[loss=0.3219, simple_loss=0.3468, pruned_loss=0.1168, ctc_loss=0.2259, cr_loss=0.4526, over 34481.00 frames. ], tot_loss[loss=0.3181, simple_loss=0.3344, pruned_loss=0.1188, ctc_loss=0.2272, cr_loss=0.4655, over 6220423.82 frames. ], batch size: 110, lr: 2.38e-02, grad_scale: 32.0 2024-09-17 01:28:46,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=74690.0, ans=0.125 2024-09-17 01:28:55,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74690.0, ans=0.1 2024-09-17 01:29:02,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=74736.66666666667, ans=0.125 2024-09-17 01:29:16,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=74783.33333333333, ans=0.125 2024-09-17 01:29:51,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=74876.66666666667, ans=0.125 2024-09-17 01:29:54,441 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 3.009e+02 3.595e+02 4.734e+02 8.343e+02, threshold=7.191e+02, percent-clipped=1.0 2024-09-17 01:30:11,512 INFO [train.py:1198] (1/2) Epoch 5, batch 550, loss[loss=0.3368, simple_loss=0.3525, pruned_loss=0.126, ctc_loss=0.2396, cr_loss=0.5311, over 33895.00 frames. ], tot_loss[loss=0.3187, simple_loss=0.3349, pruned_loss=0.1191, ctc_loss=0.2278, cr_loss=0.4664, over 6329707.10 frames. ], batch size: 122, lr: 2.38e-02, grad_scale: 32.0 2024-09-17 01:30:30,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.86 vs. 
limit=15.0 2024-09-17 01:30:58,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=75063.33333333333, ans=0.1 2024-09-17 01:31:22,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=75110.0, ans=0.125 2024-09-17 01:31:33,238 INFO [train.py:1198] (1/2) Epoch 5, batch 600, loss[loss=0.3368, simple_loss=0.355, pruned_loss=0.1258, ctc_loss=0.2409, cr_loss=0.4711, over 34171.00 frames. ], tot_loss[loss=0.3183, simple_loss=0.3348, pruned_loss=0.1189, ctc_loss=0.2272, cr_loss=0.4666, over 6431114.58 frames. ], batch size: 117, lr: 2.37e-02, grad_scale: 32.0 2024-09-17 01:31:35,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.27 vs. limit=15.0 2024-09-17 01:31:41,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=75156.66666666667, ans=0.2 2024-09-17 01:32:02,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=75203.33333333333, ans=0.125 2024-09-17 01:32:18,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=75250.0, ans=0.125 2024-09-17 01:32:31,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=75296.66666666667, ans=0.2 2024-09-17 01:32:33,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=75296.66666666667, ans=0.125 2024-09-17 01:32:41,250 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.319e+02 3.009e+02 3.579e+02 5.005e+02 9.157e+02, threshold=7.157e+02, percent-clipped=4.0 2024-09-17 01:32:43,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=75343.33333333333, ans=0.0 2024-09-17 01:32:54,260 INFO [train.py:1198] (1/2) Epoch 5, batch 650, loss[loss=0.3192, simple_loss=0.3335, pruned_loss=0.1206, ctc_loss=0.2248, cr_loss=0.4728, over 34531.00 frames. ], tot_loss[loss=0.3174, simple_loss=0.3341, pruned_loss=0.1184, ctc_loss=0.2263, cr_loss=0.4655, over 6522949.42 frames. ], batch size: 94, lr: 2.37e-02, grad_scale: 32.0 2024-09-17 01:32:56,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=75390.0, ans=0.125 2024-09-17 01:32:57,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=75390.0, ans=0.0 2024-09-17 01:33:02,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=75390.0, ans=0.125 2024-09-17 01:33:24,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=75436.66666666667, ans=0.125 2024-09-17 01:33:30,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.62 vs. limit=12.0 2024-09-17 01:33:49,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.44 vs. 
limit=22.5 2024-09-17 01:34:15,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=75576.66666666667, ans=0.125 2024-09-17 01:34:20,303 INFO [train.py:1198] (1/2) Epoch 5, batch 700, loss[loss=0.2894, simple_loss=0.312, pruned_loss=0.1048, ctc_loss=0.1988, cr_loss=0.4392, over 34568.00 frames. ], tot_loss[loss=0.3178, simple_loss=0.3346, pruned_loss=0.1185, ctc_loss=0.2267, cr_loss=0.4656, over 6579143.68 frames. ], batch size: 89, lr: 2.37e-02, grad_scale: 32.0 2024-09-17 01:34:32,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-09-17 01:34:33,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=75623.33333333333, ans=0.1 2024-09-17 01:34:46,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.21 vs. limit=15.0 2024-09-17 01:34:50,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2024-09-17 01:34:51,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=75716.66666666667, ans=0.0 2024-09-17 01:34:57,407 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.88 vs. limit=10.0 2024-09-17 01:35:04,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=75716.66666666667, ans=0.0 2024-09-17 01:35:08,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=75763.33333333333, ans=0.0 2024-09-17 01:35:24,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=75810.0, ans=0.5 2024-09-17 01:35:29,126 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.344e+02 3.023e+02 3.997e+02 5.652e+02 1.271e+03, threshold=7.994e+02, percent-clipped=11.0 2024-09-17 01:35:42,113 INFO [train.py:1198] (1/2) Epoch 5, batch 750, loss[loss=0.3135, simple_loss=0.3344, pruned_loss=0.1142, ctc_loss=0.2245, cr_loss=0.4843, over 34413.00 frames. ], tot_loss[loss=0.3167, simple_loss=0.3339, pruned_loss=0.1179, ctc_loss=0.2256, cr_loss=0.4658, over 6623969.72 frames. ], batch size: 95, lr: 2.36e-02, grad_scale: 32.0 2024-09-17 01:36:08,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=75903.33333333333, ans=0.025 2024-09-17 01:36:11,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=75903.33333333333, ans=0.125 2024-09-17 01:36:14,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.23 vs. 
limit=15.0 2024-09-17 01:36:44,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=75996.66666666667, ans=0.125 2024-09-17 01:37:01,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.40 vs. limit=15.0 2024-09-17 01:37:03,699 INFO [train.py:1198] (1/2) Epoch 5, batch 800, loss[loss=0.2993, simple_loss=0.3183, pruned_loss=0.109, ctc_loss=0.2144, cr_loss=0.4878, over 34478.00 frames. ], tot_loss[loss=0.3169, simple_loss=0.3341, pruned_loss=0.118, ctc_loss=0.2257, cr_loss=0.466, over 6659663.65 frames. ], batch size: 85, lr: 2.36e-02, grad_scale: 32.0 2024-09-17 01:37:05,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=76090.0, ans=0.05 2024-09-17 01:37:30,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.55 vs. limit=15.0 2024-09-17 01:37:39,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=15.0 2024-09-17 01:38:16,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.326e+02 2.836e+02 3.371e+02 4.196e+02 6.324e+02, threshold=6.743e+02, percent-clipped=0.0 2024-09-17 01:38:20,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.59 vs. limit=10.0 2024-09-17 01:38:23,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.92 vs. limit=15.0 2024-09-17 01:38:29,062 INFO [train.py:1198] (1/2) Epoch 5, batch 850, loss[loss=0.3418, simple_loss=0.3602, pruned_loss=0.1272, ctc_loss=0.2427, cr_loss=0.515, over 34397.00 frames. ], tot_loss[loss=0.3158, simple_loss=0.3334, pruned_loss=0.1174, ctc_loss=0.2246, cr_loss=0.4646, over 6692580.07 frames. ], batch size: 103, lr: 2.36e-02, grad_scale: 32.0 2024-09-17 01:38:45,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=76370.0, ans=0.0 2024-09-17 01:39:20,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.62 vs. limit=15.0 2024-09-17 01:39:42,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0 2024-09-17 01:39:45,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2024-09-17 01:39:51,289 INFO [train.py:1198] (1/2) Epoch 5, batch 900, loss[loss=0.2897, simple_loss=0.3107, pruned_loss=0.1052, ctc_loss=0.2059, cr_loss=0.4314, over 34443.00 frames. ], tot_loss[loss=0.3171, simple_loss=0.3342, pruned_loss=0.1181, ctc_loss=0.2257, cr_loss=0.4657, over 6698777.45 frames. 
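Each loss[...] entry decomposes the objective into a pruned-RNNT pair (simple_loss, pruned_loss), a CTC term, and a consistency-regularization (cr) term. The logged totals are reproduced by a fixed weighted sum; for the batch-900 running average above, 0.5·0.3342 + 0.1181 + 0.1·0.2257 + 0.02·0.4657 ≈ 0.3171. A sketch of that combination; the scales here are read off the logged numbers, not taken from the recipe's code:

```python
def total_loss(simple_loss: float, pruned_loss: float,
               ctc_loss: float, cr_loss: float,
               simple_scale: float = 0.5,
               ctc_scale: float = 0.1,
               cr_scale: float = 0.02) -> float:
    """Weighted sum matching the component names logged by train.py."""
    return (simple_scale * simple_loss + pruned_loss
            + ctc_scale * ctc_loss + cr_scale * cr_loss)

# tot_loss components from the epoch 5, batch 900 entry above
print(round(total_loss(0.3342, 0.1181, 0.2257, 0.4657), 4))  # 0.3171
```

The small cr weight is consistent with the cr_loss hovering near 0.47 while contributing only ~0.009 to the total: it regularizes without dominating the transducer and CTC terms.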
], batch size: 85, lr: 2.35e-02, grad_scale: 32.0 2024-09-17 01:39:58,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=76556.66666666667, ans=0.0 2024-09-17 01:40:09,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=76603.33333333333, ans=0.125 2024-09-17 01:40:09,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=76603.33333333333, ans=0.2 2024-09-17 01:40:17,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=76603.33333333333, ans=0.1 2024-09-17 01:40:42,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=76696.66666666667, ans=0.05 2024-09-17 01:41:01,328 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.419e+02 3.197e+02 3.923e+02 5.503e+02 1.117e+03, threshold=7.846e+02, percent-clipped=9.0 2024-09-17 01:41:10,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.53 vs. limit=15.0 2024-09-17 01:41:14,468 INFO [train.py:1198] (1/2) Epoch 5, batch 950, loss[loss=0.2977, simple_loss=0.3178, pruned_loss=0.109, ctc_loss=0.2079, cr_loss=0.4546, over 34683.00 frames. ], tot_loss[loss=0.3175, simple_loss=0.3346, pruned_loss=0.1183, ctc_loss=0.226, cr_loss=0.4659, over 6702417.91 frames. ], batch size: 87, lr: 2.35e-02, grad_scale: 16.0 2024-09-17 01:42:07,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=76930.0, ans=15.0 2024-09-17 01:42:37,470 INFO [train.py:1198] (1/2) Epoch 5, batch 1000, loss[loss=0.2956, simple_loss=0.3143, pruned_loss=0.1085, ctc_loss=0.2071, cr_loss=0.4656, over 34506.00 frames. ], tot_loss[loss=0.319, simple_loss=0.3356, pruned_loss=0.1191, ctc_loss=0.2275, cr_loss=0.4673, over 6693564.82 frames. ], batch size: 90, lr: 2.35e-02, grad_scale: 16.0 2024-09-17 01:43:06,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. 
limit=22.5 2024-09-17 01:43:14,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=77116.66666666667, ans=0.125 2024-09-17 01:43:30,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=77163.33333333333, ans=0.2 2024-09-17 01:43:40,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=77163.33333333333, ans=0.0 2024-09-17 01:43:44,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=77210.0, ans=0.125 2024-09-17 01:43:47,612 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 3.115e+02 3.923e+02 5.130e+02 1.145e+03, threshold=7.847e+02, percent-clipped=2.0 2024-09-17 01:43:47,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=77210.0, ans=0.0 2024-09-17 01:43:56,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=77210.0, ans=0.0 2024-09-17 01:43:58,954 INFO [train.py:1198] (1/2) Epoch 5, batch 1050, loss[loss=0.3259, simple_loss=0.3431, pruned_loss=0.1218, ctc_loss=0.2335, cr_loss=0.458, over 34549.00 frames. ], tot_loss[loss=0.3178, simple_loss=0.3346, pruned_loss=0.1186, ctc_loss=0.2265, cr_loss=0.465, over 6702469.33 frames. ], batch size: 99, lr: 2.35e-02, grad_scale: 16.0 2024-09-17 01:44:17,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=77303.33333333333, ans=0.125 2024-09-17 01:44:40,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=77350.0, ans=0.05 2024-09-17 01:44:46,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=77396.66666666667, ans=0.125 2024-09-17 01:44:54,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=77396.66666666667, ans=0.0 2024-09-17 01:45:02,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=77396.66666666667, ans=0.0 2024-09-17 01:45:03,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=22.5 2024-09-17 01:45:19,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=77443.33333333333, ans=0.2 2024-09-17 01:45:24,192 INFO [train.py:1198] (1/2) Epoch 5, batch 1100, loss[loss=0.3113, simple_loss=0.3255, pruned_loss=0.1171, ctc_loss=0.223, cr_loss=0.4541, over 34357.00 frames. ], tot_loss[loss=0.3169, simple_loss=0.3339, pruned_loss=0.1181, ctc_loss=0.2257, cr_loss=0.4646, over 6716351.92 frames. 
], batch size: 91, lr: 2.34e-02, grad_scale: 16.0 2024-09-17 01:45:32,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=77490.0, ans=0.025 2024-09-17 01:45:35,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=77490.0, ans=0.125 2024-09-17 01:45:37,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=77490.0, ans=0.0 2024-09-17 01:45:48,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=77536.66666666667, ans=15.0 2024-09-17 01:46:20,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=77630.0, ans=0.0 2024-09-17 01:46:34,280 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.356e+02 3.098e+02 3.668e+02 4.597e+02 1.018e+03, threshold=7.336e+02, percent-clipped=1.0 2024-09-17 01:46:39,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=77676.66666666667, ans=0.07 2024-09-17 01:46:45,905 INFO [train.py:1198] (1/2) Epoch 5, batch 1150, loss[loss=0.3026, simple_loss=0.3238, pruned_loss=0.1098, ctc_loss=0.2136, cr_loss=0.4784, over 34336.00 frames. ], tot_loss[loss=0.3167, simple_loss=0.3337, pruned_loss=0.118, ctc_loss=0.2256, cr_loss=0.4643, over 6715689.83 frames. ], batch size: 91, lr: 2.34e-02, grad_scale: 16.0 2024-09-17 01:46:48,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.42 vs. limit=22.5 2024-09-17 01:46:51,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=77723.33333333333, ans=0.05 2024-09-17 01:46:56,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=77723.33333333333, ans=0.025 2024-09-17 01:47:34,217 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 01:47:35,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77863.33333333333, ans=0.1 2024-09-17 01:47:36,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.07 vs. limit=15.0 2024-09-17 01:47:42,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=77863.33333333333, ans=0.125 2024-09-17 01:47:47,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.83 vs. limit=22.5 2024-09-17 01:48:03,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=77910.0, ans=0.125 2024-09-17 01:48:06,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=77956.66666666667, ans=0.125 2024-09-17 01:48:08,081 INFO [train.py:1198] (1/2) Epoch 5, batch 1200, loss[loss=0.3446, simple_loss=0.3591, pruned_loss=0.1308, ctc_loss=0.2402, cr_loss=0.5101, over 34563.00 frames. 
], tot_loss[loss=0.3173, simple_loss=0.3345, pruned_loss=0.1182, ctc_loss=0.2261, cr_loss=0.4653, over 6708019.44 frames. ], batch size: 99, lr: 2.34e-02, grad_scale: 32.0 2024-09-17 01:48:15,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.74 vs. limit=15.0 2024-09-17 01:48:18,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=77956.66666666667, ans=0.95 2024-09-17 01:48:31,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=78003.33333333333, ans=0.0 2024-09-17 01:48:32,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=78003.33333333333, ans=0.05 2024-09-17 01:48:34,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=78003.33333333333, ans=0.1 2024-09-17 01:49:03,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=22.5 2024-09-17 01:49:03,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=78096.66666666667, ans=0.125 2024-09-17 01:49:21,862 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.315e+02 2.779e+02 3.070e+02 3.770e+02 6.117e+02, threshold=6.139e+02, percent-clipped=0.0 2024-09-17 01:49:23,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=78143.33333333333, ans=0.125 2024-09-17 01:49:28,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=78143.33333333333, ans=0.025 2024-09-17 01:49:33,316 INFO [train.py:1198] (1/2) Epoch 5, batch 1250, loss[loss=0.3398, simple_loss=0.3553, pruned_loss=0.1281, ctc_loss=0.2416, cr_loss=0.4938, over 34323.00 frames. ], tot_loss[loss=0.3181, simple_loss=0.3352, pruned_loss=0.1185, ctc_loss=0.2265, cr_loss=0.4672, over 6742074.08 frames. ], batch size: 107, lr: 2.33e-02, grad_scale: 32.0 2024-09-17 01:49:38,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=78190.0, ans=0.125 2024-09-17 01:49:53,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=78236.66666666667, ans=0.125 2024-09-17 01:50:16,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=78283.33333333333, ans=0.0 2024-09-17 01:50:16,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=78283.33333333333, ans=0.125 2024-09-17 01:50:37,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=78376.66666666667, ans=0.0 2024-09-17 01:50:37,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.25 vs. 
limit=15.0 2024-09-17 01:50:42,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=78376.66666666667, ans=0.025 2024-09-17 01:50:47,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=78376.66666666667, ans=0.125 2024-09-17 01:50:55,060 INFO [train.py:1198] (1/2) Epoch 5, batch 1300, loss[loss=0.3253, simple_loss=0.3494, pruned_loss=0.118, ctc_loss=0.2321, cr_loss=0.4703, over 33186.00 frames. ], tot_loss[loss=0.3169, simple_loss=0.3341, pruned_loss=0.118, ctc_loss=0.2253, cr_loss=0.4656, over 6745634.05 frames. ], batch size: 130, lr: 2.33e-02, grad_scale: 16.0 2024-09-17 01:51:00,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=78423.33333333333, ans=0.125 2024-09-17 01:51:03,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=78423.33333333333, ans=0.125 2024-09-17 01:51:09,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=78470.0, ans=0.07 2024-09-17 01:51:37,735 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.699e-02 2024-09-17 01:51:45,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=78563.33333333333, ans=0.125 2024-09-17 01:51:52,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78563.33333333333, ans=0.1 2024-09-17 01:51:54,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.61 vs. limit=22.5 2024-09-17 01:52:06,970 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.252e+02 3.179e+02 4.103e+02 5.643e+02 1.061e+03, threshold=8.205e+02, percent-clipped=19.0 2024-09-17 01:52:16,847 INFO [train.py:1198] (1/2) Epoch 5, batch 1350, loss[loss=0.3101, simple_loss=0.328, pruned_loss=0.1156, ctc_loss=0.2142, cr_loss=0.4564, over 34527.00 frames. ], tot_loss[loss=0.3159, simple_loss=0.3333, pruned_loss=0.1176, ctc_loss=0.2243, cr_loss=0.4649, over 6764482.71 frames. ], batch size: 94, lr: 2.33e-02, grad_scale: 16.0 2024-09-17 01:52:46,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=78703.33333333333, ans=0.125 2024-09-17 01:52:46,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=78703.33333333333, ans=0.125 2024-09-17 01:52:58,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. 
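The "WithLoss: ... loss-sum=" entries report a small auxiliary penalty attached to the self-attention weights; it is 0.000e+00 when the weights lie inside their allowed range and grows only when they drift out (1.699e-02 above), so it steers the module without dominating the objective. A hedged sketch of such a hinge-style penalty; the limit and scale are illustrative assumptions, not the values used in scaling.py:

```python
import torch

def attention_weight_penalty(attn_weights: torch.Tensor,
                             limit: float = 0.5,
                             scale: float = 1.0e-4) -> torch.Tensor:
    """Penalize only the part of each attention weight beyond +/- limit.
    The detached sum is what a 'loss-sum=' log line would report; the
    scaled tensor is added to the training loss."""
    excess = torch.relu(attn_weights.abs() - limit)
    penalty = excess.sum()
    print(f"WithLoss: loss-sum={penalty.item():.3e}")  # 0.000e+00 if in range
    return scale * penalty
```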
limit=15.0 2024-09-17 01:53:12,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=78796.66666666667, ans=0.125 2024-09-17 01:53:19,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=78796.66666666667, ans=0.1 2024-09-17 01:53:24,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=78843.33333333333, ans=0.2 2024-09-17 01:53:30,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=78843.33333333333, ans=0.125 2024-09-17 01:53:34,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=78843.33333333333, ans=0.1 2024-09-17 01:53:34,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.19 vs. limit=15.0 2024-09-17 01:53:36,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0 2024-09-17 01:53:41,968 INFO [train.py:1198] (1/2) Epoch 5, batch 1400, loss[loss=0.2794, simple_loss=0.2987, pruned_loss=0.1013, ctc_loss=0.197, cr_loss=0.4485, over 34274.00 frames. ], tot_loss[loss=0.3159, simple_loss=0.3333, pruned_loss=0.1175, ctc_loss=0.2244, cr_loss=0.4648, over 6776485.35 frames. ], batch size: 80, lr: 2.33e-02, grad_scale: 16.0 2024-09-17 01:53:55,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=78890.0, ans=0.125 2024-09-17 01:54:04,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=78936.66666666667, ans=0.0 2024-09-17 01:54:04,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=78936.66666666667, ans=0.025 2024-09-17 01:54:14,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=78983.33333333333, ans=0.05 2024-09-17 01:54:16,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78983.33333333333, ans=0.1 2024-09-17 01:54:29,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79030.0, ans=0.1 2024-09-17 01:54:53,697 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.198e+02 3.016e+02 3.561e+02 4.495e+02 1.075e+03, threshold=7.122e+02, percent-clipped=2.0 2024-09-17 01:54:54,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.94 vs. limit=15.0 2024-09-17 01:55:03,539 INFO [train.py:1198] (1/2) Epoch 5, batch 1450, loss[loss=0.3505, simple_loss=0.363, pruned_loss=0.1343, ctc_loss=0.2481, cr_loss=0.4988, over 34439.00 frames. ], tot_loss[loss=0.3159, simple_loss=0.3335, pruned_loss=0.1174, ctc_loss=0.2243, cr_loss=0.4646, over 6774695.45 frames. 
], batch size: 110, lr: 2.32e-02, grad_scale: 16.0 2024-09-17 01:55:07,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=79123.33333333333, ans=0.125 2024-09-17 01:55:16,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=79123.33333333333, ans=0.125 2024-09-17 01:55:25,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2024-09-17 01:55:50,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=79263.33333333333, ans=0.125 2024-09-17 01:55:51,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0 2024-09-17 01:56:07,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.44 vs. limit=15.0 2024-09-17 01:56:07,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.25 vs. limit=15.0 2024-09-17 01:56:10,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=79310.0, ans=0.125 2024-09-17 01:56:12,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.17 vs. limit=22.5 2024-09-17 01:56:18,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=79310.0, ans=0.0 2024-09-17 01:56:26,685 INFO [train.py:1198] (1/2) Epoch 5, batch 1500, loss[loss=0.325, simple_loss=0.3447, pruned_loss=0.1199, ctc_loss=0.23, cr_loss=0.4886, over 34456.00 frames. ], tot_loss[loss=0.316, simple_loss=0.3338, pruned_loss=0.1174, ctc_loss=0.2243, cr_loss=0.4653, over 6775815.96 frames. ], batch size: 100, lr: 2.32e-02, grad_scale: 16.0 2024-09-17 01:56:35,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=79356.66666666667, ans=0.0 2024-09-17 01:56:47,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=79403.33333333333, ans=0.04949747468305833 2024-09-17 01:56:47,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.77 vs. limit=10.0 2024-09-17 01:57:14,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. 
limit=6.0 2024-09-17 01:57:25,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=79496.66666666667, ans=0.125 2024-09-17 01:57:35,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=79543.33333333333, ans=0.125 2024-09-17 01:57:40,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=79543.33333333333, ans=0.125 2024-09-17 01:57:41,208 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.247e+02 2.930e+02 3.539e+02 4.817e+02 8.764e+02, threshold=7.077e+02, percent-clipped=3.0 2024-09-17 01:57:50,964 INFO [train.py:1198] (1/2) Epoch 5, batch 1550, loss[loss=0.3373, simple_loss=0.3527, pruned_loss=0.1266, ctc_loss=0.2397, cr_loss=0.5207, over 34455.00 frames. ], tot_loss[loss=0.3164, simple_loss=0.3336, pruned_loss=0.1178, ctc_loss=0.2251, cr_loss=0.4656, over 6746975.61 frames. ], batch size: 105, lr: 2.32e-02, grad_scale: 16.0 2024-09-17 01:58:07,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=79636.66666666667, ans=0.125 2024-09-17 01:58:07,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=79636.66666666667, ans=0.125 2024-09-17 01:58:23,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79683.33333333333, ans=0.1 2024-09-17 01:58:26,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=79683.33333333333, ans=0.0 2024-09-17 01:59:09,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=79776.66666666667, ans=0.5 2024-09-17 01:59:12,381 INFO [train.py:1198] (1/2) Epoch 5, batch 1600, loss[loss=0.3307, simple_loss=0.3516, pruned_loss=0.1222, ctc_loss=0.2343, cr_loss=0.464, over 34577.00 frames. ], tot_loss[loss=0.3158, simple_loss=0.3332, pruned_loss=0.1174, ctc_loss=0.2247, cr_loss=0.4651, over 6727326.19 frames. ], batch size: 99, lr: 2.31e-02, grad_scale: 32.0 2024-09-17 01:59:33,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=79870.0, ans=0.1 2024-09-17 01:59:35,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=79870.0, ans=0.09899494936611666 2024-09-17 01:59:39,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2024-09-17 01:59:57,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.72 vs. 
limit=12.0 2024-09-17 02:00:01,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=79963.33333333333, ans=0.1 2024-09-17 02:00:08,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=79963.33333333333, ans=0.2 2024-09-17 02:00:27,779 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 2.974e+02 3.456e+02 4.586e+02 9.128e+02, threshold=6.911e+02, percent-clipped=8.0 2024-09-17 02:00:32,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=15.0 2024-09-17 02:00:37,563 INFO [train.py:1198] (1/2) Epoch 5, batch 1650, loss[loss=0.332, simple_loss=0.3518, pruned_loss=0.1231, ctc_loss=0.2326, cr_loss=0.4883, over 34362.00 frames. ], tot_loss[loss=0.3159, simple_loss=0.3332, pruned_loss=0.1175, ctc_loss=0.2248, cr_loss=0.4654, over 6720683.83 frames. ], batch size: 103, lr: 2.31e-02, grad_scale: 32.0 2024-09-17 02:00:51,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=15.0 2024-09-17 02:01:02,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=80103.33333333333, ans=0.1 2024-09-17 02:01:02,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=80103.33333333333, ans=0.0 2024-09-17 02:01:04,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=26.80 vs. limit=22.5 2024-09-17 02:01:22,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=80150.0, ans=0.2 2024-09-17 02:01:29,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=12.0 2024-09-17 02:01:35,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=80196.66666666667, ans=0.0 2024-09-17 02:01:59,669 INFO [train.py:1198] (1/2) Epoch 5, batch 1700, loss[loss=0.2775, simple_loss=0.2989, pruned_loss=0.1004, ctc_loss=0.195, cr_loss=0.4045, over 34271.00 frames. ], tot_loss[loss=0.3154, simple_loss=0.3329, pruned_loss=0.1172, ctc_loss=0.2241, cr_loss=0.4652, over 6746364.14 frames. ], batch size: 80, lr: 2.31e-02, grad_scale: 32.0 2024-09-17 02:02:09,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=80290.0, ans=0.125 2024-09-17 02:02:23,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.05 vs. limit=6.0 2024-09-17 02:03:11,923 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.219e+02 2.907e+02 3.455e+02 4.751e+02 7.043e+02, threshold=6.910e+02, percent-clipped=1.0 2024-09-17 02:03:23,342 INFO [train.py:1198] (1/2) Epoch 5, batch 1750, loss[loss=0.2827, simple_loss=0.302, pruned_loss=0.1034, ctc_loss=0.1991, cr_loss=0.4197, over 34169.00 frames. ], tot_loss[loss=0.3145, simple_loss=0.3323, pruned_loss=0.1168, ctc_loss=0.2232, cr_loss=0.465, over 6755096.07 frames. 
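The lr field steps down at the epoch 4 to 5 boundary (2.59e-02 to 2.41e-02) and keeps decaying within the epoch (2.31e-02 by batch 1750 above), consistent with an Eden-style schedule that decays in both the batch and epoch dimensions. In the sketch below, base_lr, lr_batches, lr_epochs, the batches-per-epoch estimate, and the exponents are assumptions fitted to the logged values, not read from optim.py:

```python
def eden_lr(step: int, epoch: float, base_lr: float = 0.045,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style decay: smooth in the global step, smooth in the epoch."""
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# with ~3900 batches per epoch, the start of epoch 5 sits near step 15600
print(f"{eden_lr(step=15600, epoch=4):.2e}")  # ~2.40e-02 vs. 2.41e-02 logged
```

Because both factors flatten once step >> lr_batches and epoch >> lr_epochs, the decay is steep early and gentle later, which matches the slow 2.41e-02 to 2.29e-02 drift across epoch 5.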
], batch size: 78, lr: 2.31e-02, grad_scale: 32.0 2024-09-17 02:03:50,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.78 vs. limit=15.0 2024-09-17 02:04:07,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=80616.66666666667, ans=0.0 2024-09-17 02:04:21,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=80663.33333333333, ans=0.1 2024-09-17 02:04:26,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=80663.33333333333, ans=0.125 2024-09-17 02:04:36,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2024-09-17 02:04:46,981 INFO [train.py:1198] (1/2) Epoch 5, batch 1800, loss[loss=0.321, simple_loss=0.3426, pruned_loss=0.1171, ctc_loss=0.2295, cr_loss=0.4834, over 34697.00 frames. ], tot_loss[loss=0.3148, simple_loss=0.3327, pruned_loss=0.1169, ctc_loss=0.2236, cr_loss=0.4649, over 6757688.92 frames. ], batch size: 97, lr: 2.30e-02, grad_scale: 32.0 2024-09-17 02:04:47,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=80756.66666666667, ans=0.2 2024-09-17 02:05:40,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-09-17 02:05:57,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=80943.33333333333, ans=0.025 2024-09-17 02:05:58,995 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 2.874e+02 3.476e+02 4.488e+02 6.966e+02, threshold=6.951e+02, percent-clipped=1.0 2024-09-17 02:06:06,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=80943.33333333333, ans=0.2 2024-09-17 02:06:08,995 INFO [train.py:1198] (1/2) Epoch 5, batch 1850, loss[loss=0.3168, simple_loss=0.3407, pruned_loss=0.115, ctc_loss=0.2214, cr_loss=0.4651, over 34462.00 frames. ], tot_loss[loss=0.3139, simple_loss=0.3319, pruned_loss=0.1164, ctc_loss=0.2226, cr_loss=0.4647, over 6765084.84 frames. ], batch size: 100, lr: 2.30e-02, grad_scale: 32.0 2024-09-17 02:06:20,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=80990.0, ans=0.125 2024-09-17 02:06:30,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=81036.66666666667, ans=0.1 2024-09-17 02:07:00,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=15.35 vs. limit=15.0 2024-09-17 02:07:26,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=81176.66666666667, ans=0.2 2024-09-17 02:07:32,240 INFO [train.py:1198] (1/2) Epoch 5, batch 1900, loss[loss=0.3205, simple_loss=0.3396, pruned_loss=0.1179, ctc_loss=0.2332, cr_loss=0.4723, over 34377.00 frames. ], tot_loss[loss=0.3147, simple_loss=0.3327, pruned_loss=0.1167, ctc_loss=0.2232, cr_loss=0.4651, over 6773157.55 frames. 
], batch size: 103, lr: 2.30e-02, grad_scale: 32.0 2024-09-17 02:07:39,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=81223.33333333333, ans=0.0 2024-09-17 02:08:07,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=12.0 2024-09-17 02:08:10,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=81316.66666666667, ans=0.125 2024-09-17 02:08:33,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=81363.33333333333, ans=0.125 2024-09-17 02:08:33,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=81363.33333333333, ans=0.2 2024-09-17 02:08:41,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81410.0, ans=0.125 2024-09-17 02:08:46,380 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.427e+02 3.112e+02 3.816e+02 4.965e+02 7.706e+02, threshold=7.632e+02, percent-clipped=3.0 2024-09-17 02:08:56,210 INFO [train.py:1198] (1/2) Epoch 5, batch 1950, loss[loss=0.3005, simple_loss=0.3192, pruned_loss=0.1114, ctc_loss=0.2092, cr_loss=0.4302, over 34353.00 frames. ], tot_loss[loss=0.316, simple_loss=0.3341, pruned_loss=0.1171, ctc_loss=0.2241, cr_loss=0.4679, over 6790360.91 frames. ], batch size: 91, lr: 2.29e-02, grad_scale: 32.0 2024-09-17 02:09:06,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81456.66666666667, ans=0.125 2024-09-17 02:09:34,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=30.15 vs. limit=22.5 2024-09-17 02:09:43,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=81596.66666666667, ans=0.125 2024-09-17 02:09:49,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=81596.66666666667, ans=0.0 2024-09-17 02:10:04,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-09-17 02:10:12,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=81643.33333333333, ans=0.025 2024-09-17 02:10:13,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.76 vs. limit=15.0 2024-09-17 02:10:17,191 INFO [train.py:1198] (1/2) Epoch 5, batch 2000, loss[loss=0.2668, simple_loss=0.2885, pruned_loss=0.0955, ctc_loss=0.1845, cr_loss=0.433, over 34192.00 frames. ], tot_loss[loss=0.3168, simple_loss=0.3347, pruned_loss=0.1176, ctc_loss=0.2248, cr_loss=0.468, over 6765632.85 frames. 
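
The ScheduledFloat lines that dominate this log are probes of hyperparameters (dropout probabilities, skip rates, balancer bounds) that are not constant but follow a schedule in batch_count; each line reports the scheduled value's name and its current answer, e.g. a feed_forward dropout_p that has settled at 0.1 by batch ~81k. A sketch of one plausible mechanism, piecewise-linear interpolation over (batch_count, value) breakpoints; the class name and breakpoints below are illustrative, not the scaling.py implementation:

class PiecewiseLinearFloat:
    """Float hyperparameter interpolated linearly between breakpoints."""
    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0, 0.3), (20000, 0.1)
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(81223.33))   # 0.1: far past the last breakpoint
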
], batch size: 78, lr: 2.29e-02, grad_scale: 32.0 2024-09-17 02:10:22,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=81690.0, ans=0.0 2024-09-17 02:10:45,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=81736.66666666667, ans=0.0 2024-09-17 02:11:20,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=81830.0, ans=0.125 2024-09-17 02:11:26,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=81876.66666666667, ans=0.1 2024-09-17 02:11:28,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=81876.66666666667, ans=0.125 2024-09-17 02:11:31,374 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.277e+02 3.142e+02 3.964e+02 5.210e+02 8.070e+02, threshold=7.928e+02, percent-clipped=1.0 2024-09-17 02:11:35,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-09-17 02:11:43,160 INFO [train.py:1198] (1/2) Epoch 5, batch 2050, loss[loss=0.2693, simple_loss=0.2963, pruned_loss=0.09472, ctc_loss=0.1824, cr_loss=0.4106, over 34465.00 frames. ], tot_loss[loss=0.3148, simple_loss=0.3329, pruned_loss=0.1167, ctc_loss=0.2232, cr_loss=0.4658, over 6756587.16 frames. ], batch size: 82, lr: 2.29e-02, grad_scale: 32.0 2024-09-17 02:11:46,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=81923.33333333333, ans=0.0 2024-09-17 02:12:31,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=82063.33333333333, ans=6.0 2024-09-17 02:12:56,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82110.0, ans=0.1 2024-09-17 02:13:04,844 INFO [train.py:1198] (1/2) Epoch 5, batch 2100, loss[loss=0.2987, simple_loss=0.3221, pruned_loss=0.1075, ctc_loss=0.2112, cr_loss=0.4497, over 34555.00 frames. ], tot_loss[loss=0.3139, simple_loss=0.3322, pruned_loss=0.1162, ctc_loss=0.2223, cr_loss=0.4653, over 6770950.88 frames. ], batch size: 94, lr: 2.29e-02, grad_scale: 32.0 2024-09-17 02:13:27,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=82203.33333333333, ans=0.0 2024-09-17 02:13:40,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.91 vs. 
limit=15.0 2024-09-17 02:13:48,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=82250.0, ans=0.125 2024-09-17 02:13:50,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82250.0, ans=0.1 2024-09-17 02:14:13,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=82343.33333333333, ans=0.125 2024-09-17 02:14:16,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.308e+02 2.972e+02 3.533e+02 4.293e+02 8.397e+02, threshold=7.067e+02, percent-clipped=2.0 2024-09-17 02:14:27,912 INFO [train.py:1198] (1/2) Epoch 5, batch 2150, loss[loss=0.2977, simple_loss=0.3202, pruned_loss=0.1074, ctc_loss=0.2072, cr_loss=0.4779, over 34366.00 frames. ], tot_loss[loss=0.3129, simple_loss=0.3315, pruned_loss=0.1157, ctc_loss=0.2213, cr_loss=0.4649, over 6789203.47 frames. ], batch size: 91, lr: 2.28e-02, grad_scale: 32.0 2024-09-17 02:14:57,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=82436.66666666667, ans=0.125 2024-09-17 02:15:51,713 INFO [train.py:1198] (1/2) Epoch 5, batch 2200, loss[loss=0.3153, simple_loss=0.3405, pruned_loss=0.1138, ctc_loss=0.2199, cr_loss=0.4642, over 34450.00 frames. ], tot_loss[loss=0.313, simple_loss=0.3317, pruned_loss=0.1157, ctc_loss=0.2216, cr_loss=0.4652, over 6783900.29 frames. ], batch size: 100, lr: 2.28e-02, grad_scale: 32.0 2024-09-17 02:16:00,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=82623.33333333333, ans=0.125 2024-09-17 02:16:08,247 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:16:16,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82670.0, ans=0.1 2024-09-17 02:16:21,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.97 vs. limit=22.5 2024-09-17 02:16:30,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.84 vs. limit=15.0 2024-09-17 02:16:39,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=82763.33333333333, ans=0.125 2024-09-17 02:16:48,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=82763.33333333333, ans=0.05 2024-09-17 02:17:03,590 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.352e+02 3.268e+02 4.122e+02 5.654e+02 9.739e+02, threshold=8.244e+02, percent-clipped=10.0 2024-09-17 02:17:08,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=15.0 2024-09-17 02:17:13,322 INFO [train.py:1198] (1/2) Epoch 5, batch 2250, loss[loss=0.3306, simple_loss=0.346, pruned_loss=0.124, ctc_loss=0.2359, cr_loss=0.4979, over 34436.00 frames. ], tot_loss[loss=0.3129, simple_loss=0.3316, pruned_loss=0.1156, ctc_loss=0.2213, cr_loss=0.4647, over 6781449.85 frames. 
], batch size: 95, lr: 2.28e-02, grad_scale: 32.0 2024-09-17 02:17:49,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=82950.0, ans=0.1 2024-09-17 02:17:51,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=82950.0, ans=0.125 2024-09-17 02:17:52,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=82950.0, ans=0.125 2024-09-17 02:17:56,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=82950.0, ans=0.0 2024-09-17 02:18:10,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=82996.66666666667, ans=0.0 2024-09-17 02:18:14,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82996.66666666667, ans=0.1 2024-09-17 02:18:17,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.29 vs. limit=15.0 2024-09-17 02:18:22,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=83043.33333333333, ans=0.125 2024-09-17 02:18:36,995 INFO [train.py:1198] (1/2) Epoch 5, batch 2300, loss[loss=0.2892, simple_loss=0.309, pruned_loss=0.1052, ctc_loss=0.2028, cr_loss=0.4604, over 34303.00 frames. ], tot_loss[loss=0.3117, simple_loss=0.3304, pruned_loss=0.1152, ctc_loss=0.2205, cr_loss=0.463, over 6767022.81 frames. ], batch size: 83, lr: 2.28e-02, grad_scale: 32.0 2024-09-17 02:18:42,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=83090.0, ans=0.05 2024-09-17 02:18:43,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=83090.0, ans=0.025 2024-09-17 02:19:03,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=83136.66666666667, ans=0.0 2024-09-17 02:19:14,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=83183.33333333333, ans=0.025 2024-09-17 02:19:34,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=83230.0, ans=0.2 2024-09-17 02:19:50,882 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.248e+02 3.074e+02 4.095e+02 5.239e+02 7.551e+02, threshold=8.189e+02, percent-clipped=0.0 2024-09-17 02:20:00,623 INFO [train.py:1198] (1/2) Epoch 5, batch 2350, loss[loss=0.3252, simple_loss=0.3416, pruned_loss=0.1214, ctc_loss=0.2332, cr_loss=0.4812, over 34692.00 frames. ], tot_loss[loss=0.3118, simple_loss=0.3305, pruned_loss=0.1152, ctc_loss=0.2205, cr_loss=0.4644, over 6773866.13 frames. 
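
Each loss[...] record above factors the training objective into its parts: the simple and pruned transducer losses, the CTC loss, and the consistency-regularization (cr) loss; tot_loss[...] is the same breakdown averaged over recent batches. The logged totals are consistent with a fixed weighted sum of the components: at batch 2250, 0.5·0.346 + 0.124 + 0.1·0.2359 + 0.02·0.4979 ≈ 0.3305, matching the logged loss=0.3306 up to component rounding. A sketch under that assumption (the warm-up handling icefall applies to the transducer terms is omitted):

def total_loss(simple_loss, pruned_loss, ctc_loss, cr_loss,
               simple_scale=0.5, ctc_scale=0.1, cr_scale=0.02):
    """Weighted sum consistent with the logged loss components."""
    return (simple_scale * simple_loss + pruned_loss
            + ctc_scale * ctc_loss + cr_scale * cr_loss)

# batch 2250 from the log above:
print(total_loss(0.346, 0.124, 0.2359, 0.4979))   # -> 0.33055, logged 0.3306
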
], batch size: 97, lr: 2.27e-02, grad_scale: 32.0 2024-09-17 02:20:02,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=83323.33333333333, ans=0.07 2024-09-17 02:20:12,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=83323.33333333333, ans=12.0 2024-09-17 02:20:19,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.20 vs. limit=15.0 2024-09-17 02:20:27,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=83370.0, ans=0.07 2024-09-17 02:20:28,625 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:20:33,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=83416.66666666667, ans=0.0 2024-09-17 02:20:40,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=83416.66666666667, ans=0.2 2024-09-17 02:20:40,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=15.0 2024-09-17 02:20:53,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.12 vs. limit=15.0 2024-09-17 02:21:14,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=83510.0, ans=0.025 2024-09-17 02:21:14,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=83510.0, ans=10.0 2024-09-17 02:21:18,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.15 vs. limit=15.0 2024-09-17 02:21:19,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=83510.0, ans=0.5 2024-09-17 02:21:21,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=83556.66666666667, ans=0.0 2024-09-17 02:21:22,478 INFO [train.py:1198] (1/2) Epoch 5, batch 2400, loss[loss=0.277, simple_loss=0.3008, pruned_loss=0.09847, ctc_loss=0.1928, cr_loss=0.4419, over 34569.00 frames. ], tot_loss[loss=0.3128, simple_loss=0.3313, pruned_loss=0.1157, ctc_loss=0.2214, cr_loss=0.4655, over 6776679.46 frames. 
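
The bypass.scale_min and bypass.skip_rate probes refer to the residual "bypass" around each encoder layer: the layer output is mixed with its input through a learned per-channel scale whose lower bound (scale_min) is itself scheduled, so early in training a layer can be mostly skipped, while skip_rate plausibly gates stochastic skipping of the layer during training. A sketch of the mixing, assuming the common y = x + s·(layer(x) − x) form with a clamped scale (the module below is illustrative, not zipformer.py verbatim):

import torch

class Bypass(torch.nn.Module):
    """Learned per-channel interpolation between a layer's input and output."""
    def __init__(self, num_channels: int, scale_min: float = 0.2,
                 scale_max: float = 1.0):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min, self.scale_max = scale_min, scale_max

    def forward(self, x: torch.Tensor, layer_out: torch.Tensor) -> torch.Tensor:
        s = self.scale.clamp(self.scale_min, self.scale_max)
        return x + s * (layer_out - x)   # s at scale_min -> mostly skip the layer
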
], batch size: 89, lr: 2.27e-02, grad_scale: 32.0 2024-09-17 02:21:44,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=83603.33333333333, ans=0.125 2024-09-17 02:21:53,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=83603.33333333333, ans=0.0 2024-09-17 02:22:24,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=83696.66666666667, ans=0.0 2024-09-17 02:22:37,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.481e+02 3.335e+02 4.481e+02 6.083e+02 1.176e+03, threshold=8.962e+02, percent-clipped=10.0 2024-09-17 02:22:45,793 INFO [train.py:1198] (1/2) Epoch 5, batch 2450, loss[loss=0.314, simple_loss=0.3429, pruned_loss=0.1116, ctc_loss=0.2172, cr_loss=0.4631, over 34428.00 frames. ], tot_loss[loss=0.3143, simple_loss=0.3325, pruned_loss=0.1165, ctc_loss=0.2229, cr_loss=0.4664, over 6751803.81 frames. ], batch size: 95, lr: 2.27e-02, grad_scale: 16.0 2024-09-17 02:22:47,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=83790.0, ans=0.125 2024-09-17 02:22:51,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. limit=15.0 2024-09-17 02:23:15,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=83836.66666666667, ans=0.0 2024-09-17 02:23:15,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=83836.66666666667, ans=0.0 2024-09-17 02:23:41,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=83930.0, ans=10.0 2024-09-17 02:23:43,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=83930.0, ans=0.2 2024-09-17 02:23:47,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2024-09-17 02:23:48,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=83930.0, ans=0.07 2024-09-17 02:24:09,877 INFO [train.py:1198] (1/2) Epoch 5, batch 2500, loss[loss=0.3372, simple_loss=0.3532, pruned_loss=0.1262, ctc_loss=0.243, cr_loss=0.505, over 34459.00 frames. ], tot_loss[loss=0.3136, simple_loss=0.3321, pruned_loss=0.116, ctc_loss=0.2223, cr_loss=0.4662, over 6763412.05 frames. 
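
The grad_scale field tracks the AMP loss scale: it halves when a float16 step overflows (above, 32.0 at batch 2400 drops to 16.0 by batch 2450) and creeps back up after a long run of clean steps. A minimal, self-contained sketch of those mechanics with torch.cuda.amp on a CUDA machine; the toy model and optimizer are placeholders:

import torch

model = torch.nn.Linear(16, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

for step in range(4):
    x = torch.randn(8, 16, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()
    scaler.scale(loss).backward()    # backward on the scaled loss
    scaler.step(optimizer)           # unscales grads; skips the step on inf/nan
    scaler.update()                  # halves the scale after overflow, else slowly grows it
    print(step, scaler.get_scale())  # the "grad_scale" printed in the log
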
], batch size: 100, lr: 2.26e-02, grad_scale: 16.0 2024-09-17 02:24:23,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=84023.33333333333, ans=0.2 2024-09-17 02:24:54,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84116.66666666667, ans=0.1 2024-09-17 02:25:04,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=84163.33333333333, ans=0.0 2024-09-17 02:25:04,174 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:25:06,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.00 vs. limit=10.0 2024-09-17 02:25:17,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=84210.0, ans=0.0 2024-09-17 02:25:24,947 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.306e+02 3.178e+02 3.768e+02 4.877e+02 1.387e+03, threshold=7.535e+02, percent-clipped=1.0 2024-09-17 02:25:32,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=84256.66666666667, ans=0.05 2024-09-17 02:25:33,274 INFO [train.py:1198] (1/2) Epoch 5, batch 2550, loss[loss=0.2651, simple_loss=0.2903, pruned_loss=0.09345, ctc_loss=0.1803, cr_loss=0.4247, over 34164.00 frames. ], tot_loss[loss=0.3137, simple_loss=0.3322, pruned_loss=0.116, ctc_loss=0.2222, cr_loss=0.4661, over 6767914.98 frames. ], batch size: 78, lr: 2.26e-02, grad_scale: 16.0 2024-09-17 02:25:51,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=84303.33333333333, ans=0.0 2024-09-17 02:25:59,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=84303.33333333333, ans=0.125 2024-09-17 02:26:09,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=84350.0, ans=0.0 2024-09-17 02:26:33,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2024-09-17 02:26:43,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.62 vs. limit=15.0 2024-09-17 02:26:47,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=84443.33333333333, ans=0.0 2024-09-17 02:26:57,136 INFO [train.py:1198] (1/2) Epoch 5, batch 2600, loss[loss=0.2939, simple_loss=0.3153, pruned_loss=0.107, ctc_loss=0.2005, cr_loss=0.4584, over 34377.00 frames. ], tot_loss[loss=0.3144, simple_loss=0.3327, pruned_loss=0.1164, ctc_loss=0.223, cr_loss=0.4672, over 6762566.33 frames. ], batch size: 91, lr: 2.26e-02, grad_scale: 16.0 2024-09-17 02:26:59,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84490.0, ans=0.1 2024-09-17 02:27:14,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.75 vs. 
limit=22.5 2024-09-17 02:27:24,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2024-09-17 02:27:31,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=84583.33333333333, ans=0.125 2024-09-17 02:27:31,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84583.33333333333, ans=0.1 2024-09-17 02:27:34,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=84583.33333333333, ans=0.125 2024-09-17 02:27:35,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.61 vs. limit=10.0 2024-09-17 02:28:00,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=84676.66666666667, ans=0.125 2024-09-17 02:28:05,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=84676.66666666667, ans=0.125 2024-09-17 02:28:08,936 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:28:10,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.292e+02 3.085e+02 3.686e+02 4.516e+02 8.279e+02, threshold=7.372e+02, percent-clipped=2.0 2024-09-17 02:28:10,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=84676.66666666667, ans=0.2 2024-09-17 02:28:15,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=84676.66666666667, ans=0.125 2024-09-17 02:28:18,180 INFO [train.py:1198] (1/2) Epoch 5, batch 2650, loss[loss=0.352, simple_loss=0.3645, pruned_loss=0.1341, ctc_loss=0.2573, cr_loss=0.4965, over 34179.00 frames. ], tot_loss[loss=0.3139, simple_loss=0.3325, pruned_loss=0.1161, ctc_loss=0.2225, cr_loss=0.4669, over 6769711.83 frames. ], batch size: 117, lr: 2.26e-02, grad_scale: 16.0 2024-09-17 02:28:26,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=84723.33333333333, ans=0.125 2024-09-17 02:28:48,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=84770.0, ans=0.125 2024-09-17 02:29:08,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=84863.33333333333, ans=0.09899494936611666 2024-09-17 02:29:17,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=84863.33333333333, ans=0.1 2024-09-17 02:29:41,698 INFO [train.py:1198] (1/2) Epoch 5, batch 2700, loss[loss=0.3307, simple_loss=0.3509, pruned_loss=0.1224, ctc_loss=0.2285, cr_loss=0.4961, over 34607.00 frames. ], tot_loss[loss=0.3138, simple_loss=0.3324, pruned_loss=0.116, ctc_loss=0.2223, cr_loss=0.4674, over 6764412.37 frames. 
], batch size: 102, lr: 2.25e-02, grad_scale: 16.0 2024-09-17 02:30:01,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=85003.33333333333, ans=0.04949747468305833 2024-09-17 02:30:19,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=85050.0, ans=0.125 2024-09-17 02:30:24,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=85050.0, ans=0.025 2024-09-17 02:30:27,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=85050.0, ans=0.125 2024-09-17 02:30:38,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=85096.66666666667, ans=0.1 2024-09-17 02:30:43,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.87 vs. limit=15.0 2024-09-17 02:30:44,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=85096.66666666667, ans=0.1 2024-09-17 02:30:57,081 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.280e+02 3.308e+02 4.191e+02 6.141e+02 8.580e+02, threshold=8.382e+02, percent-clipped=9.0 2024-09-17 02:31:05,410 INFO [train.py:1198] (1/2) Epoch 5, batch 2750, loss[loss=0.3146, simple_loss=0.3264, pruned_loss=0.1193, ctc_loss=0.2265, cr_loss=0.4719, over 34629.00 frames. ], tot_loss[loss=0.3117, simple_loss=0.3306, pruned_loss=0.1151, ctc_loss=0.2207, cr_loss=0.4651, over 6760212.48 frames. ], batch size: 88, lr: 2.25e-02, grad_scale: 16.0 2024-09-17 02:31:07,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=85190.0, ans=0.125 2024-09-17 02:31:10,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=85190.0, ans=0.1 2024-09-17 02:31:15,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=85190.0, ans=0.125 2024-09-17 02:31:22,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=85236.66666666667, ans=0.0 2024-09-17 02:31:31,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=85236.66666666667, ans=0.1 2024-09-17 02:31:33,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=85236.66666666667, ans=0.0 2024-09-17 02:31:35,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=85236.66666666667, ans=0.0 2024-09-17 02:31:51,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.60 vs. limit=15.0 2024-09-17 02:32:27,117 INFO [train.py:1198] (1/2) Epoch 5, batch 2800, loss[loss=0.3694, simple_loss=0.365, pruned_loss=0.1493, ctc_loss=0.2806, cr_loss=0.4765, over 23872.00 frames. ], tot_loss[loss=0.3116, simple_loss=0.3305, pruned_loss=0.115, ctc_loss=0.2206, cr_loss=0.4643, over 6738856.75 frames. 
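
The "Whitening: name=..., metric=X vs. limit=Y" lines report how far a module's output covariance is from a multiple of the identity, against the limit that module enforces; once the metric exceeds its limit, the Whiten module pushes the activations back toward whiteness. One scale-invariant metric with that behaviour, equal to 1.0 exactly for white features, shown as an assumption-laden approximation of scaling.py rather than its exact formula:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels); returns 1.0 iff covariance ∝ identity
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eig = torch.linalg.eigvalsh(cov)   # covariance eigenvalues
    return float((eig ** 2).mean() / (eig.mean() ** 2 + 1e-20))

white = torch.randn(4000, 512)
print(whitening_metric(white))                                   # close to 1.0
print(whitening_metric(white * torch.linspace(0.1, 3.0, 512)))   # noticeably larger
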
], batch size: 244, lr: 2.25e-02, grad_scale: 32.0 2024-09-17 02:32:28,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=85423.33333333333, ans=0.125 2024-09-17 02:32:36,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.05 vs. limit=22.5 2024-09-17 02:33:04,223 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.38 vs. limit=15.0 2024-09-17 02:33:14,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=85516.66666666667, ans=0.025 2024-09-17 02:33:31,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=85563.33333333333, ans=0.0 2024-09-17 02:33:31,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.11 vs. limit=15.0 2024-09-17 02:33:42,154 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.280e+02 3.040e+02 3.715e+02 4.333e+02 8.238e+02, threshold=7.430e+02, percent-clipped=0.0 2024-09-17 02:33:45,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0 2024-09-17 02:33:50,324 INFO [train.py:1198] (1/2) Epoch 5, batch 2850, loss[loss=0.2903, simple_loss=0.3142, pruned_loss=0.104, ctc_loss=0.2031, cr_loss=0.4411, over 34492.00 frames. ], tot_loss[loss=0.3129, simple_loss=0.3316, pruned_loss=0.1157, ctc_loss=0.2219, cr_loss=0.4652, over 6724750.12 frames. ], batch size: 90, lr: 2.25e-02, grad_scale: 32.0 2024-09-17 02:33:58,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=85656.66666666667, ans=0.125 2024-09-17 02:33:59,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.77 vs. limit=15.0 2024-09-17 02:34:02,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=85656.66666666667, ans=0.025 2024-09-17 02:34:06,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=85703.33333333333, ans=0.05 2024-09-17 02:34:30,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=85750.0, ans=0.125 2024-09-17 02:34:33,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.83 vs. 
limit=22.5 2024-09-17 02:34:41,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=85796.66666666667, ans=0.0 2024-09-17 02:34:44,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=85796.66666666667, ans=0.0 2024-09-17 02:34:48,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=85796.66666666667, ans=0.0 2024-09-17 02:34:49,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=85796.66666666667, ans=0.05 2024-09-17 02:34:56,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=85843.33333333333, ans=0.5 2024-09-17 02:35:10,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=85843.33333333333, ans=0.05 2024-09-17 02:35:13,962 INFO [train.py:1198] (1/2) Epoch 5, batch 2900, loss[loss=0.3069, simple_loss=0.3285, pruned_loss=0.1116, ctc_loss=0.2167, cr_loss=0.4656, over 34521.00 frames. ], tot_loss[loss=0.3133, simple_loss=0.3324, pruned_loss=0.1156, ctc_loss=0.2217, cr_loss=0.4665, over 6755424.42 frames. ], batch size: 94, lr: 2.24e-02, grad_scale: 32.0 2024-09-17 02:35:27,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.57 vs. limit=12.0 2024-09-17 02:35:34,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.44 vs. limit=12.0 2024-09-17 02:35:40,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=85936.66666666667, ans=0.0 2024-09-17 02:35:48,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=85983.33333333333, ans=0.025 2024-09-17 02:35:50,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=85983.33333333333, ans=0.125 2024-09-17 02:36:00,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=85983.33333333333, ans=0.0 2024-09-17 02:36:11,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=86030.0, ans=0.125 2024-09-17 02:36:29,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.358e+02 2.835e+02 3.346e+02 4.464e+02 7.227e+02, threshold=6.692e+02, percent-clipped=0.0 2024-09-17 02:36:37,339 INFO [train.py:1198] (1/2) Epoch 5, batch 2950, loss[loss=0.2998, simple_loss=0.3203, pruned_loss=0.1102, ctc_loss=0.2046, cr_loss=0.4521, over 34663.00 frames. ], tot_loss[loss=0.3118, simple_loss=0.3309, pruned_loss=0.115, ctc_loss=0.2206, cr_loss=0.4649, over 6749442.16 frames. 
], batch size: 88, lr: 2.24e-02, grad_scale: 32.0 2024-09-17 02:36:50,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=86123.33333333333, ans=0.125 2024-09-17 02:36:54,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=86170.0, ans=0.125 2024-09-17 02:36:58,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=86170.0, ans=0.125 2024-09-17 02:37:13,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=86216.66666666667, ans=0.125 2024-09-17 02:37:30,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=86263.33333333333, ans=10.0 2024-09-17 02:38:01,602 INFO [train.py:1198] (1/2) Epoch 5, batch 3000, loss[loss=0.3092, simple_loss=0.3329, pruned_loss=0.1119, ctc_loss=0.2171, cr_loss=0.4528, over 34523.00 frames. ], tot_loss[loss=0.3113, simple_loss=0.3305, pruned_loss=0.1147, ctc_loss=0.22, cr_loss=0.4648, over 6750434.28 frames. ], batch size: 94, lr: 2.24e-02, grad_scale: 32.0 2024-09-17 02:38:01,603 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 02:38:18,306 INFO [train.py:1230] (1/2) Epoch 5, validation: loss=0.1767, simple_loss=0.2736, pruned_loss=0.03291, ctc_loss=0.06955, cr_loss=1.433e-14, over 944034.00 frames. 2024-09-17 02:38:18,306 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 02:38:23,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=86356.66666666667, ans=10.0 2024-09-17 02:38:41,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=86403.33333333333, ans=0.125 2024-09-17 02:39:13,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=86496.66666666667, ans=0.2 2024-09-17 02:39:19,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.15 vs. limit=22.5 2024-09-17 02:39:31,171 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.364e+02 2.946e+02 3.692e+02 4.488e+02 6.471e+02, threshold=7.385e+02, percent-clipped=0.0 2024-09-17 02:39:39,375 INFO [train.py:1198] (1/2) Epoch 5, batch 3050, loss[loss=0.3089, simple_loss=0.3266, pruned_loss=0.1144, ctc_loss=0.2179, cr_loss=0.4687, over 34596.00 frames. ], tot_loss[loss=0.313, simple_loss=0.3319, pruned_loss=0.1156, ctc_loss=0.2215, cr_loss=0.466, over 6742926.23 frames. ], batch size: 89, lr: 2.24e-02, grad_scale: 32.0 2024-09-17 02:40:12,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=86683.33333333333, ans=0.2 2024-09-17 02:40:25,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.15 vs. 
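
The block above is the periodic validation pass: at regular intervals the training loop pauses, scores the fixed dev set (the same 944034.00 frames each time), and reports peak CUDA memory. Note the validation cr_loss of ~1.4e-14: with no time-masking applied at validation, the two views the consistency loss compares are presumably identical, so the term collapses to numerical noise. A hedged sketch of the pass itself; model(batch) returning a per-batch loss and frame count is a placeholder for the real scoring:

import torch

def compute_validation_loss(model, valid_loader):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = model(batch)   # placeholder scoring call
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
    model.train()
    print(f"validation: loss={tot_loss / tot_frames:.4g}, "
          f"over {tot_frames:.2f} frames.")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated() // 2**20}MB")
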
limit=22.5 2024-09-17 02:40:33,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=86730.0, ans=0.1 2024-09-17 02:40:38,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=86730.0, ans=0.125 2024-09-17 02:40:40,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.84 vs. limit=15.0 2024-09-17 02:41:01,900 INFO [train.py:1198] (1/2) Epoch 5, batch 3100, loss[loss=0.3405, simple_loss=0.3563, pruned_loss=0.1277, ctc_loss=0.2429, cr_loss=0.518, over 34153.00 frames. ], tot_loss[loss=0.3116, simple_loss=0.3309, pruned_loss=0.1149, ctc_loss=0.2203, cr_loss=0.4643, over 6742755.41 frames. ], batch size: 117, lr: 2.23e-02, grad_scale: 16.0 2024-09-17 02:41:05,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86823.33333333333, ans=0.1 2024-09-17 02:41:10,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5 2024-09-17 02:41:20,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2024-09-17 02:41:21,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=86870.0, ans=0.125 2024-09-17 02:41:24,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=86870.0, ans=0.0 2024-09-17 02:41:27,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=86870.0, ans=0.0 2024-09-17 02:41:32,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=86916.66666666667, ans=0.07 2024-09-17 02:41:56,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=86963.33333333333, ans=0.125 2024-09-17 02:42:16,014 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.454e+02 3.251e+02 4.300e+02 6.115e+02 9.224e+02, threshold=8.600e+02, percent-clipped=7.0 2024-09-17 02:42:21,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.77 vs. limit=15.0 2024-09-17 02:42:22,568 INFO [train.py:1198] (1/2) Epoch 5, batch 3150, loss[loss=0.339, simple_loss=0.3578, pruned_loss=0.1254, ctc_loss=0.2433, cr_loss=0.5181, over 33922.00 frames. ], tot_loss[loss=0.3115, simple_loss=0.3308, pruned_loss=0.1148, ctc_loss=0.2203, cr_loss=0.4645, over 6748912.21 frames. ], batch size: 122, lr: 2.23e-02, grad_scale: 16.0 2024-09-17 02:42:22,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=87056.66666666667, ans=0.125 2024-09-17 02:42:23,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.84 vs. 
limit=10.0 2024-09-17 02:42:30,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=87056.66666666667, ans=0.125 2024-09-17 02:42:32,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=87056.66666666667, ans=0.0 2024-09-17 02:42:39,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2024-09-17 02:42:40,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=87103.33333333333, ans=0.025 2024-09-17 02:42:47,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=87103.33333333333, ans=0.125 2024-09-17 02:43:00,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=15.0 2024-09-17 02:43:05,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=87150.0, ans=0.125 2024-09-17 02:43:07,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=87150.0, ans=0.025 2024-09-17 02:43:17,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2024-09-17 02:43:24,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.15 vs. limit=15.0 2024-09-17 02:43:44,144 INFO [train.py:1198] (1/2) Epoch 5, batch 3200, loss[loss=0.3064, simple_loss=0.3274, pruned_loss=0.1119, ctc_loss=0.2115, cr_loss=0.4857, over 34540.00 frames. ], tot_loss[loss=0.3105, simple_loss=0.3298, pruned_loss=0.1144, ctc_loss=0.2193, cr_loss=0.4634, over 6761732.05 frames. ], batch size: 94, lr: 2.23e-02, grad_scale: 32.0 2024-09-17 02:44:28,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.91 vs. limit=15.0 2024-09-17 02:44:42,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=87430.0, ans=0.125 2024-09-17 02:44:47,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=87476.66666666667, ans=0.125 2024-09-17 02:44:58,481 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 3.030e+02 3.526e+02 4.247e+02 8.000e+02, threshold=7.053e+02, percent-clipped=0.0 2024-09-17 02:45:03,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=87523.33333333333, ans=0.1 2024-09-17 02:45:04,942 INFO [train.py:1198] (1/2) Epoch 5, batch 3250, loss[loss=0.3298, simple_loss=0.3457, pruned_loss=0.1237, ctc_loss=0.2386, cr_loss=0.4712, over 34631.00 frames. ], tot_loss[loss=0.3106, simple_loss=0.3302, pruned_loss=0.1143, ctc_loss=0.2193, cr_loss=0.4643, over 6770493.72 frames. 
], batch size: 98, lr: 2.23e-02, grad_scale: 32.0 2024-09-17 02:45:07,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.51 vs. limit=15.0 2024-09-17 02:45:18,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=14.85 vs. limit=15.0 2024-09-17 02:45:19,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87570.0, ans=0.1 2024-09-17 02:45:28,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.86 vs. limit=22.5 2024-09-17 02:45:49,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=87616.66666666667, ans=0.125 2024-09-17 02:46:04,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=87663.33333333333, ans=0.125 2024-09-17 02:46:15,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=87710.0, ans=0.09899494936611666 2024-09-17 02:46:21,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=87710.0, ans=0.2 2024-09-17 02:46:24,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=87756.66666666667, ans=0.125 2024-09-17 02:46:26,258 INFO [train.py:1198] (1/2) Epoch 5, batch 3300, loss[loss=0.3438, simple_loss=0.3591, pruned_loss=0.1289, ctc_loss=0.2554, cr_loss=0.4899, over 33137.00 frames. ], tot_loss[loss=0.3093, simple_loss=0.329, pruned_loss=0.1137, ctc_loss=0.2182, cr_loss=0.4623, over 6767813.30 frames. ], batch size: 130, lr: 2.22e-02, grad_scale: 32.0 2024-09-17 02:46:28,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=87756.66666666667, ans=0.95 2024-09-17 02:46:38,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=87756.66666666667, ans=0.125 2024-09-17 02:46:42,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=87803.33333333333, ans=0.035 2024-09-17 02:47:29,322 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 02:47:37,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=87943.33333333333, ans=0.125 2024-09-17 02:47:38,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=87943.33333333333, ans=0.125 2024-09-17 02:47:40,245 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.184e+02 2.935e+02 3.754e+02 4.857e+02 7.751e+02, threshold=7.509e+02, percent-clipped=2.0 2024-09-17 02:47:46,727 INFO [train.py:1198] (1/2) Epoch 5, batch 3350, loss[loss=0.3316, simple_loss=0.3457, pruned_loss=0.1251, ctc_loss=0.2394, cr_loss=0.4844, over 33826.00 frames. ], tot_loss[loss=0.3105, simple_loss=0.3299, pruned_loss=0.1143, ctc_loss=0.2194, cr_loss=0.4639, over 6742891.85 frames. 
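
The balancer probes (prob, min_positive, max_positive, min_abs, max_abs) belong to modules that keep per-channel activation statistics inside configured bounds, e.g. max_positive=0.95 caps the fraction of frames on which a channel is positive. The real module enforces the bounds through gradient manipulation; the sketch below only computes the statistics being policed, as a diagnostic (the function and thresholds are illustrative):

import torch

def balancer_stats(x: torch.Tensor, min_positive=0.05, max_positive=0.95):
    # x: (num_frames, num_channels)
    frac_pos = (x > 0).float().mean(dim=0)   # per-channel fraction positive
    return {
        "below_min_positive": int((frac_pos < min_positive).sum()),
        "above_max_positive": int((frac_pos > max_positive).sum()),
        "mean_abs": float(x.abs().mean()),
    }

print(balancer_stats(torch.randn(2000, 256) + 0.5))
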
], batch size: 122, lr: 2.22e-02, grad_scale: 32.0 2024-09-17 02:48:07,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=88036.66666666667, ans=0.025 2024-09-17 02:48:28,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=88083.33333333333, ans=0.125 2024-09-17 02:48:31,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=88083.33333333333, ans=0.125 2024-09-17 02:48:42,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2024-09-17 02:48:49,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=88176.66666666667, ans=0.2 2024-09-17 02:48:52,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=88176.66666666667, ans=0.1 2024-09-17 02:48:54,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=88176.66666666667, ans=0.0 2024-09-17 02:49:00,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=88176.66666666667, ans=0.125 2024-09-17 02:49:07,988 INFO [train.py:1198] (1/2) Epoch 5, batch 3400, loss[loss=0.2773, simple_loss=0.2967, pruned_loss=0.1007, ctc_loss=0.1956, cr_loss=0.43, over 34162.00 frames. ], tot_loss[loss=0.3107, simple_loss=0.3299, pruned_loss=0.1145, ctc_loss=0.2196, cr_loss=0.464, over 6733866.44 frames. ], batch size: 78, lr: 2.22e-02, grad_scale: 32.0 2024-09-17 02:49:33,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=88270.0, ans=0.0 2024-09-17 02:49:41,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=88316.66666666667, ans=0.125 2024-09-17 02:49:46,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=88316.66666666667, ans=0.025 2024-09-17 02:50:12,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=88410.0, ans=0.125 2024-09-17 02:50:13,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=88410.0, ans=0.125 2024-09-17 02:50:21,401 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.178e+02 2.920e+02 3.623e+02 4.581e+02 8.916e+02, threshold=7.246e+02, percent-clipped=3.0 2024-09-17 02:50:27,816 INFO [train.py:1198] (1/2) Epoch 5, batch 3450, loss[loss=0.3194, simple_loss=0.342, pruned_loss=0.116, ctc_loss=0.224, cr_loss=0.4996, over 33195.00 frames. ], tot_loss[loss=0.3104, simple_loss=0.33, pruned_loss=0.1142, ctc_loss=0.2189, cr_loss=0.4637, over 6746300.35 frames. ], batch size: 130, lr: 2.21e-02, grad_scale: 32.0 2024-09-17 02:50:40,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. 
limit=15.0 2024-09-17 02:50:45,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=88503.33333333333, ans=0.125 2024-09-17 02:50:52,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=88503.33333333333, ans=15.0 2024-09-17 02:50:55,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=88503.33333333333, ans=0.025 2024-09-17 02:51:07,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=88550.0, ans=0.0 2024-09-17 02:51:15,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=88596.66666666667, ans=0.0 2024-09-17 02:51:25,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-09-17 02:51:30,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2024-09-17 02:51:33,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.54 vs. limit=12.0 2024-09-17 02:51:45,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=88643.33333333333, ans=0.1 2024-09-17 02:51:45,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=88643.33333333333, ans=0.125 2024-09-17 02:51:48,613 INFO [train.py:1198] (1/2) Epoch 5, batch 3500, loss[loss=0.2777, simple_loss=0.3002, pruned_loss=0.09922, ctc_loss=0.1932, cr_loss=0.4555, over 34502.00 frames. ], tot_loss[loss=0.3097, simple_loss=0.3292, pruned_loss=0.1139, ctc_loss=0.2184, cr_loss=0.4631, over 6748217.01 frames. ], batch size: 85, lr: 2.21e-02, grad_scale: 16.0 2024-09-17 02:51:52,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=88690.0, ans=0.1 2024-09-17 02:51:57,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0 2024-09-17 02:52:03,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.53 vs. 
limit=10.0 2024-09-17 02:52:46,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=88830.0, ans=0.125 2024-09-17 02:52:57,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=88876.66666666667, ans=0.125 2024-09-17 02:53:00,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=88876.66666666667, ans=0.1 2024-09-17 02:53:00,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=88876.66666666667, ans=0.125 2024-09-17 02:53:03,561 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.158e+02 2.943e+02 3.632e+02 4.756e+02 8.527e+02, threshold=7.265e+02, percent-clipped=7.0 2024-09-17 02:53:08,318 INFO [train.py:1198] (1/2) Epoch 5, batch 3550, loss[loss=0.3271, simple_loss=0.3475, pruned_loss=0.1206, ctc_loss=0.2335, cr_loss=0.475, over 34396.00 frames. ], tot_loss[loss=0.3097, simple_loss=0.3294, pruned_loss=0.1139, ctc_loss=0.2182, cr_loss=0.4635, over 6757847.98 frames. ], batch size: 103, lr: 2.21e-02, grad_scale: 16.0 2024-09-17 02:53:15,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.99 vs. limit=15.0 2024-09-17 02:53:19,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=88923.33333333333, ans=0.125 2024-09-17 02:53:59,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=89063.33333333333, ans=0.0 2024-09-17 02:54:14,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=89110.0, ans=0.125 2024-09-17 02:54:24,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=89110.0, ans=0.125 2024-09-17 02:54:29,089 INFO [train.py:1198] (1/2) Epoch 5, batch 3600, loss[loss=0.2933, simple_loss=0.3159, pruned_loss=0.1054, ctc_loss=0.2091, cr_loss=0.4485, over 34481.00 frames. ], tot_loss[loss=0.31, simple_loss=0.3298, pruned_loss=0.114, ctc_loss=0.2184, cr_loss=0.4643, over 6767693.23 frames. ], batch size: 90, lr: 2.21e-02, grad_scale: 32.0 2024-09-17 02:54:37,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=89156.66666666667, ans=0.0 2024-09-17 02:54:45,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=8.41 vs. 
limit=10.0 2024-09-17 02:55:00,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=89250.0, ans=0.0 2024-09-17 02:55:05,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=89250.0, ans=0.0 2024-09-17 02:55:21,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=89296.66666666667, ans=0.04949747468305833 2024-09-17 02:55:27,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=89296.66666666667, ans=0.125 2024-09-17 02:55:30,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89296.66666666667, ans=0.1 2024-09-17 02:55:44,778 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.196e+02 2.910e+02 3.530e+02 4.615e+02 7.600e+02, threshold=7.061e+02, percent-clipped=2.0 2024-09-17 02:55:48,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89390.0, ans=0.1 2024-09-17 02:55:49,639 INFO [train.py:1198] (1/2) Epoch 5, batch 3650, loss[loss=0.3588, simple_loss=0.3666, pruned_loss=0.1408, ctc_loss=0.2533, cr_loss=0.4703, over 34448.00 frames. ], tot_loss[loss=0.3092, simple_loss=0.3291, pruned_loss=0.1136, ctc_loss=0.2177, cr_loss=0.4627, over 6769211.01 frames. ], batch size: 110, lr: 2.20e-02, grad_scale: 32.0 2024-09-17 02:55:53,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=89390.0, ans=0.0 2024-09-17 02:56:13,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=89436.66666666667, ans=0.125 2024-09-17 02:56:23,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=89483.33333333333, ans=0.1 2024-09-17 02:56:32,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89483.33333333333, ans=0.1 2024-09-17 02:56:36,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.61 vs. limit=15.0 2024-09-17 02:56:41,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.80 vs. limit=10.0 2024-09-17 02:56:44,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=89530.0, ans=0.025 2024-09-17 02:56:45,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=89530.0, ans=0.0 2024-09-17 02:56:52,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=89576.66666666667, ans=0.0 2024-09-17 02:57:01,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.58 vs. limit=15.0 2024-09-17 02:57:10,308 INFO [train.py:1198] (1/2) Epoch 5, batch 3700, loss[loss=0.3242, simple_loss=0.3432, pruned_loss=0.1204, ctc_loss=0.2268, cr_loss=0.4768, over 34614.00 frames. 
2024-09-17 02:57:42,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=89716.66666666667, ans=0.2
2024-09-17 02:58:00,640 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 02:58:08,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0
2024-09-17 02:58:17,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=89810.0, ans=0.125
2024-09-17 02:58:28,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.230e+02 2.798e+02 3.289e+02 4.112e+02 7.146e+02, threshold=6.577e+02, percent-clipped=1.0
2024-09-17 02:58:31,337 INFO [train.py:1198] (1/2) Epoch 5, batch 3750, loss[loss=0.3263, simple_loss=0.3473, pruned_loss=0.12, ctc_loss=0.2281, cr_loss=0.4955, over 34343.00 frames. ], tot_loss[loss=0.3128, simple_loss=0.3326, pruned_loss=0.1151, ctc_loss=0.2206, cr_loss=0.4677, over 6785631.95 frames. ], batch size: 113, lr: 2.20e-02, grad_scale: 16.0
2024-09-17 02:58:47,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=89903.33333333333, ans=0.0
2024-09-17 02:58:50,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89903.33333333333, ans=0.1
2024-09-17 02:59:00,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=89903.33333333333, ans=0.125
2024-09-17 02:59:06,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=89950.0, ans=0.04949747468305833
2024-09-17 02:59:24,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=89996.66666666667, ans=0.125
2024-09-17 02:59:47,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=90043.33333333333, ans=0.2
2024-09-17 02:59:52,295 INFO [train.py:1198] (1/2) Epoch 5, batch 3800, loss[loss=0.3656, simple_loss=0.3658, pruned_loss=0.1453, ctc_loss=0.2721, cr_loss=0.5053, over 29937.00 frames. ], tot_loss[loss=0.3184, simple_loss=0.3366, pruned_loss=0.1181, ctc_loss=0.2258, cr_loss=0.4715, over 6675047.58 frames. ], batch size: 175, lr: 2.20e-02, grad_scale: 16.0
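
The Whitening lines compare a per-module "metric" against a "limit". One metric with the right behavior is dim * tr(C @ C) / tr(C)^2 over the channel covariance C: it equals 1.0 exactly when C is a multiple of the identity (fully white activations) and grows as the spectrum becomes less uniform. The sketch below only illustrates that idea; the exact formula in scaling.py may differ.

```python
# Toy "whitening metric" in the spirit of the logged "metric=X vs. limit=Y".
# dim * trace(C @ C) / trace(C)**2 >= 1, with equality iff C = c * I,
# so larger values mean activations are further from white.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]          # (C, C) channel covariance
    dim = cov.shape[0]
    return (dim * torch.trace(cov @ cov) / torch.trace(cov) ** 2).item()

white = torch.randn(50000, 64)
print(whitening_metric(white))            # near 1.0 (isotropic)
print(whitening_metric(white * torch.linspace(0.1, 3.0, 64)))  # well above 1.0
```
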
2024-09-17 02:59:54,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=90090.0, ans=0.0
2024-09-17 03:00:07,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90136.66666666667, ans=0.1
2024-09-17 03:00:09,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=90136.66666666667, ans=0.125
2024-09-17 03:00:34,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=90183.33333333333, ans=0.125
2024-09-17 03:00:37,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90183.33333333333, ans=0.125
2024-09-17 03:01:05,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=90276.66666666667, ans=0.0
2024-09-17 03:01:06,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=90276.66666666667, ans=0.125
2024-09-17 03:01:11,620 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.424e+02 2.751e+02 3.080e+02 3.501e+02 6.821e+02, threshold=6.161e+02, percent-clipped=1.0
2024-09-17 03:01:14,894 INFO [train.py:1198] (1/2) Epoch 5, batch 3850, loss[loss=0.3809, simple_loss=0.3671, pruned_loss=0.1578, ctc_loss=0.2952, cr_loss=0.5007, over 23604.00 frames. ], tot_loss[loss=0.3273, simple_loss=0.3417, pruned_loss=0.1234, ctc_loss=0.236, cr_loss=0.4738, over 6247374.50 frames. ], batch size: 244, lr: 2.19e-02, grad_scale: 16.0
2024-09-17 03:01:21,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=90323.33333333333, ans=0.2
2024-09-17 03:01:45,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=90370.0, ans=0.125
2024-09-17 03:01:53,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=90416.66666666667, ans=0.95
2024-09-17 03:02:45,820 INFO [train.py:1198] (1/2) Epoch 6, batch 0, loss[loss=0.2934, simple_loss=0.3125, pruned_loss=0.1079, ctc_loss=0.2042, cr_loss=0.4427, over 34486.00 frames. ], tot_loss[loss=0.2934, simple_loss=0.3125, pruned_loss=0.1079, ctc_loss=0.2042, cr_loss=0.4427, over 34486.00 frames. ], batch size: 85, lr: 2.05e-02, grad_scale: 32.0
2024-09-17 03:02:45,821 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 03:02:52,961 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([4.0905, 3.3268, 3.7820, 2.7533], device='cuda:1')
2024-09-17 03:03:02,571 INFO [train.py:1230] (1/2) Epoch 6, validation: loss=0.1793, simple_loss=0.2771, pruned_loss=0.03377, ctc_loss=0.0703, cr_loss=1.48e-14, over 944034.00 frames.
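
The zipformer.py:1858 diagnostic above prints one attention-entropy value per head during validation. A matching computation (assumed, for illustration) is the Shannon entropy of each head's attention distribution, averaged over batch and query positions; high entropy means attention is spread over many keys, low entropy means it is peaked.

```python
# Per-head attention entropy diagnostic, shaped like the logged
# "attn_weights_entropy = tensor([4.0905, 3.3268, 3.7820, 2.7533])".
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, batch, query_len, key_len); rows sum to 1
    eps = 1.0e-20
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, batch, query)
    return ent.mean(dim=(1, 2))                     # one value per head

attn = torch.softmax(torch.randn(4, 2, 10, 50), dim=-1)
print(attn_weights_entropy(attn))  # 4 entries, one per head, like the log
```
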
2024-09-17 03:03:02,571 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-17 03:03:14,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=90449.33333333333, ans=0.125
2024-09-17 03:03:15,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.98 vs. limit=22.5
2024-09-17 03:03:47,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90542.66666666667, ans=0.125
2024-09-17 03:04:13,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=90636.0, ans=0.0
2024-09-17 03:04:13,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=90636.0, ans=0.1
2024-09-17 03:04:22,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=90636.0, ans=0.0
2024-09-17 03:04:24,964 INFO [train.py:1198] (1/2) Epoch 6, batch 50, loss[loss=0.2623, simple_loss=0.2889, pruned_loss=0.09167, ctc_loss=0.1787, cr_loss=0.4163, over 34469.00 frames. ], tot_loss[loss=0.3135, simple_loss=0.3318, pruned_loss=0.1161, ctc_loss=0.2215, cr_loss=0.4676, over 1480465.47 frames. ], batch size: 82, lr: 2.04e-02, grad_scale: 32.0
2024-09-17 03:04:33,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=90682.66666666667, ans=0.125
2024-09-17 03:04:36,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=90682.66666666667, ans=0.125
2024-09-17 03:05:01,238 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.201e+02 2.992e+02 3.584e+02 5.780e+02 9.824e+02, threshold=7.169e+02, percent-clipped=20.0
2024-09-17 03:05:24,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=90822.66666666667, ans=0.125
2024-09-17 03:05:26,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=90822.66666666667, ans=0.025
2024-09-17 03:05:26,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=90822.66666666667, ans=0.5
2024-09-17 03:05:46,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=90869.33333333333, ans=0.125
2024-09-17 03:05:50,525 INFO [train.py:1198] (1/2) Epoch 6, batch 100, loss[loss=0.2882, simple_loss=0.3078, pruned_loss=0.1048, ctc_loss=0.2029, cr_loss=0.46, over 34582.00 frames. ], tot_loss[loss=0.3129, simple_loss=0.3325, pruned_loss=0.1153, ctc_loss=0.2207, cr_loss=0.4684, over 2630073.90 frames. ], batch size: 89, lr: 2.04e-02, grad_scale: 16.0
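
The per-batch "grad_scale" values in these lines (16.0, 32.0, and 1.0 at the very start of training) behave like the running scale of an AMP loss scaler, which doubles after a stretch of overflow-free steps and halves on overflow. A standard PyTorch sketch of that mechanism (requires a GPU; the model and init_scale here are placeholders):

```python
# Standard torch.cuda.amp loss-scaling loop; the changing scaler value is
# what shows up as "grad_scale" in per-batch logs like the ones above.
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.AdamW(model.parameters(), lr=2.0e-2)
scaler = GradScaler(init_scale=16.0)  # init chosen only to echo the log

x = torch.randn(8, 80).cuda()
y = torch.randint(0, 500, (8,)).cuda()

opt.zero_grad()
with autocast(dtype=torch.float16):
    loss = torch.nn.functional.cross_entropy(model(x), y)
scaler.scale(loss).backward()
scaler.step(opt)       # skips the optimizer step if gradients overflowed
scaler.update()        # grows/shrinks the scale over time
print(scaler.get_scale())
```
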
2024-09-17 03:06:32,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=91009.33333333333, ans=0.0
2024-09-17 03:06:56,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=91102.66666666667, ans=0.125
2024-09-17 03:07:00,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0
2024-09-17 03:07:06,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=91102.66666666667, ans=0.125
2024-09-17 03:07:10,888 INFO [train.py:1198] (1/2) Epoch 6, batch 150, loss[loss=0.259, simple_loss=0.2846, pruned_loss=0.09003, ctc_loss=0.1788, cr_loss=0.4398, over 34474.00 frames. ], tot_loss[loss=0.3081, simple_loss=0.3288, pruned_loss=0.1128, ctc_loss=0.2164, cr_loss=0.4645, over 3558935.93 frames. ], batch size: 82, lr: 2.04e-02, grad_scale: 16.0
2024-09-17 03:07:21,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=91149.33333333333, ans=0.0
2024-09-17 03:07:33,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0
2024-09-17 03:07:35,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91196.0, ans=0.1
2024-09-17 03:07:46,873 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 2.759e+02 3.224e+02 4.195e+02 8.737e+02, threshold=6.448e+02, percent-clipped=1.0
2024-09-17 03:07:50,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=91242.66666666667, ans=0.0
2024-09-17 03:08:02,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=12.0
2024-09-17 03:08:03,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=91289.33333333333, ans=0.2
2024-09-17 03:08:18,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91336.0, ans=0.1
2024-09-17 03:08:32,474 INFO [train.py:1198] (1/2) Epoch 6, batch 200, loss[loss=0.3337, simple_loss=0.3454, pruned_loss=0.127, ctc_loss=0.2446, cr_loss=0.4771, over 31900.00 frames. ], tot_loss[loss=0.3068, simple_loss=0.3273, pruned_loss=0.1123, ctc_loss=0.2156, cr_loss=0.4632, over 4273443.20 frames. ], batch size: 145, lr: 2.04e-02, grad_scale: 16.0
2024-09-17 03:09:08,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0
2024-09-17 03:09:16,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=91476.0, ans=0.125
2024-09-17 03:09:23,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=91476.0, ans=0.0
2024-09-17 03:09:31,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=91522.66666666667, ans=0.1
2024-09-17 03:09:59,038 INFO [train.py:1198] (1/2) Epoch 6, batch 250, loss[loss=0.3463, simple_loss=0.3638, pruned_loss=0.1303, ctc_loss=0.2442, cr_loss=0.4831, over 34195.00 frames. ], tot_loss[loss=0.3063, simple_loss=0.3273, pruned_loss=0.1119, ctc_loss=0.2149, cr_loss=0.4625, over 4836040.23 frames. ], batch size: 117, lr: 2.04e-02, grad_scale: 16.0
2024-09-17 03:10:27,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=91662.66666666667, ans=0.0
2024-09-17 03:10:31,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.92 vs. limit=15.0
2024-09-17 03:10:35,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.177e+02 2.807e+02 3.330e+02 4.143e+02 7.054e+02, threshold=6.660e+02, percent-clipped=2.0
2024-09-17 03:10:40,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91709.33333333333, ans=0.1
2024-09-17 03:11:03,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=91802.66666666667, ans=0.125
2024-09-17 03:11:21,053 INFO [train.py:1198] (1/2) Epoch 6, batch 300, loss[loss=0.2916, simple_loss=0.3173, pruned_loss=0.1032, ctc_loss=0.203, cr_loss=0.469, over 34363.00 frames. ], tot_loss[loss=0.3055, simple_loss=0.3265, pruned_loss=0.1116, ctc_loss=0.2145, cr_loss=0.4621, over 5264133.35 frames. ], batch size: 107, lr: 2.03e-02, grad_scale: 16.0
2024-09-17 03:11:32,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=91849.33333333333, ans=0.0
2024-09-17 03:11:37,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91896.0, ans=0.1
2024-09-17 03:11:52,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=91942.66666666667, ans=12.0
2024-09-17 03:12:40,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=92082.66666666667, ans=0.125
2024-09-17 03:12:42,335 INFO [train.py:1198] (1/2) Epoch 6, batch 350, loss[loss=0.2793, simple_loss=0.2982, pruned_loss=0.1023, ctc_loss=0.1965, cr_loss=0.4131, over 34724.00 frames. ], tot_loss[loss=0.3059, simple_loss=0.3271, pruned_loss=0.1116, ctc_loss=0.2146, cr_loss=0.463, over 5599870.92 frames. ], batch size: 84, lr: 2.03e-02, grad_scale: 16.0
2024-09-17 03:12:50,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=92082.66666666667, ans=0.0
2024-09-17 03:13:22,077 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.124e+02 2.881e+02 3.576e+02 5.201e+02 1.165e+03, threshold=7.153e+02, percent-clipped=10.0
2024-09-17 03:13:29,298 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.313e-02
2024-09-17 03:13:32,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=92176.0, ans=0.125
2024-09-17 03:13:32,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.97 vs. limit=10.0
2024-09-17 03:13:43,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=92222.66666666667, ans=0.0
2024-09-17 03:13:51,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=92269.33333333333, ans=0.125
2024-09-17 03:14:07,740 INFO [train.py:1198] (1/2) Epoch 6, batch 400, loss[loss=0.3127, simple_loss=0.3394, pruned_loss=0.1117, ctc_loss=0.2173, cr_loss=0.4765, over 34452.00 frames. ], tot_loss[loss=0.3048, simple_loss=0.3263, pruned_loss=0.1111, ctc_loss=0.2137, cr_loss=0.4621, over 5866778.43 frames. ], batch size: 95, lr: 2.03e-02, grad_scale: 32.0
2024-09-17 03:14:29,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=92362.66666666667, ans=0.125
2024-09-17 03:14:29,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=92362.66666666667, ans=0.2
2024-09-17 03:14:49,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=92409.33333333333, ans=0.125
2024-09-17 03:15:27,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=92502.66666666667, ans=0.95
2024-09-17 03:15:30,683 INFO [train.py:1198] (1/2) Epoch 6, batch 450, loss[loss=0.3092, simple_loss=0.3308, pruned_loss=0.1125, ctc_loss=0.2159, cr_loss=0.4841, over 34729.00 frames. ], tot_loss[loss=0.3049, simple_loss=0.3262, pruned_loss=0.1111, ctc_loss=0.2138, cr_loss=0.4626, over 6056209.07 frames. ], batch size: 97, lr: 2.03e-02, grad_scale: 16.0
2024-09-17 03:16:08,717 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.290e+02 3.431e+02 4.341e+02 5.771e+02 8.114e+02, threshold=8.682e+02, percent-clipped=7.0
2024-09-17 03:16:30,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=92689.33333333333, ans=0.125
2024-09-17 03:16:33,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=92689.33333333333, ans=0.125
2024-09-17 03:16:54,777 INFO [train.py:1198] (1/2) Epoch 6, batch 500, loss[loss=0.3516, simple_loss=0.365, pruned_loss=0.1336, ctc_loss=0.252, cr_loss=0.5134, over 34495.00 frames. ], tot_loss[loss=0.3032, simple_loss=0.3251, pruned_loss=0.1102, ctc_loss=0.2122, cr_loss=0.4605, over 6221486.59 frames. ], batch size: 110, lr: 2.02e-02, grad_scale: 16.0
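
Every train.py loss line reports the same components: simple_loss and pruned_loss (the two RNN-T terms), ctc_loss, cr_loss, and their combination "loss". A weighted sum with the weights below (inferred, not confirmed by the log itself) reproduces the logged batch-400 value above to rounding:

```python
# Plausible combination of the logged loss components. The scale factors are
# assumptions for illustration; only the component names come from the log.
def combine_losses(simple_loss, pruned_loss, ctc_loss, cr_loss,
                   simple_scale=0.5, ctc_scale=0.1, cr_scale=0.02):
    transducer = simple_scale * simple_loss + pruned_loss
    return transducer + ctc_scale * ctc_loss + cr_scale * cr_loss

# Components of the "Epoch 6, batch 400" line above:
print(combine_losses(0.3394, 0.1117, 0.2173, 0.4765))  # ~0.3127, as logged
```
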
2024-09-17 03:17:07,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=92782.66666666667, ans=0.0
2024-09-17 03:17:37,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=92876.0, ans=0.125
2024-09-17 03:17:47,812 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:18:18,287 INFO [train.py:1198] (1/2) Epoch 6, batch 550, loss[loss=0.3202, simple_loss=0.343, pruned_loss=0.1168, ctc_loss=0.223, cr_loss=0.4802, over 33870.00 frames. ], tot_loss[loss=0.3037, simple_loss=0.3254, pruned_loss=0.1105, ctc_loss=0.2126, cr_loss=0.4614, over 6331669.10 frames. ], batch size: 122, lr: 2.02e-02, grad_scale: 16.0
2024-09-17 03:18:18,587 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:18:51,843 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:18:56,034 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.343e+02 3.025e+02 3.706e+02 4.925e+02 7.747e+02, threshold=7.412e+02, percent-clipped=0.0
2024-09-17 03:19:03,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=93109.33333333333, ans=0.0
2024-09-17 03:19:09,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=93156.0, ans=0.07
2024-09-17 03:19:13,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.98 vs. limit=15.0
2024-09-17 03:19:14,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=93156.0, ans=0.1
2024-09-17 03:19:29,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0
2024-09-17 03:19:40,492 INFO [train.py:1198] (1/2) Epoch 6, batch 600, loss[loss=0.3208, simple_loss=0.344, pruned_loss=0.1171, ctc_loss=0.2246, cr_loss=0.4599, over 34223.00 frames. ], tot_loss[loss=0.3033, simple_loss=0.3252, pruned_loss=0.1103, ctc_loss=0.2122, cr_loss=0.4612, over 6432862.63 frames. ], batch size: 117, lr: 2.02e-02, grad_scale: 16.0
2024-09-17 03:20:03,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=93296.0, ans=0.025
2024-09-17 03:20:37,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=93389.33333333333, ans=0.1
2024-09-17 03:20:37,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0
2024-09-17 03:20:41,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=93389.33333333333, ans=0.125
2024-09-17 03:20:43,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=93389.33333333333, ans=0.125
2024-09-17 03:20:45,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=93389.33333333333, ans=0.0
2024-09-17 03:20:48,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=93389.33333333333, ans=0.035
2024-09-17 03:21:04,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=93436.0, ans=0.1
2024-09-17 03:21:11,919 INFO [train.py:1198] (1/2) Epoch 6, batch 650, loss[loss=0.3022, simple_loss=0.3242, pruned_loss=0.1094, ctc_loss=0.2128, cr_loss=0.4724, over 34546.00 frames. ], tot_loss[loss=0.3015, simple_loss=0.3239, pruned_loss=0.1093, ctc_loss=0.2106, cr_loss=0.4584, over 6523870.70 frames. ], batch size: 94, lr: 2.02e-02, grad_scale: 16.0
2024-09-17 03:21:12,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=93482.66666666667, ans=0.2
2024-09-17 03:21:23,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=93482.66666666667, ans=0.2
2024-09-17 03:21:28,473 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:21:35,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=93529.33333333333, ans=22.5
2024-09-17 03:21:46,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=93576.0, ans=0.125
2024-09-17 03:21:49,512 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.742e+02 3.383e+02 4.660e+02 8.187e+02, threshold=6.765e+02, percent-clipped=1.0
2024-09-17 03:21:54,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=93576.0, ans=0.1
2024-09-17 03:22:02,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=93622.66666666667, ans=0.125
2024-09-17 03:22:11,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=93622.66666666667, ans=0.04949747468305833
2024-09-17 03:22:12,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=93622.66666666667, ans=0.125
2024-09-17 03:22:22,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93669.33333333333, ans=0.1
2024-09-17 03:22:33,359 INFO [train.py:1198] (1/2) Epoch 6, batch 700, loss[loss=0.2893, simple_loss=0.3085, pruned_loss=0.1056, ctc_loss=0.1991, cr_loss=0.4764, over 34593.00 frames. ], tot_loss[loss=0.3018, simple_loss=0.3242, pruned_loss=0.1094, ctc_loss=0.2107, cr_loss=0.459, over 6579012.84 frames. ], batch size: 89, lr: 2.02e-02, grad_scale: 16.0
2024-09-17 03:22:46,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=93716.0, ans=0.125
2024-09-17 03:22:49,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=93762.66666666667, ans=0.0
2024-09-17 03:23:06,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=93809.33333333333, ans=0.125
2024-09-17 03:23:07,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93809.33333333333, ans=0.1
2024-09-17 03:23:21,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=93856.0, ans=0.125
2024-09-17 03:23:23,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0
2024-09-17 03:23:40,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.14 vs. limit=15.0
2024-09-17 03:23:44,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=93902.66666666667, ans=0.125
2024-09-17 03:23:55,211 INFO [train.py:1198] (1/2) Epoch 6, batch 750, loss[loss=0.3119, simple_loss=0.329, pruned_loss=0.1151, ctc_loss=0.2213, cr_loss=0.5055, over 34383.00 frames. ], tot_loss[loss=0.3015, simple_loss=0.3239, pruned_loss=0.1094, ctc_loss=0.2105, cr_loss=0.4587, over 6622751.29 frames. ], batch size: 95, lr: 2.01e-02, grad_scale: 16.0
2024-09-17 03:24:18,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=93996.0, ans=0.025
2024-09-17 03:24:34,354 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.175e+02 3.026e+02 4.026e+02 5.668e+02 1.151e+03, threshold=8.051e+02, percent-clipped=10.0
2024-09-17 03:24:48,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=94089.33333333333, ans=0.02
2024-09-17 03:24:53,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=94089.33333333333, ans=0.125
2024-09-17 03:25:09,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=94136.0, ans=0.2
2024-09-17 03:25:12,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=94136.0, ans=0.0
2024-09-17 03:25:17,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff2.min_abs, batch_count=94136.0, ans=0.1
2024-09-17 03:25:20,510 INFO [train.py:1198] (1/2) Epoch 6, batch 800, loss[loss=0.275, simple_loss=0.3016, pruned_loss=0.097, ctc_loss=0.1871, cr_loss=0.4258, over 34459.00 frames. ], tot_loss[loss=0.3016, simple_loss=0.324, pruned_loss=0.1094, ctc_loss=0.2105, cr_loss=0.4588, over 6658886.48 frames. ], batch size: 85, lr: 2.01e-02, grad_scale: 32.0
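
The cr_loss tracked in every loss line suggests a consistency-regularization term between the frame-level CTC posteriors of two differently-augmented views of the same utterance. The sketch below is a generic form of such a term (symmetric KL with detached targets); it is an assumed formulation for illustration, not necessarily the recipe's exact definition.

```python
# Generic consistency-regularization loss between two augmented views.
import torch
import torch.nn.functional as F

def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor):
    # logits_*: (T, vocab) frame-level CTC logits for the two views
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    # Symmetric KL; each side's target is detached so the two branches
    # regularize each other instead of collapsing jointly.
    kl_ab = F.kl_div(log_p_a, log_p_b.detach(), reduction="batchmean",
                     log_target=True)
    kl_ba = F.kl_div(log_p_b, log_p_a.detach(), reduction="batchmean",
                     log_target=True)
    return 0.5 * (kl_ab + kl_ba)

print(consistency_loss(torch.randn(100, 500), torch.randn(100, 500)))
```
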
2024-09-17 03:25:28,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=94182.66666666667, ans=0.125
2024-09-17 03:25:29,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=94182.66666666667, ans=0.07
2024-09-17 03:25:48,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=94229.33333333333, ans=0.125
2024-09-17 03:25:54,368 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=5.54 vs. limit=12.0
2024-09-17 03:26:08,597 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:26:42,291 INFO [train.py:1198] (1/2) Epoch 6, batch 850, loss[loss=0.3028, simple_loss=0.3279, pruned_loss=0.109, ctc_loss=0.2098, cr_loss=0.4389, over 34401.00 frames. ], tot_loss[loss=0.3013, simple_loss=0.3236, pruned_loss=0.1093, ctc_loss=0.2103, cr_loss=0.4586, over 6690626.77 frames. ], batch size: 103, lr: 2.01e-02, grad_scale: 32.0
2024-09-17 03:26:58,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=94462.66666666667, ans=10.0
2024-09-17 03:27:05,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=94462.66666666667, ans=0.2
2024-09-17 03:27:10,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94462.66666666667, ans=0.1
2024-09-17 03:27:17,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.66 vs. limit=15.0
2024-09-17 03:27:19,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.286e+02 2.902e+02 3.667e+02 5.549e+02 9.277e+02, threshold=7.334e+02, percent-clipped=4.0
2024-09-17 03:27:29,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=94556.0, ans=0.0
2024-09-17 03:27:31,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0
2024-09-17 03:27:38,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=94556.0, ans=0.5
2024-09-17 03:27:44,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=94556.0, ans=0.125
2024-09-17 03:27:51,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=94602.66666666667, ans=0.0
2024-09-17 03:27:56,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=94602.66666666667, ans=0.0
2024-09-17 03:28:05,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.81 vs. limit=15.0
2024-09-17 03:28:05,877 INFO [train.py:1198] (1/2) Epoch 6, batch 900, loss[loss=0.2609, simple_loss=0.2879, pruned_loss=0.09053, ctc_loss=0.1789, cr_loss=0.4253, over 34476.00 frames. ], tot_loss[loss=0.3021, simple_loss=0.3243, pruned_loss=0.1096, ctc_loss=0.211, cr_loss=0.4595, over 6696799.15 frames. ], batch size: 85, lr: 2.01e-02, grad_scale: 16.0
2024-09-17 03:28:13,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=94649.33333333333, ans=22.5
2024-09-17 03:28:24,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=94696.0, ans=0.0
2024-09-17 03:29:29,324 INFO [train.py:1198] (1/2) Epoch 6, batch 950, loss[loss=0.2758, simple_loss=0.3001, pruned_loss=0.09823, ctc_loss=0.1911, cr_loss=0.4215, over 34683.00 frames. ], tot_loss[loss=0.3023, simple_loss=0.3246, pruned_loss=0.1097, ctc_loss=0.2111, cr_loss=0.4598, over 6701512.27 frames. ], batch size: 87, lr: 2.00e-02, grad_scale: 16.0
2024-09-17 03:30:03,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=94976.0, ans=22.5
2024-09-17 03:30:08,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.191e+02 3.024e+02 3.540e+02 4.184e+02 7.081e+02, threshold=7.081e+02, percent-clipped=0.0
2024-09-17 03:30:17,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=95022.66666666667, ans=0.125
2024-09-17 03:30:26,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=95022.66666666667, ans=0.2
2024-09-17 03:30:46,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=95069.33333333333, ans=0.0
2024-09-17 03:30:46,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=95069.33333333333, ans=0.07
2024-09-17 03:30:50,813 INFO [train.py:1198] (1/2) Epoch 6, batch 1000, loss[loss=0.296, simple_loss=0.3146, pruned_loss=0.1088, ctc_loss=0.2075, cr_loss=0.4533, over 34494.00 frames. ], tot_loss[loss=0.3035, simple_loss=0.3255, pruned_loss=0.1103, ctc_loss=0.2121, cr_loss=0.4613, over 6694775.00 frames. ], batch size: 90, lr: 2.00e-02, grad_scale: 16.0
2024-09-17 03:31:31,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.06 vs. limit=22.5
2024-09-17 03:31:41,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=22.5
2024-09-17 03:31:51,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=95256.0, ans=0.0
2024-09-17 03:31:54,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.82 vs. limit=10.0
2024-09-17 03:32:14,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0
2024-09-17 03:32:16,488 INFO [train.py:1198] (1/2) Epoch 6, batch 1050, loss[loss=0.3027, simple_loss=0.3325, pruned_loss=0.1061, ctc_loss=0.2088, cr_loss=0.4709, over 34559.00 frames. ], tot_loss[loss=0.3022, simple_loss=0.3244, pruned_loss=0.1097, ctc_loss=0.2111, cr_loss=0.4603, over 6703514.44 frames. ], batch size: 99, lr: 2.00e-02, grad_scale: 16.0
2024-09-17 03:32:42,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=95396.0, ans=0.125
2024-09-17 03:32:55,837 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.290e+02 2.885e+02 3.508e+02 4.244e+02 1.057e+03, threshold=7.017e+02, percent-clipped=7.0
2024-09-17 03:33:01,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.73 vs. limit=15.0
2024-09-17 03:33:13,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.33 vs. limit=15.0
2024-09-17 03:33:33,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=95536.0, ans=10.0
2024-09-17 03:33:38,426 INFO [train.py:1198] (1/2) Epoch 6, batch 1100, loss[loss=0.2665, simple_loss=0.2988, pruned_loss=0.09087, ctc_loss=0.179, cr_loss=0.4191, over 34361.00 frames. ], tot_loss[loss=0.302, simple_loss=0.3242, pruned_loss=0.1096, ctc_loss=0.2109, cr_loss=0.4602, over 6716560.44 frames. ], batch size: 91, lr: 2.00e-02, grad_scale: 16.0
2024-09-17 03:33:53,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=95629.33333333333, ans=0.125
2024-09-17 03:33:57,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=12.0
2024-09-17 03:34:11,644 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:34:11,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=95676.0, ans=0.2
2024-09-17 03:34:18,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=95676.0, ans=0.05
2024-09-17 03:34:24,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=95676.0, ans=0.1
2024-09-17 03:34:28,675 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.87 vs. limit=10.0
2024-09-17 03:34:31,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=95722.66666666667, ans=0.025
2024-09-17 03:34:44,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=95769.33333333333, ans=0.125
2024-09-17 03:34:53,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.04 vs. limit=22.5
2024-09-17 03:35:00,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0
2024-09-17 03:35:00,585 INFO [train.py:1198] (1/2) Epoch 6, batch 1150, loss[loss=0.2975, simple_loss=0.317, pruned_loss=0.1094, ctc_loss=0.2061, cr_loss=0.447, over 34364.00 frames. ], tot_loss[loss=0.3019, simple_loss=0.324, pruned_loss=0.1096, ctc_loss=0.2109, cr_loss=0.4602, over 6714851.45 frames. ], batch size: 91, lr: 2.00e-02, grad_scale: 16.0
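
The logged lr decays both within an epoch (2.21e-02 down to 1.96e-02 across this stretch) and at epoch boundaries (2.19e-02 to 2.05e-02 at the epoch 5 to 6 switch). A schedule with exactly that shape, in the style of icefall's Eden scheduler, is sketched below; the constants and the example step counts are assumptions, not values read from this run's optimizer state.

```python
# Eden-style learning-rate schedule: a slow power-law decay in the batch
# index combined with a power-law decay in the epoch index.
def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With assumed base_lr=0.045 and ~20k optimizer steps after 5 epochs,
# this lands near 2.0e-02, the same order as the values logged above.
print(eden_lr(0.045, batch=20000, epoch=5))
```
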
2024-09-17 03:35:10,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=95816.0, ans=0.07
2024-09-17 03:35:40,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=95909.33333333333, ans=0.1
2024-09-17 03:35:41,750 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.334e+02 3.150e+02 4.189e+02 5.469e+02 1.072e+03, threshold=8.377e+02, percent-clipped=11.0
2024-09-17 03:35:42,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=95909.33333333333, ans=0.2
2024-09-17 03:36:25,917 INFO [train.py:1198] (1/2) Epoch 6, batch 1200, loss[loss=0.3131, simple_loss=0.3374, pruned_loss=0.1137, ctc_loss=0.2111, cr_loss=0.4768, over 34601.00 frames. ], tot_loss[loss=0.3026, simple_loss=0.3247, pruned_loss=0.1099, ctc_loss=0.2112, cr_loss=0.4603, over 6707916.31 frames. ], batch size: 99, lr: 1.99e-02, grad_scale: 32.0
2024-09-17 03:36:31,675 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.94 vs. limit=22.5
2024-09-17 03:36:47,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=96096.0, ans=0.125
2024-09-17 03:37:01,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=96142.66666666667, ans=0.125
2024-09-17 03:37:14,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=96189.33333333333, ans=0.2
2024-09-17 03:37:17,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=96189.33333333333, ans=0.125
2024-09-17 03:37:25,675 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:37:48,216 INFO [train.py:1198] (1/2) Epoch 6, batch 1250, loss[loss=0.3039, simple_loss=0.3305, pruned_loss=0.1084, ctc_loss=0.2063, cr_loss=0.483, over 34391.00 frames. ], tot_loss[loss=0.3026, simple_loss=0.325, pruned_loss=0.1097, ctc_loss=0.2111, cr_loss=0.4614, over 6741230.15 frames. ], batch size: 107, lr: 1.99e-02, grad_scale: 32.0
2024-09-17 03:38:19,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=96376.0, ans=0.0
2024-09-17 03:38:24,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=96376.0, ans=0.125
2024-09-17 03:38:26,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=96376.0, ans=0.0
2024-09-17 03:38:27,554 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.219e+02 2.641e+02 3.000e+02 3.606e+02 7.454e+02, threshold=5.999e+02, percent-clipped=0.0
2024-09-17 03:38:30,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.20 vs. limit=15.0
2024-09-17 03:38:38,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0
2024-09-17 03:39:07,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.35 vs. limit=15.0
2024-09-17 03:39:11,520 INFO [train.py:1198] (1/2) Epoch 6, batch 1300, loss[loss=0.3029, simple_loss=0.3303, pruned_loss=0.108, ctc_loss=0.2071, cr_loss=0.4509, over 33001.00 frames. ], tot_loss[loss=0.3018, simple_loss=0.3244, pruned_loss=0.1093, ctc_loss=0.2104, cr_loss=0.4607, over 6745449.60 frames. ], batch size: 130, lr: 1.99e-02, grad_scale: 32.0
2024-09-17 03:39:24,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=96516.0, ans=0.0
2024-09-17 03:39:26,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=96562.66666666667, ans=0.125
2024-09-17 03:39:30,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0
2024-09-17 03:39:37,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=96562.66666666667, ans=0.95
2024-09-17 03:39:50,172 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:40:01,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=96656.0, ans=0.0
2024-09-17 03:40:03,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=96656.0, ans=0.1
2024-09-17 03:40:35,770 INFO [train.py:1198] (1/2) Epoch 6, batch 1350, loss[loss=0.2901, simple_loss=0.3182, pruned_loss=0.1018, ctc_loss=0.2007, cr_loss=0.4577, over 34556.00 frames. ], tot_loss[loss=0.3004, simple_loss=0.3233, pruned_loss=0.1086, ctc_loss=0.2092, cr_loss=0.4598, over 6765180.87 frames. ], batch size: 94, lr: 1.99e-02, grad_scale: 32.0
2024-09-17 03:40:39,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.39 vs. limit=15.0
2024-09-17 03:40:52,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=96796.0, ans=0.1
2024-09-17 03:41:03,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=96796.0, ans=0.1
2024-09-17 03:41:13,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=96842.66666666667, ans=0.125
2024-09-17 03:41:14,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 3.352e+02 4.074e+02 5.454e+02 9.912e+02, threshold=8.149e+02, percent-clipped=22.0
2024-09-17 03:41:18,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=96842.66666666667, ans=0.125
2024-09-17 03:41:23,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=96889.33333333333, ans=0.07
2024-09-17 03:41:26,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=96889.33333333333, ans=0.0
2024-09-17 03:41:33,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=96889.33333333333, ans=0.07
2024-09-17 03:41:33,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=96889.33333333333, ans=0.125
2024-09-17 03:41:37,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=96889.33333333333, ans=0.125
2024-09-17 03:41:47,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=96936.0, ans=0.125
2024-09-17 03:41:57,114 INFO [train.py:1198] (1/2) Epoch 6, batch 1400, loss[loss=0.2561, simple_loss=0.2836, pruned_loss=0.08852, ctc_loss=0.1724, cr_loss=0.4282, over 34290.00 frames. ], tot_loss[loss=0.2997, simple_loss=0.3227, pruned_loss=0.1083, ctc_loss=0.2086, cr_loss=0.4591, over 6777154.68 frames. ], batch size: 80, lr: 1.99e-02, grad_scale: 32.0
2024-09-17 03:42:08,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=96982.66666666667, ans=0.0
2024-09-17 03:42:12,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=97029.33333333333, ans=0.125
2024-09-17 03:42:14,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.89 vs. limit=15.0
2024-09-17 03:42:15,367 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:42:57,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=97122.66666666667, ans=0.125
2024-09-17 03:43:08,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=97169.33333333333, ans=0.125
2024-09-17 03:43:10,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=12.0
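
The per-batch "batch size" in these loss lines swings widely (80 for long cuts up to 244 for very short ones) while the frame count per batch stays near a budget, i.e. batches are formed by total duration rather than by count. The run itself uses lhotse's DynamicBucketingSampler; the toy batcher below only illustrates the duration-budget idea, and the budget value is an assumption.

```python
# Toy duration-budgeted batching: shorter cuts => more cuts per batch,
# which is why logged batch sizes vary while total frames stay similar.
def duration_batches(cuts, max_duration: float = 600.0):
    # cuts: iterable of (cut_id, duration_seconds), assumed pre-sorted/bucketed
    batch, budget = [], 0.0
    for cut_id, dur in cuts:
        if batch and budget + dur > max_duration:
            yield batch
            batch, budget = [], 0.0
        batch.append(cut_id)
        budget += dur
    if batch:
        yield batch  # flush the final partial batch

short = [(f"s{i}", 3.0) for i in range(400)]
long_ = [(f"l{i}", 15.0) for i in range(400)]
print(len(next(duration_batches(short))))  # ~200 cuts per batch
print(len(next(duration_batches(long_))))  # ~40 cuts per batch
```
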
2024-09-17 03:43:21,204 INFO [train.py:1198] (1/2) Epoch 6, batch 1450, loss[loss=0.3375, simple_loss=0.3589, pruned_loss=0.1242, ctc_loss=0.2347, cr_loss=0.5199, over 34445.00 frames. ], tot_loss[loss=0.3004, simple_loss=0.3236, pruned_loss=0.1085, ctc_loss=0.2093, cr_loss=0.4602, over 6774068.95 frames. ], batch size: 110, lr: 1.98e-02, grad_scale: 32.0
2024-09-17 03:43:29,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=97216.0, ans=0.2
2024-09-17 03:43:38,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=97262.66666666667, ans=0.09899494936611666
2024-09-17 03:43:38,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=97262.66666666667, ans=0.1
2024-09-17 03:43:56,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0
2024-09-17 03:44:02,318 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.350e+02 3.052e+02 3.892e+02 4.589e+02 7.707e+02, threshold=7.784e+02, percent-clipped=0.0
2024-09-17 03:44:09,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=97309.33333333333, ans=0.0
2024-09-17 03:44:44,469 INFO [train.py:1198] (1/2) Epoch 6, batch 1500, loss[loss=0.3078, simple_loss=0.3324, pruned_loss=0.1104, ctc_loss=0.2139, cr_loss=0.49, over 34411.00 frames. ], tot_loss[loss=0.3014, simple_loss=0.3245, pruned_loss=0.109, ctc_loss=0.2101, cr_loss=0.4607, over 6773825.07 frames. ], batch size: 100, lr: 1.98e-02, grad_scale: 32.0
2024-09-17 03:44:57,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.74 vs. limit=15.0
2024-09-17 03:44:59,711 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:44:59,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=97496.0, ans=0.125
2024-09-17 03:45:07,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0
2024-09-17 03:45:21,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=12.0
2024-09-17 03:45:22,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=97542.66666666667, ans=0.125
2024-09-17 03:45:34,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=12.11 vs. limit=15.0
2024-09-17 03:45:40,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=97589.33333333333, ans=0.125
2024-09-17 03:45:42,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=97589.33333333333, ans=0.125
2024-09-17 03:46:03,656 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:46:06,486 INFO [train.py:1198] (1/2) Epoch 6, batch 1550, loss[loss=0.3211, simple_loss=0.3449, pruned_loss=0.1163, ctc_loss=0.2249, cr_loss=0.4945, over 34440.00 frames. ], tot_loss[loss=0.3019, simple_loss=0.3245, pruned_loss=0.1094, ctc_loss=0.2107, cr_loss=0.4613, over 6745268.02 frames. ], batch size: 105, lr: 1.98e-02, grad_scale: 32.0
2024-09-17 03:46:49,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.289e+02 2.816e+02 3.589e+02 4.747e+02 8.492e+02, threshold=7.178e+02, percent-clipped=1.0
2024-09-17 03:47:02,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=97822.66666666667, ans=0.125
2024-09-17 03:47:02,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=97822.66666666667, ans=0.1
2024-09-17 03:47:31,924 INFO [train.py:1198] (1/2) Epoch 6, batch 1600, loss[loss=0.2997, simple_loss=0.3247, pruned_loss=0.1076, ctc_loss=0.2103, cr_loss=0.4401, over 34553.00 frames. ], tot_loss[loss=0.3022, simple_loss=0.3244, pruned_loss=0.1097, ctc_loss=0.2111, cr_loss=0.4616, over 6725055.94 frames. ], batch size: 99, lr: 1.98e-02, grad_scale: 32.0
2024-09-17 03:47:48,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=97962.66666666667, ans=0.0
2024-09-17 03:47:50,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=97962.66666666667, ans=0.125
2024-09-17 03:48:10,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=98009.33333333333, ans=0.125
2024-09-17 03:48:10,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=98009.33333333333, ans=0.2
2024-09-17 03:48:15,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=98009.33333333333, ans=0.0
2024-09-17 03:48:44,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=98102.66666666667, ans=0.0
2024-09-17 03:48:49,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=98102.66666666667, ans=0.0
2024-09-17 03:48:54,074 INFO [train.py:1198] (1/2) Epoch 6, batch 1650, loss[loss=0.3081, simple_loss=0.3363, pruned_loss=0.1089, ctc_loss=0.2158, cr_loss=0.4693, over 34375.00 frames. ], tot_loss[loss=0.302, simple_loss=0.3244, pruned_loss=0.1095, ctc_loss=0.2109, cr_loss=0.4606, over 6717582.13 frames. ], batch size: 103, lr: 1.97e-02, grad_scale: 32.0
2024-09-17 03:48:58,050 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 03:49:00,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=98149.33333333333, ans=15.0
2024-09-17 03:49:07,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=98149.33333333333, ans=0.125
2024-09-17 03:49:22,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=98196.0, ans=0.125
2024-09-17 03:49:34,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.301e+02 2.926e+02 3.426e+02 4.209e+02 7.137e+02, threshold=6.853e+02, percent-clipped=0.0
2024-09-17 03:49:51,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=98289.33333333333, ans=0.125
2024-09-17 03:49:59,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=98336.0, ans=0.025
2024-09-17 03:50:17,154 INFO [train.py:1198] (1/2) Epoch 6, batch 1700, loss[loss=0.2481, simple_loss=0.281, pruned_loss=0.08293, ctc_loss=0.1664, cr_loss=0.4007, over 34282.00 frames. ], tot_loss[loss=0.3004, simple_loss=0.3234, pruned_loss=0.1086, ctc_loss=0.2093, cr_loss=0.4596, over 6743809.90 frames. ], batch size: 80, lr: 1.97e-02, grad_scale: 32.0
2024-09-17 03:50:19,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0
2024-09-17 03:50:23,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=98382.66666666667, ans=0.0
2024-09-17 03:50:36,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.40 vs. limit=10.0
2024-09-17 03:50:47,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.85 vs. limit=15.0
2024-09-17 03:50:50,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=98476.0, ans=0.125
2024-09-17 03:51:40,511 INFO [train.py:1198] (1/2) Epoch 6, batch 1750, loss[loss=0.2589, simple_loss=0.2836, pruned_loss=0.09059, ctc_loss=0.1779, cr_loss=0.4375, over 34141.00 frames. ], tot_loss[loss=0.3, simple_loss=0.323, pruned_loss=0.1084, ctc_loss=0.2089, cr_loss=0.4592, over 6753219.95 frames. ], batch size: 78, lr: 1.97e-02, grad_scale: 32.0
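
The tot_loss fields above are reported "over N frames" with N growing batch by batch, i.e. they are frame-weighted running averages of the per-batch metrics. A minimal sketch of that bookkeeping (assumed; the class name and exact decay behavior are illustrative, not the recipe's tracker):

```python
# Frame-weighted running averages, as implied by the growing
# "tot_loss[... over N frames]" counters in the log lines above.
class RunningMetrics:
    def __init__(self):
        self.frames = 0.0
        self.sums = {}

    def update(self, frames: float, **losses):
        # Weight each batch's metric by its frame count before accumulating.
        self.frames += frames
        for k, v in losses.items():
            self.sums[k] = self.sums.get(k, 0.0) + v * frames

    def averages(self):
        return {k: s / self.frames for k, s in self.sums.items()}

m = RunningMetrics()
m.update(34141.0, loss=0.2589, ctc_loss=0.1779)  # batch 1750 above
m.update(34705.0, loss=0.3112, ctc_loss=0.2102)  # batch 1800 below
print(m.averages(), f"over {m.frames} frames")
```
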
], batch size: 78, lr: 1.97e-02, grad_scale: 32.0 2024-09-17 03:52:08,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=98662.66666666667, ans=0.0 2024-09-17 03:52:21,404 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.358e+02 2.734e+02 3.082e+02 3.988e+02 6.902e+02, threshold=6.164e+02, percent-clipped=1.0 2024-09-17 03:52:28,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=98756.0, ans=10.0 2024-09-17 03:52:54,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2024-09-17 03:52:54,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.93 vs. limit=15.0 2024-09-17 03:53:01,617 INFO [train.py:1198] (1/2) Epoch 6, batch 1800, loss[loss=0.3112, simple_loss=0.3359, pruned_loss=0.1129, ctc_loss=0.2102, cr_loss=0.4675, over 34705.00 frames. ], tot_loss[loss=0.2999, simple_loss=0.323, pruned_loss=0.1083, ctc_loss=0.2086, cr_loss=0.4592, over 6755871.12 frames. ], batch size: 97, lr: 1.97e-02, grad_scale: 32.0 2024-09-17 03:53:05,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-09-17 03:53:53,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2024-09-17 03:54:06,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=98989.33333333333, ans=0.02 2024-09-17 03:54:15,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=12.0 2024-09-17 03:54:20,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=99036.0, ans=0.125 2024-09-17 03:54:25,497 INFO [train.py:1198] (1/2) Epoch 6, batch 1850, loss[loss=0.2937, simple_loss=0.3244, pruned_loss=0.1029, ctc_loss=0.2013, cr_loss=0.4237, over 34439.00 frames. ], tot_loss[loss=0.2996, simple_loss=0.3229, pruned_loss=0.1081, ctc_loss=0.2083, cr_loss=0.4592, over 6764253.41 frames. ], batch size: 100, lr: 1.97e-02, grad_scale: 32.0 2024-09-17 03:54:42,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=99129.33333333333, ans=0.125 2024-09-17 03:54:42,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=99129.33333333333, ans=0.2 2024-09-17 03:54:42,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. 
limit=22.5 2024-09-17 03:54:45,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=99129.33333333333, ans=0.035 2024-09-17 03:54:45,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=99129.33333333333, ans=0.125 2024-09-17 03:54:57,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=99129.33333333333, ans=0.125 2024-09-17 03:55:08,561 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.164e+02 3.060e+02 3.603e+02 4.419e+02 8.329e+02, threshold=7.205e+02, percent-clipped=10.0 2024-09-17 03:55:08,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99176.0, ans=0.1 2024-09-17 03:55:28,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=99222.66666666667, ans=0.07 2024-09-17 03:55:36,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99269.33333333333, ans=0.1 2024-09-17 03:55:44,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=99269.33333333333, ans=0.025 2024-09-17 03:55:49,020 INFO [train.py:1198] (1/2) Epoch 6, batch 1900, loss[loss=0.2975, simple_loss=0.3309, pruned_loss=0.103, ctc_loss=0.2052, cr_loss=0.4309, over 34388.00 frames. ], tot_loss[loss=0.2999, simple_loss=0.3233, pruned_loss=0.1082, ctc_loss=0.2083, cr_loss=0.4593, over 6773472.88 frames. ], batch size: 103, lr: 1.96e-02, grad_scale: 32.0 2024-09-17 03:55:55,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=99316.0, ans=0.125 2024-09-17 03:56:36,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=99456.0, ans=0.07 2024-09-17 03:56:38,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.68 vs. limit=15.0 2024-09-17 03:56:45,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=99456.0, ans=0.0 2024-09-17 03:56:53,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=99502.66666666667, ans=0.0 2024-09-17 03:57:12,412 INFO [train.py:1198] (1/2) Epoch 6, batch 1950, loss[loss=0.3041, simple_loss=0.3236, pruned_loss=0.1112, ctc_loss=0.2162, cr_loss=0.4767, over 34379.00 frames. ], tot_loss[loss=0.301, simple_loss=0.3245, pruned_loss=0.1086, ctc_loss=0.2091, cr_loss=0.461, over 6790662.74 frames. ], batch size: 91, lr: 1.96e-02, grad_scale: 32.0 2024-09-17 03:57:12,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99549.33333333333, ans=0.1 2024-09-17 03:57:12,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=99549.33333333333, ans=0.1 2024-09-17 03:57:23,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.33 vs. 
limit=15.0 2024-09-17 03:57:27,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=99596.0, ans=0.2 2024-09-17 03:57:39,097 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 03:57:47,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=99642.66666666667, ans=15.0 2024-09-17 03:57:53,464 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.307e+02 2.952e+02 3.772e+02 4.715e+02 7.751e+02, threshold=7.544e+02, percent-clipped=1.0 2024-09-17 03:58:16,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=99736.0, ans=0.1 2024-09-17 03:58:36,336 INFO [train.py:1198] (1/2) Epoch 6, batch 2000, loss[loss=0.273, simple_loss=0.2941, pruned_loss=0.09829, ctc_loss=0.1894, cr_loss=0.4379, over 34144.00 frames. ], tot_loss[loss=0.3022, simple_loss=0.3253, pruned_loss=0.1093, ctc_loss=0.2104, cr_loss=0.4616, over 6765063.62 frames. ], batch size: 78, lr: 1.96e-02, grad_scale: 32.0 2024-09-17 03:58:40,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=99782.66666666667, ans=0.125 2024-09-17 03:58:43,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=99782.66666666667, ans=0.2 2024-09-17 03:59:37,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=99922.66666666667, ans=0.0 2024-09-17 03:59:37,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=12.0 2024-09-17 03:59:42,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=99969.33333333333, ans=0.07 2024-09-17 03:59:57,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=100016.0, ans=0.125 2024-09-17 03:59:58,581 INFO [train.py:1198] (1/2) Epoch 6, batch 2050, loss[loss=0.2576, simple_loss=0.2887, pruned_loss=0.08795, ctc_loss=0.1717, cr_loss=0.4089, over 34484.00 frames. ], tot_loss[loss=0.301, simple_loss=0.324, pruned_loss=0.1088, ctc_loss=0.2093, cr_loss=0.4591, over 6755824.17 frames. ], batch size: 82, lr: 1.96e-02, grad_scale: 32.0 2024-09-17 03:59:59,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.82 vs. limit=12.0 2024-09-17 04:00:03,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=100016.0, ans=0.0 2024-09-17 04:00:07,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=100016.0, ans=0.0 2024-09-17 04:00:07,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.97 vs. 
limit=15.0 2024-09-17 04:00:21,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=100062.66666666667, ans=0.125 2024-09-17 04:00:26,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=100062.66666666667, ans=0.1 2024-09-17 04:00:27,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=15.0 2024-09-17 04:00:39,614 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.226e+02 2.774e+02 3.326e+02 4.533e+02 7.823e+02, threshold=6.652e+02, percent-clipped=1.0 2024-09-17 04:00:43,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=100109.33333333333, ans=0.09899494936611666 2024-09-17 04:00:57,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=100156.0, ans=0.1 2024-09-17 04:00:59,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=100156.0, ans=0.0 2024-09-17 04:01:02,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=100156.0, ans=0.09899494936611666 2024-09-17 04:01:11,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=100202.66666666667, ans=0.125 2024-09-17 04:01:12,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=100202.66666666667, ans=0.2 2024-09-17 04:01:21,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=100249.33333333333, ans=0.0 2024-09-17 04:01:22,242 INFO [train.py:1198] (1/2) Epoch 6, batch 2100, loss[loss=0.2986, simple_loss=0.3239, pruned_loss=0.1072, ctc_loss=0.2026, cr_loss=0.4593, over 34538.00 frames. ], tot_loss[loss=0.3001, simple_loss=0.3232, pruned_loss=0.1084, ctc_loss=0.2086, cr_loss=0.4589, over 6770939.25 frames. ], batch size: 94, lr: 1.96e-02, grad_scale: 32.0 2024-09-17 04:01:30,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=100249.33333333333, ans=0.125 2024-09-17 04:01:53,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=100342.66666666667, ans=0.125 2024-09-17 04:02:14,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=100389.33333333333, ans=0.015 2024-09-17 04:02:42,728 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:02:45,672 INFO [train.py:1198] (1/2) Epoch 6, batch 2150, loss[loss=0.2827, simple_loss=0.308, pruned_loss=0.1001, ctc_loss=0.196, cr_loss=0.4504, over 34347.00 frames. ], tot_loss[loss=0.298, simple_loss=0.3217, pruned_loss=0.1073, ctc_loss=0.2068, cr_loss=0.4574, over 6789479.40 frames. ], batch size: 91, lr: 1.95e-02, grad_scale: 32.0 2024-09-17 04:02:55,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.52 vs. 
limit=15.0 2024-09-17 04:03:28,628 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.287e+02 2.743e+02 3.612e+02 4.900e+02 7.698e+02, threshold=7.224e+02, percent-clipped=4.0 2024-09-17 04:03:46,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=100622.66666666667, ans=0.125 2024-09-17 04:03:55,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=100669.33333333333, ans=0.0 2024-09-17 04:04:00,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. limit=15.0 2024-09-17 04:04:04,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=100669.33333333333, ans=10.0 2024-09-17 04:04:07,789 INFO [train.py:1198] (1/2) Epoch 6, batch 2200, loss[loss=0.2957, simple_loss=0.3218, pruned_loss=0.1061, ctc_loss=0.1973, cr_loss=0.4483, over 34459.00 frames. ], tot_loss[loss=0.2987, simple_loss=0.3222, pruned_loss=0.1077, ctc_loss=0.2073, cr_loss=0.4577, over 6784052.66 frames. ], batch size: 100, lr: 1.95e-02, grad_scale: 16.0 2024-09-17 04:04:24,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2024-09-17 04:04:49,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=100809.33333333333, ans=0.95 2024-09-17 04:04:51,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=100809.33333333333, ans=0.0 2024-09-17 04:04:52,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=100809.33333333333, ans=0.125 2024-09-17 04:05:00,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=100856.0, ans=0.125 2024-09-17 04:05:04,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=100856.0, ans=0.125 2024-09-17 04:05:22,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=100902.66666666667, ans=0.0 2024-09-17 04:05:26,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=100902.66666666667, ans=10.0 2024-09-17 04:05:30,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=100949.33333333333, ans=0.125 2024-09-17 04:05:31,343 INFO [train.py:1198] (1/2) Epoch 6, batch 2250, loss[loss=0.3115, simple_loss=0.3319, pruned_loss=0.1135, ctc_loss=0.2227, cr_loss=0.4886, over 34404.00 frames. ], tot_loss[loss=0.299, simple_loss=0.3224, pruned_loss=0.1079, ctc_loss=0.2078, cr_loss=0.4576, over 6781279.93 frames. 
], batch size: 95, lr: 1.95e-02, grad_scale: 16.0 2024-09-17 04:05:38,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=100949.33333333333, ans=0.125 2024-09-17 04:05:52,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=100996.0, ans=0.0 2024-09-17 04:06:07,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=101042.66666666667, ans=0.125 2024-09-17 04:06:13,374 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.129e+02 3.030e+02 3.705e+02 4.870e+02 9.741e+02, threshold=7.409e+02, percent-clipped=7.0 2024-09-17 04:06:30,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=101089.33333333333, ans=0.125 2024-09-17 04:06:30,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101089.33333333333, ans=0.1 2024-09-17 04:06:51,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101136.0, ans=0.1 2024-09-17 04:06:54,678 INFO [train.py:1198] (1/2) Epoch 6, batch 2300, loss[loss=0.263, simple_loss=0.2915, pruned_loss=0.09107, ctc_loss=0.176, cr_loss=0.4308, over 34296.00 frames. ], tot_loss[loss=0.2975, simple_loss=0.3209, pruned_loss=0.1072, ctc_loss=0.2067, cr_loss=0.4555, over 6766740.71 frames. ], batch size: 83, lr: 1.95e-02, grad_scale: 16.0 2024-09-17 04:06:59,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=101182.66666666667, ans=0.125 2024-09-17 04:07:16,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=101229.33333333333, ans=0.025 2024-09-17 04:07:22,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=101229.33333333333, ans=0.0 2024-09-17 04:07:29,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=101276.0, ans=0.2 2024-09-17 04:07:34,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=101276.0, ans=0.125 2024-09-17 04:07:41,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2024-09-17 04:07:45,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=101322.66666666667, ans=0.0 2024-09-17 04:07:51,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=22.5 2024-09-17 04:08:08,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=101369.33333333333, ans=0.1 2024-09-17 04:08:16,389 INFO [train.py:1198] (1/2) Epoch 6, batch 2350, loss[loss=0.3084, simple_loss=0.3368, pruned_loss=0.1091, ctc_loss=0.2159, cr_loss=0.4654, over 34693.00 frames. ], tot_loss[loss=0.2973, simple_loss=0.3209, pruned_loss=0.1071, ctc_loss=0.2065, cr_loss=0.4563, over 6773148.87 frames. 
], batch size: 97, lr: 1.95e-02, grad_scale: 16.0 2024-09-17 04:08:17,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.80 vs. limit=22.5 2024-09-17 04:08:44,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=101462.66666666667, ans=0.125 2024-09-17 04:09:00,206 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.217e+02 3.134e+02 4.013e+02 5.188e+02 1.101e+03, threshold=8.026e+02, percent-clipped=9.0 2024-09-17 04:09:29,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.14 vs. limit=6.0 2024-09-17 04:09:39,835 INFO [train.py:1198] (1/2) Epoch 6, batch 2400, loss[loss=0.2752, simple_loss=0.3032, pruned_loss=0.09655, ctc_loss=0.1874, cr_loss=0.4152, over 34590.00 frames. ], tot_loss[loss=0.2982, simple_loss=0.3216, pruned_loss=0.1075, ctc_loss=0.2071, cr_loss=0.4576, over 6777688.16 frames. ], batch size: 89, lr: 1.94e-02, grad_scale: 32.0 2024-09-17 04:09:55,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=101696.0, ans=0.125 2024-09-17 04:09:56,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=101696.0, ans=0.2 2024-09-17 04:10:15,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=101742.66666666667, ans=0.0 2024-09-17 04:10:32,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.10 vs. limit=10.0 2024-09-17 04:10:57,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=101836.0, ans=6.0 2024-09-17 04:11:04,690 INFO [train.py:1198] (1/2) Epoch 6, batch 2450, loss[loss=0.3012, simple_loss=0.3264, pruned_loss=0.1084, ctc_loss=0.2028, cr_loss=0.468, over 34406.00 frames. ], tot_loss[loss=0.2992, simple_loss=0.3225, pruned_loss=0.108, ctc_loss=0.2079, cr_loss=0.458, over 6750893.28 frames. ], batch size: 95, lr: 1.94e-02, grad_scale: 32.0 2024-09-17 04:11:13,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=101882.66666666667, ans=0.0 2024-09-17 04:11:22,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.70 vs. 
limit=10.0 2024-09-17 04:11:22,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=101929.33333333333, ans=0.025 2024-09-17 04:11:41,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=101976.0, ans=0.0 2024-09-17 04:11:47,202 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.319e+02 2.825e+02 3.402e+02 4.343e+02 6.580e+02, threshold=6.805e+02, percent-clipped=0.0 2024-09-17 04:11:58,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=102022.66666666667, ans=0.07 2024-09-17 04:12:07,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=102022.66666666667, ans=0.125 2024-09-17 04:12:23,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=102069.33333333333, ans=0.125 2024-09-17 04:12:27,958 INFO [train.py:1198] (1/2) Epoch 6, batch 2500, loss[loss=0.3144, simple_loss=0.333, pruned_loss=0.1156, ctc_loss=0.2253, cr_loss=0.4861, over 34467.00 frames. ], tot_loss[loss=0.2991, simple_loss=0.3224, pruned_loss=0.108, ctc_loss=0.2079, cr_loss=0.4586, over 6762073.85 frames. ], batch size: 100, lr: 1.94e-02, grad_scale: 32.0 2024-09-17 04:12:49,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=102162.66666666667, ans=0.125 2024-09-17 04:12:51,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=102162.66666666667, ans=0.125 2024-09-17 04:12:52,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=102162.66666666667, ans=0.1 2024-09-17 04:12:54,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=102162.66666666667, ans=0.125 2024-09-17 04:13:52,244 INFO [train.py:1198] (1/2) Epoch 6, batch 2550, loss[loss=0.2491, simple_loss=0.2817, pruned_loss=0.08378, ctc_loss=0.1679, cr_loss=0.3815, over 34186.00 frames. ], tot_loss[loss=0.2988, simple_loss=0.3224, pruned_loss=0.1077, ctc_loss=0.2075, cr_loss=0.4579, over 6765842.50 frames. ], batch size: 78, lr: 1.94e-02, grad_scale: 32.0 2024-09-17 04:14:16,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=102396.0, ans=0.07 2024-09-17 04:14:34,545 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 2.786e+02 3.407e+02 4.560e+02 8.700e+02, threshold=6.815e+02, percent-clipped=6.0 2024-09-17 04:14:34,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=102442.66666666667, ans=0.035 2024-09-17 04:14:46,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0 2024-09-17 04:15:09,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.92 vs. limit=15.0 2024-09-17 04:15:13,888 INFO [train.py:1198] (1/2) Epoch 6, batch 2600, loss[loss=0.2965, simple_loss=0.3213, pruned_loss=0.1061, ctc_loss=0.2064, cr_loss=0.4565, over 34360.00 frames. 
], tot_loss[loss=0.2997, simple_loss=0.3231, pruned_loss=0.1081, ctc_loss=0.2083, cr_loss=0.4589, over 6760919.63 frames. ], batch size: 91, lr: 1.94e-02, grad_scale: 32.0 2024-09-17 04:15:22,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=102582.66666666667, ans=0.125 2024-09-17 04:15:30,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=102629.33333333333, ans=0.125 2024-09-17 04:15:32,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=102629.33333333333, ans=0.2 2024-09-17 04:15:32,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102629.33333333333, ans=0.1 2024-09-17 04:15:40,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=102629.33333333333, ans=0.2 2024-09-17 04:15:43,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=102629.33333333333, ans=0.2 2024-09-17 04:15:50,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=102676.0, ans=0.0 2024-09-17 04:16:12,147 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2024-09-17 04:16:36,796 INFO [train.py:1198] (1/2) Epoch 6, batch 2650, loss[loss=0.3291, simple_loss=0.3502, pruned_loss=0.1217, ctc_loss=0.2284, cr_loss=0.4705, over 34229.00 frames. ], tot_loss[loss=0.3004, simple_loss=0.3238, pruned_loss=0.1084, ctc_loss=0.2089, cr_loss=0.4598, over 6769757.67 frames. ], batch size: 117, lr: 1.93e-02, grad_scale: 32.0 2024-09-17 04:16:45,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102816.0, ans=0.1 2024-09-17 04:17:17,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=102909.33333333333, ans=0.125 2024-09-17 04:17:18,957 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.261e+02 2.805e+02 3.477e+02 4.668e+02 9.341e+02, threshold=6.955e+02, percent-clipped=4.0 2024-09-17 04:17:54,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=103002.66666666667, ans=0.0 2024-09-17 04:17:54,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=103002.66666666667, ans=0.0 2024-09-17 04:18:00,327 INFO [train.py:1198] (1/2) Epoch 6, batch 2700, loss[loss=0.3061, simple_loss=0.3314, pruned_loss=0.1091, ctc_loss=0.2161, cr_loss=0.4863, over 34624.00 frames. ], tot_loss[loss=0.3, simple_loss=0.3235, pruned_loss=0.1082, ctc_loss=0.2085, cr_loss=0.4589, over 6764475.49 frames. ], batch size: 102, lr: 1.93e-02, grad_scale: 32.0 2024-09-17 04:18:04,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.18 vs. 
limit=6.0 2024-09-17 04:18:09,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=103049.33333333333, ans=0.025 2024-09-17 04:18:15,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=103096.0, ans=0.125 2024-09-17 04:18:45,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=103142.66666666667, ans=0.0 2024-09-17 04:19:08,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2024-09-17 04:19:24,104 INFO [train.py:1198] (1/2) Epoch 6, batch 2750, loss[loss=0.3022, simple_loss=0.3191, pruned_loss=0.1125, ctc_loss=0.2109, cr_loss=0.4525, over 34633.00 frames. ], tot_loss[loss=0.2986, simple_loss=0.3221, pruned_loss=0.1077, ctc_loss=0.2075, cr_loss=0.4577, over 6762233.35 frames. ], batch size: 88, lr: 1.93e-02, grad_scale: 32.0 2024-09-17 04:20:06,515 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.849e+02 3.660e+02 4.550e+02 6.983e+02, threshold=7.319e+02, percent-clipped=1.0 2024-09-17 04:20:21,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=103422.66666666667, ans=0.0 2024-09-17 04:20:30,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.78 vs. limit=15.0 2024-09-17 04:20:43,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=103469.33333333333, ans=0.2 2024-09-17 04:20:46,167 INFO [train.py:1198] (1/2) Epoch 6, batch 2800, loss[loss=0.3674, simple_loss=0.3649, pruned_loss=0.1474, ctc_loss=0.2822, cr_loss=0.4665, over 24128.00 frames. ], tot_loss[loss=0.2987, simple_loss=0.3222, pruned_loss=0.1077, ctc_loss=0.2076, cr_loss=0.4577, over 6739629.69 frames. ], batch size: 246, lr: 1.93e-02, grad_scale: 32.0 2024-09-17 04:20:49,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=103516.0, ans=0.0 2024-09-17 04:21:20,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.16 vs. limit=15.0 2024-09-17 04:21:33,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=103609.33333333333, ans=0.1 2024-09-17 04:21:57,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=103702.66666666667, ans=0.2 2024-09-17 04:22:10,206 INFO [train.py:1198] (1/2) Epoch 6, batch 2850, loss[loss=0.2802, simple_loss=0.3045, pruned_loss=0.1005, ctc_loss=0.1898, cr_loss=0.4263, over 34459.00 frames. ], tot_loss[loss=0.2995, simple_loss=0.3227, pruned_loss=0.1081, ctc_loss=0.2083, cr_loss=0.4586, over 6724009.35 frames. ], batch size: 90, lr: 1.93e-02, grad_scale: 16.0 2024-09-17 04:22:11,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-09-17 04:22:22,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.88 vs. 
limit=22.5 2024-09-17 04:22:38,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0 2024-09-17 04:22:57,880 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.172e+02 2.881e+02 3.443e+02 4.311e+02 8.605e+02, threshold=6.885e+02, percent-clipped=2.0 2024-09-17 04:23:01,632 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:23:06,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=103889.33333333333, ans=0.125 2024-09-17 04:23:09,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=103889.33333333333, ans=0.0 2024-09-17 04:23:33,628 INFO [train.py:1198] (1/2) Epoch 6, batch 2900, loss[loss=0.2983, simple_loss=0.3214, pruned_loss=0.107, ctc_loss=0.2088, cr_loss=0.4837, over 34531.00 frames. ], tot_loss[loss=0.2997, simple_loss=0.3234, pruned_loss=0.108, ctc_loss=0.208, cr_loss=0.4598, over 6754484.65 frames. ], batch size: 94, lr: 1.92e-02, grad_scale: 16.0 2024-09-17 04:23:34,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=103982.66666666667, ans=0.125 2024-09-17 04:23:35,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=103982.66666666667, ans=0.025 2024-09-17 04:23:36,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.76 vs. limit=15.0 2024-09-17 04:23:37,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=103982.66666666667, ans=0.125 2024-09-17 04:23:56,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.43 vs. limit=6.0 2024-09-17 04:24:23,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=104122.66666666667, ans=0.125 2024-09-17 04:24:26,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=104122.66666666667, ans=0.0 2024-09-17 04:24:34,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=104122.66666666667, ans=0.125 2024-09-17 04:24:49,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=104169.33333333333, ans=0.125 2024-09-17 04:24:55,843 INFO [train.py:1198] (1/2) Epoch 6, batch 2950, loss[loss=0.2808, simple_loss=0.3059, pruned_loss=0.09985, ctc_loss=0.1908, cr_loss=0.4478, over 34619.00 frames. ], tot_loss[loss=0.2982, simple_loss=0.3219, pruned_loss=0.1074, ctc_loss=0.207, cr_loss=0.458, over 6749315.71 frames. 
], batch size: 88, lr: 1.92e-02, grad_scale: 16.0 2024-09-17 04:24:56,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=104216.0, ans=0.125 2024-09-17 04:25:43,437 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.279e+02 2.906e+02 3.625e+02 4.366e+02 7.354e+02, threshold=7.251e+02, percent-clipped=2.0 2024-09-17 04:25:50,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=104356.0, ans=0.125 2024-09-17 04:26:10,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.97 vs. limit=22.5 2024-09-17 04:26:14,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=104402.66666666667, ans=0.125 2024-09-17 04:26:19,433 INFO [train.py:1198] (1/2) Epoch 6, batch 3000, loss[loss=0.2703, simple_loss=0.306, pruned_loss=0.08984, ctc_loss=0.183, cr_loss=0.4606, over 34516.00 frames. ], tot_loss[loss=0.2983, simple_loss=0.322, pruned_loss=0.1074, ctc_loss=0.2072, cr_loss=0.4592, over 6749771.20 frames. ], batch size: 94, lr: 1.92e-02, grad_scale: 16.0 2024-09-17 04:26:19,433 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 04:26:36,395 INFO [train.py:1230] (1/2) Epoch 6, validation: loss=0.1689, simple_loss=0.2672, pruned_loss=0.02912, ctc_loss=0.06136, cr_loss=1.411e-14, over 944034.00 frames. 2024-09-17 04:26:36,395 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 04:26:49,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=104449.33333333333, ans=0.125 2024-09-17 04:26:51,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=104496.0, ans=0.1 2024-09-17 04:27:03,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=22.5 2024-09-17 04:27:05,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=22.5 2024-09-17 04:27:12,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=104542.66666666667, ans=0.125 2024-09-17 04:27:15,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=104542.66666666667, ans=0.125 2024-09-17 04:27:27,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=104589.33333333333, ans=0.125 2024-09-17 04:27:55,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=104682.66666666667, ans=0.0 2024-09-17 04:27:57,367 INFO [train.py:1198] (1/2) Epoch 6, batch 3050, loss[loss=0.2917, simple_loss=0.3104, pruned_loss=0.107, ctc_loss=0.2041, cr_loss=0.4555, over 34582.00 frames. ], tot_loss[loss=0.2992, simple_loss=0.3227, pruned_loss=0.1079, ctc_loss=0.2079, cr_loss=0.4598, over 6743759.77 frames. 
], batch size: 89, lr: 1.92e-02, grad_scale: 16.0 2024-09-17 04:28:33,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=104776.0, ans=0.0 2024-09-17 04:28:42,510 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.334e+02 2.783e+02 3.359e+02 4.166e+02 6.773e+02, threshold=6.717e+02, percent-clipped=0.0 2024-09-17 04:28:52,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=104822.66666666667, ans=0.0 2024-09-17 04:29:02,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=104869.33333333333, ans=0.125 2024-09-17 04:29:17,950 INFO [train.py:1198] (1/2) Epoch 6, batch 3100, loss[loss=0.3399, simple_loss=0.3551, pruned_loss=0.1283, ctc_loss=0.2422, cr_loss=0.495, over 34254.00 frames. ], tot_loss[loss=0.2988, simple_loss=0.3222, pruned_loss=0.1077, ctc_loss=0.2075, cr_loss=0.4592, over 6743838.13 frames. ], batch size: 117, lr: 1.92e-02, grad_scale: 16.0 2024-09-17 04:30:02,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105009.33333333333, ans=0.1 2024-09-17 04:30:05,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=105009.33333333333, ans=0.125 2024-09-17 04:30:28,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.41 vs. limit=15.0 2024-09-17 04:30:39,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=105149.33333333333, ans=0.0 2024-09-17 04:30:40,503 INFO [train.py:1198] (1/2) Epoch 6, batch 3150, loss[loss=0.3287, simple_loss=0.351, pruned_loss=0.1202, ctc_loss=0.2317, cr_loss=0.4898, over 33837.00 frames. ], tot_loss[loss=0.2985, simple_loss=0.3221, pruned_loss=0.1075, ctc_loss=0.2073, cr_loss=0.4591, over 6749499.56 frames. 
], batch size: 122, lr: 1.92e-02, grad_scale: 16.0 2024-09-17 04:30:55,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=105196.0, ans=0.0 2024-09-17 04:31:00,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=105196.0, ans=0.125 2024-09-17 04:31:05,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=105196.0, ans=0.125 2024-09-17 04:31:16,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=105242.66666666667, ans=0.0 2024-09-17 04:31:19,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=105242.66666666667, ans=0.125 2024-09-17 04:31:22,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=105242.66666666667, ans=0.125 2024-09-17 04:31:24,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=105242.66666666667, ans=0.125 2024-09-17 04:31:25,458 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.346e+02 2.900e+02 4.009e+02 5.131e+02 9.642e+02, threshold=8.019e+02, percent-clipped=9.0 2024-09-17 04:31:38,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=105289.33333333333, ans=0.125 2024-09-17 04:31:43,374 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:31:51,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=105336.0, ans=0.125 2024-09-17 04:32:00,455 INFO [train.py:1198] (1/2) Epoch 6, batch 3200, loss[loss=0.284, simple_loss=0.3149, pruned_loss=0.09839, ctc_loss=0.1924, cr_loss=0.4443, over 34531.00 frames. ], tot_loss[loss=0.2972, simple_loss=0.3211, pruned_loss=0.1069, ctc_loss=0.2061, cr_loss=0.4579, over 6762309.81 frames. ], batch size: 94, lr: 1.91e-02, grad_scale: 32.0 2024-09-17 04:32:12,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=105382.66666666667, ans=0.0 2024-09-17 04:32:20,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.49 vs. limit=10.0 2024-09-17 04:32:52,329 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:32:55,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=105522.66666666667, ans=0.0 2024-09-17 04:33:22,178 INFO [train.py:1198] (1/2) Epoch 6, batch 3250, loss[loss=0.2984, simple_loss=0.323, pruned_loss=0.107, ctc_loss=0.2048, cr_loss=0.4706, over 34657.00 frames. ], tot_loss[loss=0.2973, simple_loss=0.3213, pruned_loss=0.1068, ctc_loss=0.2061, cr_loss=0.4582, over 6770769.77 frames. 
], batch size: 98, lr: 1.91e-02, grad_scale: 32.0 2024-09-17 04:33:38,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=105662.66666666667, ans=0.125 2024-09-17 04:33:46,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=105662.66666666667, ans=0.125 2024-09-17 04:33:54,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=105709.33333333333, ans=0.1 2024-09-17 04:34:06,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.090e+02 2.795e+02 3.366e+02 4.105e+02 7.260e+02, threshold=6.731e+02, percent-clipped=0.0 2024-09-17 04:34:12,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=105756.0, ans=0.0 2024-09-17 04:34:36,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=105802.66666666667, ans=0.1 2024-09-17 04:34:42,186 INFO [train.py:1198] (1/2) Epoch 6, batch 3300, loss[loss=0.3148, simple_loss=0.3453, pruned_loss=0.1112, ctc_loss=0.2175, cr_loss=0.46, over 33162.00 frames. ], tot_loss[loss=0.2958, simple_loss=0.32, pruned_loss=0.1062, ctc_loss=0.2051, cr_loss=0.4562, over 6767690.72 frames. ], batch size: 130, lr: 1.91e-02, grad_scale: 32.0 2024-09-17 04:34:49,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=105849.33333333333, ans=0.125 2024-09-17 04:35:07,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=6.0 2024-09-17 04:35:16,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=105942.66666666667, ans=0.0 2024-09-17 04:35:18,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.32 vs. limit=15.0 2024-09-17 04:35:37,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=105989.33333333333, ans=0.2 2024-09-17 04:36:04,022 INFO [train.py:1198] (1/2) Epoch 6, batch 3350, loss[loss=0.3303, simple_loss=0.3515, pruned_loss=0.1219, ctc_loss=0.2276, cr_loss=0.4925, over 33896.00 frames. ], tot_loss[loss=0.2975, simple_loss=0.3214, pruned_loss=0.107, ctc_loss=0.2065, cr_loss=0.458, over 6742922.78 frames. ], batch size: 122, lr: 1.91e-02, grad_scale: 32.0 2024-09-17 04:36:07,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=106082.66666666667, ans=0.0 2024-09-17 04:36:24,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.87 vs. 
limit=15.0 2024-09-17 04:36:42,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=106176.0, ans=0.0 2024-09-17 04:36:49,070 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 3.106e+02 3.853e+02 4.577e+02 8.630e+02, threshold=7.706e+02, percent-clipped=4.0 2024-09-17 04:36:49,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=106176.0, ans=0.125 2024-09-17 04:37:08,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=106269.33333333333, ans=0.125 2024-09-17 04:37:16,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=106269.33333333333, ans=0.125 2024-09-17 04:37:22,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=106269.33333333333, ans=0.125 2024-09-17 04:37:25,449 INFO [train.py:1198] (1/2) Epoch 6, batch 3400, loss[loss=0.2741, simple_loss=0.3015, pruned_loss=0.09639, ctc_loss=0.1855, cr_loss=0.4185, over 34177.00 frames. ], tot_loss[loss=0.2978, simple_loss=0.3216, pruned_loss=0.1072, ctc_loss=0.2068, cr_loss=0.4586, over 6733313.32 frames. ], batch size: 78, lr: 1.91e-02, grad_scale: 32.0 2024-09-17 04:37:33,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=106316.0, ans=0.125 2024-09-17 04:37:46,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=106362.66666666667, ans=0.0 2024-09-17 04:38:02,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=106409.33333333333, ans=0.2 2024-09-17 04:38:20,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=106456.0, ans=0.125 2024-09-17 04:38:25,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=106456.0, ans=0.125 2024-09-17 04:38:29,882 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:38:30,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. limit=6.0 2024-09-17 04:38:31,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=106502.66666666667, ans=0.0 2024-09-17 04:38:45,497 INFO [train.py:1198] (1/2) Epoch 6, batch 3450, loss[loss=0.3039, simple_loss=0.332, pruned_loss=0.108, ctc_loss=0.2074, cr_loss=0.4584, over 33187.00 frames. ], tot_loss[loss=0.2973, simple_loss=0.3214, pruned_loss=0.1068, ctc_loss=0.2061, cr_loss=0.458, over 6746167.99 frames. ], batch size: 130, lr: 1.90e-02, grad_scale: 32.0 2024-09-17 04:38:47,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. 
limit=6.0 2024-09-17 04:39:00,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=106596.0, ans=0.1 2024-09-17 04:39:09,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=106596.0, ans=0.125 2024-09-17 04:39:10,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=15.0 2024-09-17 04:39:29,523 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.338e+02 2.913e+02 3.665e+02 4.828e+02 8.169e+02, threshold=7.331e+02, percent-clipped=3.0 2024-09-17 04:39:29,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=106642.66666666667, ans=0.125 2024-09-17 04:39:42,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=106689.33333333333, ans=0.1 2024-09-17 04:40:04,452 INFO [train.py:1198] (1/2) Epoch 6, batch 3500, loss[loss=0.2642, simple_loss=0.2926, pruned_loss=0.092, ctc_loss=0.1778, cr_loss=0.4063, over 34490.00 frames. ], tot_loss[loss=0.2967, simple_loss=0.3207, pruned_loss=0.1066, ctc_loss=0.2058, cr_loss=0.4576, over 6748516.40 frames. ], batch size: 85, lr: 1.90e-02, grad_scale: 32.0 2024-09-17 04:40:07,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=106782.66666666667, ans=0.0 2024-09-17 04:40:45,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=106876.0, ans=0.125 2024-09-17 04:41:25,262 INFO [train.py:1198] (1/2) Epoch 6, batch 3550, loss[loss=0.2991, simple_loss=0.3263, pruned_loss=0.1069, ctc_loss=0.2043, cr_loss=0.4333, over 34392.00 frames. ], tot_loss[loss=0.2968, simple_loss=0.3209, pruned_loss=0.1066, ctc_loss=0.2058, cr_loss=0.4582, over 6757584.33 frames. ], batch size: 103, lr: 1.90e-02, grad_scale: 32.0 2024-09-17 04:41:30,452 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:41:35,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=107016.0, ans=0.1 2024-09-17 04:41:45,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=107062.66666666667, ans=0.07 2024-09-17 04:41:50,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107062.66666666667, ans=0.1 2024-09-17 04:41:50,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=107062.66666666667, ans=0.04949747468305833 2024-09-17 04:42:10,763 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.345e+02 2.938e+02 3.701e+02 4.990e+02 7.883e+02, threshold=7.402e+02, percent-clipped=6.0 2024-09-17 04:42:25,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=107156.0, ans=0.125 2024-09-17 04:42:45,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.67 vs. 
limit=15.0 2024-09-17 04:42:46,189 INFO [train.py:1198] (1/2) Epoch 6, batch 3600, loss[loss=0.282, simple_loss=0.3125, pruned_loss=0.09826, ctc_loss=0.1876, cr_loss=0.439, over 34472.00 frames. ], tot_loss[loss=0.2969, simple_loss=0.3211, pruned_loss=0.1066, ctc_loss=0.2059, cr_loss=0.4589, over 6767356.75 frames. ], batch size: 90, lr: 1.90e-02, grad_scale: 32.0 2024-09-17 04:43:01,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.18 vs. limit=15.0 2024-09-17 04:43:04,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=107296.0, ans=0.2 2024-09-17 04:43:05,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107296.0, ans=0.1 2024-09-17 04:43:11,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=107296.0, ans=0.1 2024-09-17 04:43:34,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=107389.33333333333, ans=0.125 2024-09-17 04:43:48,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=107436.0, ans=0.1 2024-09-17 04:43:49,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.70 vs. limit=15.0 2024-09-17 04:43:51,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=107436.0, ans=0.125 2024-09-17 04:43:53,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=107436.0, ans=0.125 2024-09-17 04:44:06,989 INFO [train.py:1198] (1/2) Epoch 6, batch 3650, loss[loss=0.3053, simple_loss=0.3274, pruned_loss=0.1108, ctc_loss=0.2086, cr_loss=0.4935, over 34495.00 frames. ], tot_loss[loss=0.2966, simple_loss=0.3209, pruned_loss=0.1065, ctc_loss=0.2055, cr_loss=0.4582, over 6769138.19 frames. ], batch size: 110, lr: 1.90e-02, grad_scale: 32.0 2024-09-17 04:44:07,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=107482.66666666667, ans=0.0 2024-09-17 04:44:17,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.47 vs. 
limit=15.0 2024-09-17 04:44:25,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=107529.33333333333, ans=0.125 2024-09-17 04:44:26,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=107529.33333333333, ans=0.125 2024-09-17 04:44:26,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=107529.33333333333, ans=0.0 2024-09-17 04:44:31,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=107529.33333333333, ans=0.125 2024-09-17 04:44:36,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=107529.33333333333, ans=0.125 2024-09-17 04:44:46,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.51 vs. limit=22.5 2024-09-17 04:44:47,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=107576.0, ans=0.125 2024-09-17 04:44:51,547 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.370e+02 3.078e+02 3.739e+02 4.867e+02 7.641e+02, threshold=7.478e+02, percent-clipped=2.0 2024-09-17 04:44:56,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=107622.66666666667, ans=0.125 2024-09-17 04:44:58,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=107622.66666666667, ans=0.125 2024-09-17 04:45:02,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=107622.66666666667, ans=0.07 2024-09-17 04:45:10,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=107669.33333333333, ans=0.07 2024-09-17 04:45:11,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107669.33333333333, ans=0.1 2024-09-17 04:45:27,255 INFO [train.py:1198] (1/2) Epoch 6, batch 3700, loss[loss=0.2892, simple_loss=0.3158, pruned_loss=0.102, ctc_loss=0.1998, cr_loss=0.4694, over 34650.00 frames. ], tot_loss[loss=0.2966, simple_loss=0.3211, pruned_loss=0.1063, ctc_loss=0.2053, cr_loss=0.4586, over 6783276.39 frames. ], batch size: 102, lr: 1.89e-02, grad_scale: 32.0 2024-09-17 04:45:54,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=107762.66666666667, ans=0.0 2024-09-17 04:46:08,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=107809.33333333333, ans=0.0 2024-09-17 04:46:14,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.35 vs. limit=15.0 2024-09-17 04:46:19,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=107856.0, ans=0.2 2024-09-17 04:46:47,890 INFO [train.py:1198] (1/2) Epoch 6, batch 3750, loss[loss=0.3003, simple_loss=0.3276, pruned_loss=0.1067, ctc_loss=0.2062, cr_loss=0.4552, over 34379.00 frames. 
], tot_loss[loss=0.3006, simple_loss=0.3247, pruned_loss=0.1082, ctc_loss=0.2085, cr_loss=0.4632, over 6784586.68 frames. ], batch size: 113, lr: 1.89e-02, grad_scale: 16.0 2024-09-17 04:46:57,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=107949.33333333333, ans=0.0 2024-09-17 04:47:00,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=107949.33333333333, ans=0.125 2024-09-17 04:47:02,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107996.0, ans=0.1 2024-09-17 04:47:35,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.327e+02 2.708e+02 3.064e+02 3.701e+02 1.001e+03, threshold=6.128e+02, percent-clipped=1.0 2024-09-17 04:47:40,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=108089.33333333333, ans=0.1 2024-09-17 04:47:48,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=108089.33333333333, ans=0.125 2024-09-17 04:47:52,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=108136.0, ans=0.1 2024-09-17 04:48:08,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=108182.66666666667, ans=0.125 2024-09-17 04:48:09,449 INFO [train.py:1198] (1/2) Epoch 6, batch 3800, loss[loss=0.3328, simple_loss=0.3445, pruned_loss=0.1261, ctc_loss=0.2425, cr_loss=0.5106, over 29660.00 frames. ], tot_loss[loss=0.3054, simple_loss=0.3281, pruned_loss=0.1107, ctc_loss=0.2131, cr_loss=0.4674, over 6672098.57 frames. ], batch size: 175, lr: 1.89e-02, grad_scale: 16.0 2024-09-17 04:48:29,618 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:48:31,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=108229.33333333333, ans=0.1 2024-09-17 04:49:00,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=108322.66666666667, ans=0.0 2024-09-17 04:49:10,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=108322.66666666667, ans=0.5 2024-09-17 04:49:12,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=108322.66666666667, ans=0.125 2024-09-17 04:49:33,318 INFO [train.py:1198] (1/2) Epoch 6, batch 3850, loss[loss=0.3437, simple_loss=0.3472, pruned_loss=0.1344, ctc_loss=0.2588, cr_loss=0.4909, over 24233.00 frames. ], tot_loss[loss=0.3139, simple_loss=0.333, pruned_loss=0.1158, ctc_loss=0.2227, cr_loss=0.4697, over 6246964.03 frames. ], batch size: 244, lr: 1.89e-02, grad_scale: 16.0 2024-09-17 04:49:38,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=108416.0, ans=0.2 2024-09-17 04:51:00,929 INFO [train.py:1198] (1/2) Epoch 7, batch 0, loss[loss=0.3001, simple_loss=0.3171, pruned_loss=0.1117, ctc_loss=0.2062, cr_loss=0.4594, over 34448.00 frames. 
], tot_loss[loss=0.3001, simple_loss=0.3171, pruned_loss=0.1117, ctc_loss=0.2062, cr_loss=0.4594, over 34448.00 frames. ], batch size: 85, lr: 1.77e-02, grad_scale: 32.0 2024-09-17 04:51:00,929 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 04:51:17,619 INFO [train.py:1230] (1/2) Epoch 7, validation: loss=0.1725, simple_loss=0.2712, pruned_loss=0.03068, ctc_loss=0.06225, cr_loss=1.527e-14, over 944034.00 frames. 2024-09-17 04:51:17,619 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 04:51:19,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=108537.33333333333, ans=0.2 2024-09-17 04:51:22,593 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.372e+02 2.782e+02 3.121e+02 3.566e+02 6.598e+02, threshold=6.242e+02, percent-clipped=1.0 2024-09-17 04:51:28,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0 2024-09-17 04:51:28,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2024-09-17 04:51:29,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=108537.33333333333, ans=0.125 2024-09-17 04:51:57,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.67 vs. limit=15.0 2024-09-17 04:52:10,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=108677.33333333333, ans=0.07 2024-09-17 04:52:27,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-09-17 04:52:32,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.92 vs. limit=22.5 2024-09-17 04:52:42,928 INFO [train.py:1198] (1/2) Epoch 7, batch 50, loss[loss=0.2739, simple_loss=0.2953, pruned_loss=0.09863, ctc_loss=0.1925, cr_loss=0.4167, over 34485.00 frames. ], tot_loss[loss=0.3028, simple_loss=0.326, pruned_loss=0.1095, ctc_loss=0.2106, cr_loss=0.4612, over 1480226.06 frames. ], batch size: 82, lr: 1.77e-02, grad_scale: 16.0 2024-09-17 04:52:43,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=108770.66666666667, ans=0.025 2024-09-17 04:52:53,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.07 vs. limit=15.0 2024-09-17 04:53:03,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.45 vs. limit=6.0 2024-09-17 04:53:17,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=108864.0, ans=0.125 2024-09-17 04:53:41,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.98 vs. 
limit=15.0 2024-09-17 04:53:47,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=108957.33333333333, ans=0.125 2024-09-17 04:53:50,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=108957.33333333333, ans=0.025 2024-09-17 04:53:58,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=108957.33333333333, ans=0.025 2024-09-17 04:54:04,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. limit=6.0 2024-09-17 04:54:04,696 INFO [train.py:1198] (1/2) Epoch 7, batch 100, loss[loss=0.276, simple_loss=0.3024, pruned_loss=0.09671, ctc_loss=0.1882, cr_loss=0.4633, over 34572.00 frames. ], tot_loss[loss=0.3019, simple_loss=0.3259, pruned_loss=0.1088, ctc_loss=0.2095, cr_loss=0.463, over 2628092.06 frames. ], batch size: 89, lr: 1.77e-02, grad_scale: 16.0 2024-09-17 04:54:10,999 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.170e+02 2.810e+02 3.413e+02 4.539e+02 8.398e+02, threshold=6.827e+02, percent-clipped=5.0 2024-09-17 04:54:33,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.77 vs. limit=15.0 2024-09-17 04:54:37,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=109097.33333333333, ans=0.0 2024-09-17 04:54:45,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=109097.33333333333, ans=0.0 2024-09-17 04:55:17,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.01 vs. limit=5.0 2024-09-17 04:55:24,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=109237.33333333333, ans=0.125 2024-09-17 04:55:26,115 INFO [train.py:1198] (1/2) Epoch 7, batch 150, loss[loss=0.2576, simple_loss=0.2868, pruned_loss=0.08819, ctc_loss=0.1748, cr_loss=0.4246, over 34510.00 frames. ], tot_loss[loss=0.2969, simple_loss=0.3221, pruned_loss=0.1062, ctc_loss=0.2051, cr_loss=0.4585, over 3555529.69 frames. ], batch size: 82, lr: 1.76e-02, grad_scale: 16.0 2024-09-17 04:55:29,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=109237.33333333333, ans=0.125 2024-09-17 04:56:17,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=109377.33333333333, ans=0.125 2024-09-17 04:56:22,637 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:56:27,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=109377.33333333333, ans=0.0 2024-09-17 04:56:43,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=109424.0, ans=0.125 2024-09-17 04:56:51,593 INFO [train.py:1198] (1/2) Epoch 7, batch 200, loss[loss=0.3332, simple_loss=0.3519, pruned_loss=0.1244, ctc_loss=0.2314, cr_loss=0.4874, over 32217.00 frames. 
], tot_loss[loss=0.295, simple_loss=0.3203, pruned_loss=0.1054, ctc_loss=0.2036, cr_loss=0.457, over 4269075.47 frames. ], batch size: 145, lr: 1.76e-02, grad_scale: 16.0 2024-09-17 04:56:58,183 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.741e+02 3.087e+02 4.015e+02 8.987e+02, threshold=6.175e+02, percent-clipped=2.0 2024-09-17 04:57:01,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=109470.66666666667, ans=0.1 2024-09-17 04:58:01,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2024-09-17 04:58:13,672 INFO [train.py:1198] (1/2) Epoch 7, batch 250, loss[loss=0.3121, simple_loss=0.3359, pruned_loss=0.1131, ctc_loss=0.2164, cr_loss=0.4702, over 34211.00 frames. ], tot_loss[loss=0.295, simple_loss=0.3202, pruned_loss=0.1054, ctc_loss=0.2037, cr_loss=0.4577, over 4831077.52 frames. ], batch size: 117, lr: 1.76e-02, grad_scale: 16.0 2024-09-17 04:58:36,905 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 04:58:38,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=109750.66666666667, ans=0.125 2024-09-17 04:58:40,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5 2024-09-17 04:58:58,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2024-09-17 04:59:01,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=109844.0, ans=0.125 2024-09-17 04:59:02,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=109844.0, ans=0.125 2024-09-17 04:59:37,046 INFO [train.py:1198] (1/2) Epoch 7, batch 300, loss[loss=0.3268, simple_loss=0.3387, pruned_loss=0.1238, ctc_loss=0.2342, cr_loss=0.5087, over 34368.00 frames. ], tot_loss[loss=0.2946, simple_loss=0.3197, pruned_loss=0.1053, ctc_loss=0.2037, cr_loss=0.4571, over 5260938.20 frames. 
], batch size: 107, lr: 1.76e-02, grad_scale: 16.0 2024-09-17 04:59:39,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=109937.33333333333, ans=0.05 2024-09-17 04:59:39,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=109937.33333333333, ans=0.1 2024-09-17 04:59:40,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=109937.33333333333, ans=0.0 2024-09-17 04:59:43,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 2.896e+02 3.593e+02 4.717e+02 9.894e+02, threshold=7.185e+02, percent-clipped=5.0 2024-09-17 04:59:55,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=109984.0, ans=0.125 2024-09-17 05:00:05,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=109984.0, ans=10.0 2024-09-17 05:00:07,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=109984.0, ans=0.02 2024-09-17 05:00:12,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=110030.66666666667, ans=0.125 2024-09-17 05:00:26,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=110077.33333333333, ans=0.1 2024-09-17 05:01:00,939 INFO [train.py:1198] (1/2) Epoch 7, batch 350, loss[loss=0.2623, simple_loss=0.29, pruned_loss=0.09138, ctc_loss=0.1752, cr_loss=0.4199, over 34269.00 frames. ], tot_loss[loss=0.2948, simple_loss=0.3198, pruned_loss=0.1054, ctc_loss=0.2037, cr_loss=0.4585, over 5596504.12 frames. ], batch size: 83, lr: 1.76e-02, grad_scale: 16.0 2024-09-17 05:01:09,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=110170.66666666667, ans=0.0 2024-09-17 05:01:18,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=110217.33333333333, ans=0.0 2024-09-17 05:01:25,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=110217.33333333333, ans=0.0 2024-09-17 05:01:48,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=110310.66666666667, ans=0.1 2024-09-17 05:01:54,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=110310.66666666667, ans=0.1 2024-09-17 05:01:59,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=110310.66666666667, ans=0.1 2024-09-17 05:02:03,488 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-17 05:02:11,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.74 vs. 
limit=15.0 2024-09-17 05:02:22,225 INFO [train.py:1198] (1/2) Epoch 7, batch 400, loss[loss=0.2868, simple_loss=0.3194, pruned_loss=0.09913, ctc_loss=0.1935, cr_loss=0.4319, over 34393.00 frames. ], tot_loss[loss=0.2934, simple_loss=0.3187, pruned_loss=0.1046, ctc_loss=0.2024, cr_loss=0.4564, over 5864398.38 frames. ], batch size: 95, lr: 1.76e-02, grad_scale: 32.0 2024-09-17 05:02:28,837 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.387e+02 2.897e+02 3.800e+02 4.925e+02 7.984e+02, threshold=7.600e+02, percent-clipped=1.0 2024-09-17 05:03:00,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=110497.33333333333, ans=0.0 2024-09-17 05:03:16,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=110544.0, ans=0.0 2024-09-17 05:03:45,427 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:03:48,113 INFO [train.py:1198] (1/2) Epoch 7, batch 450, loss[loss=0.3014, simple_loss=0.3284, pruned_loss=0.1071, ctc_loss=0.2063, cr_loss=0.4721, over 34700.00 frames. ], tot_loss[loss=0.2936, simple_loss=0.3187, pruned_loss=0.1048, ctc_loss=0.2027, cr_loss=0.4568, over 6053846.31 frames. ], batch size: 97, lr: 1.75e-02, grad_scale: 16.0 2024-09-17 05:04:03,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=110684.0, ans=0.2 2024-09-17 05:04:23,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.39 vs. limit=15.0 2024-09-17 05:04:28,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.35 vs. limit=15.0 2024-09-17 05:04:45,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110777.33333333333, ans=0.1 2024-09-17 05:04:47,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=110777.33333333333, ans=0.05 2024-09-17 05:05:01,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2024-09-17 05:05:09,828 INFO [train.py:1198] (1/2) Epoch 7, batch 500, loss[loss=0.3369, simple_loss=0.3557, pruned_loss=0.1258, ctc_loss=0.2336, cr_loss=0.493, over 34448.00 frames. ], tot_loss[loss=0.2922, simple_loss=0.3176, pruned_loss=0.1042, ctc_loss=0.2016, cr_loss=0.4558, over 6219459.79 frames. 
], batch size: 110, lr: 1.75e-02, grad_scale: 16.0 2024-09-17 05:05:18,028 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.148e+02 2.700e+02 3.241e+02 3.982e+02 7.294e+02, threshold=6.481e+02, percent-clipped=0.0 2024-09-17 05:05:21,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=110870.66666666667, ans=0.125 2024-09-17 05:05:26,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=110917.33333333333, ans=0.0 2024-09-17 05:05:29,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=110917.33333333333, ans=0.1 2024-09-17 05:05:41,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=110964.0, ans=0.0 2024-09-17 05:05:59,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=111010.66666666667, ans=0.125 2024-09-17 05:06:07,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=111010.66666666667, ans=0.0 2024-09-17 05:06:13,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=111057.33333333333, ans=0.0 2024-09-17 05:06:22,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=111057.33333333333, ans=0.125 2024-09-17 05:06:31,848 INFO [train.py:1198] (1/2) Epoch 7, batch 550, loss[loss=0.317, simple_loss=0.3446, pruned_loss=0.1133, ctc_loss=0.2163, cr_loss=0.494, over 33811.00 frames. ], tot_loss[loss=0.2925, simple_loss=0.3179, pruned_loss=0.1042, ctc_loss=0.2018, cr_loss=0.4562, over 6327714.59 frames. ], batch size: 122, lr: 1.75e-02, grad_scale: 16.0 2024-09-17 05:06:42,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=111104.0, ans=0.125 2024-09-17 05:07:06,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=111197.33333333333, ans=0.95 2024-09-17 05:07:31,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=111244.0, ans=0.0 2024-09-17 05:07:51,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2024-09-17 05:07:57,406 INFO [train.py:1198] (1/2) Epoch 7, batch 600, loss[loss=0.3224, simple_loss=0.3441, pruned_loss=0.1175, ctc_loss=0.2282, cr_loss=0.4985, over 34221.00 frames. ], tot_loss[loss=0.2921, simple_loss=0.3177, pruned_loss=0.104, ctc_loss=0.2014, cr_loss=0.4566, over 6428899.97 frames. 
], batch size: 117, lr: 1.75e-02, grad_scale: 16.0 2024-09-17 05:08:05,696 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.262e+02 3.066e+02 3.815e+02 4.471e+02 8.616e+02, threshold=7.630e+02, percent-clipped=3.0 2024-09-17 05:08:10,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=111337.33333333333, ans=0.125 2024-09-17 05:08:56,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=14.79 vs. limit=15.0 2024-09-17 05:09:18,290 INFO [train.py:1198] (1/2) Epoch 7, batch 650, loss[loss=0.273, simple_loss=0.3028, pruned_loss=0.09431, ctc_loss=0.1826, cr_loss=0.4505, over 34526.00 frames. ], tot_loss[loss=0.2901, simple_loss=0.3163, pruned_loss=0.1029, ctc_loss=0.1996, cr_loss=0.4538, over 6520435.20 frames. ], batch size: 94, lr: 1.75e-02, grad_scale: 16.0 2024-09-17 05:09:25,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=111570.66666666667, ans=0.125 2024-09-17 05:09:33,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=111617.33333333333, ans=0.125 2024-09-17 05:09:51,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=111664.0, ans=0.07 2024-09-17 05:09:58,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=111664.0, ans=0.125 2024-09-17 05:10:07,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=111710.66666666667, ans=0.0 2024-09-17 05:10:12,696 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:10:14,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=111710.66666666667, ans=0.125 2024-09-17 05:10:32,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=111757.33333333333, ans=0.1 2024-09-17 05:10:33,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=111757.33333333333, ans=0.125 2024-09-17 05:10:41,597 INFO [train.py:1198] (1/2) Epoch 7, batch 700, loss[loss=0.2743, simple_loss=0.2994, pruned_loss=0.09765, ctc_loss=0.1867, cr_loss=0.4147, over 34597.00 frames. ], tot_loss[loss=0.2909, simple_loss=0.3172, pruned_loss=0.1031, ctc_loss=0.2, cr_loss=0.4551, over 6577317.43 frames. 
], batch size: 89, lr: 1.75e-02, grad_scale: 16.0 2024-09-17 05:10:49,734 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.170e+02 2.745e+02 3.421e+02 4.797e+02 9.082e+02, threshold=6.842e+02, percent-clipped=4.0 2024-09-17 05:10:59,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=111850.66666666667, ans=0.0 2024-09-17 05:11:02,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=111850.66666666667, ans=0.125 2024-09-17 05:11:06,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=111850.66666666667, ans=0.125 2024-09-17 05:12:03,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=111990.66666666667, ans=0.125 2024-09-17 05:12:04,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2024-09-17 05:12:08,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=111990.66666666667, ans=0.025 2024-09-17 05:12:11,500 INFO [train.py:1198] (1/2) Epoch 7, batch 750, loss[loss=0.287, simple_loss=0.3124, pruned_loss=0.101, ctc_loss=0.1982, cr_loss=0.4988, over 34411.00 frames. ], tot_loss[loss=0.2897, simple_loss=0.3164, pruned_loss=0.1026, ctc_loss=0.1989, cr_loss=0.4533, over 6619420.34 frames. ], batch size: 95, lr: 1.74e-02, grad_scale: 16.0 2024-09-17 05:12:18,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=112037.33333333333, ans=0.2 2024-09-17 05:12:29,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=112084.0, ans=0.125 2024-09-17 05:13:03,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.21 vs. limit=15.0 2024-09-17 05:13:04,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=112177.33333333333, ans=0.0 2024-09-17 05:13:33,288 INFO [train.py:1198] (1/2) Epoch 7, batch 800, loss[loss=0.2668, simple_loss=0.2945, pruned_loss=0.09265, ctc_loss=0.1842, cr_loss=0.4232, over 34470.00 frames. ], tot_loss[loss=0.2893, simple_loss=0.3161, pruned_loss=0.1023, ctc_loss=0.1984, cr_loss=0.4533, over 6655532.59 frames. ], batch size: 85, lr: 1.74e-02, grad_scale: 32.0 2024-09-17 05:13:33,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=112270.66666666667, ans=0.2 2024-09-17 05:13:41,460 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.773e+02 3.495e+02 4.641e+02 1.016e+03, threshold=6.991e+02, percent-clipped=12.0 2024-09-17 05:13:43,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=112270.66666666667, ans=0.125 2024-09-17 05:13:51,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.90 vs. 
limit=15.0 2024-09-17 05:14:15,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=112364.0, ans=0.2 2024-09-17 05:14:36,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.72 vs. limit=15.0 2024-09-17 05:14:46,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112457.33333333333, ans=0.1 2024-09-17 05:14:50,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=112457.33333333333, ans=0.0 2024-09-17 05:14:56,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.15 vs. limit=15.0 2024-09-17 05:14:56,929 INFO [train.py:1198] (1/2) Epoch 7, batch 850, loss[loss=0.2873, simple_loss=0.3186, pruned_loss=0.09985, ctc_loss=0.1951, cr_loss=0.4307, over 34360.00 frames. ], tot_loss[loss=0.2889, simple_loss=0.3158, pruned_loss=0.1021, ctc_loss=0.1981, cr_loss=0.4527, over 6689026.61 frames. ], batch size: 103, lr: 1.74e-02, grad_scale: 16.0 2024-09-17 05:15:00,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=112504.0, ans=15.0 2024-09-17 05:15:05,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=112504.0, ans=0.0 2024-09-17 05:15:20,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=112550.66666666667, ans=0.2 2024-09-17 05:15:37,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=112597.33333333333, ans=0.125 2024-09-17 05:15:38,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=112597.33333333333, ans=0.0 2024-09-17 05:15:40,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=112597.33333333333, ans=0.125 2024-09-17 05:16:01,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=112644.0, ans=0.125 2024-09-17 05:16:11,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=112690.66666666667, ans=0.125 2024-09-17 05:16:18,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=112690.66666666667, ans=0.125 2024-09-17 05:16:21,489 INFO [train.py:1198] (1/2) Epoch 7, batch 900, loss[loss=0.2786, simple_loss=0.3023, pruned_loss=0.09961, ctc_loss=0.1935, cr_loss=0.4226, over 34439.00 frames. ], tot_loss[loss=0.2897, simple_loss=0.3163, pruned_loss=0.1025, ctc_loss=0.1988, cr_loss=0.4538, over 6697138.52 frames. 
], batch size: 85, lr: 1.74e-02, grad_scale: 16.0 2024-09-17 05:16:28,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=112737.33333333333, ans=0.125 2024-09-17 05:16:31,091 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.820e+02 3.398e+02 4.399e+02 9.788e+02, threshold=6.796e+02, percent-clipped=5.0 2024-09-17 05:16:32,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=112737.33333333333, ans=0.125 2024-09-17 05:16:36,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=112784.0, ans=0.125 2024-09-17 05:16:51,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=112784.0, ans=0.125 2024-09-17 05:17:06,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=112830.66666666667, ans=0.125 2024-09-17 05:17:27,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=112924.0, ans=0.0 2024-09-17 05:17:29,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.64 vs. limit=15.0 2024-09-17 05:17:43,583 INFO [train.py:1198] (1/2) Epoch 7, batch 950, loss[loss=0.2665, simple_loss=0.3005, pruned_loss=0.09095, ctc_loss=0.178, cr_loss=0.3771, over 34700.00 frames. ], tot_loss[loss=0.2899, simple_loss=0.3166, pruned_loss=0.1026, ctc_loss=0.1991, cr_loss=0.4537, over 6701132.28 frames. ], batch size: 87, lr: 1.74e-02, grad_scale: 16.0 2024-09-17 05:18:13,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=113017.33333333333, ans=0.125 2024-09-17 05:18:17,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=113064.0, ans=0.0 2024-09-17 05:18:19,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.49 vs. limit=22.5 2024-09-17 05:18:34,979 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:18:34,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=113110.66666666667, ans=0.125 2024-09-17 05:18:51,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=113157.33333333333, ans=0.0 2024-09-17 05:18:52,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=113157.33333333333, ans=0.025 2024-09-17 05:18:56,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=113157.33333333333, ans=0.0 2024-09-17 05:19:08,919 INFO [train.py:1198] (1/2) Epoch 7, batch 1000, loss[loss=0.2571, simple_loss=0.2911, pruned_loss=0.08627, ctc_loss=0.1692, cr_loss=0.415, over 34474.00 frames. ], tot_loss[loss=0.2907, simple_loss=0.3171, pruned_loss=0.103, ctc_loss=0.1999, cr_loss=0.4547, over 6694556.12 frames. 
], batch size: 90, lr: 1.74e-02, grad_scale: 16.0 2024-09-17 05:19:09,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=113204.0, ans=0.07 2024-09-17 05:19:18,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.102e+02 2.693e+02 3.339e+02 4.792e+02 1.468e+03, threshold=6.678e+02, percent-clipped=2.0 2024-09-17 05:19:27,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.00 vs. limit=6.0 2024-09-17 05:19:53,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=113297.33333333333, ans=0.125 2024-09-17 05:20:21,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=113390.66666666667, ans=0.125 2024-09-17 05:20:21,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=113390.66666666667, ans=0.0 2024-09-17 05:20:24,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=113390.66666666667, ans=0.2 2024-09-17 05:20:30,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2024-09-17 05:20:30,929 INFO [train.py:1198] (1/2) Epoch 7, batch 1050, loss[loss=0.3049, simple_loss=0.3314, pruned_loss=0.1091, ctc_loss=0.2088, cr_loss=0.458, over 34561.00 frames. ], tot_loss[loss=0.2897, simple_loss=0.3162, pruned_loss=0.1026, ctc_loss=0.1991, cr_loss=0.4533, over 6702993.46 frames. ], batch size: 99, lr: 1.73e-02, grad_scale: 16.0 2024-09-17 05:20:52,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=113484.0, ans=0.2 2024-09-17 05:20:54,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=113484.0, ans=0.2 2024-09-17 05:21:03,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-09-17 05:21:10,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=113530.66666666667, ans=0.0 2024-09-17 05:21:26,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=113577.33333333333, ans=0.2 2024-09-17 05:21:28,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=113577.33333333333, ans=0.0 2024-09-17 05:21:48,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=113624.0, ans=0.025 2024-09-17 05:21:51,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. 
limit=6.0 2024-09-17 05:21:53,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=113670.66666666667, ans=0.125 2024-09-17 05:21:54,734 INFO [train.py:1198] (1/2) Epoch 7, batch 1100, loss[loss=0.2768, simple_loss=0.3072, pruned_loss=0.09605, ctc_loss=0.1856, cr_loss=0.4278, over 34357.00 frames. ], tot_loss[loss=0.2895, simple_loss=0.316, pruned_loss=0.1026, ctc_loss=0.1987, cr_loss=0.4525, over 6716038.77 frames. ], batch size: 91, lr: 1.73e-02, grad_scale: 16.0 2024-09-17 05:22:01,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=113670.66666666667, ans=0.2 2024-09-17 05:22:04,604 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.236e+02 2.879e+02 3.781e+02 4.667e+02 8.318e+02, threshold=7.562e+02, percent-clipped=7.0 2024-09-17 05:22:06,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=113670.66666666667, ans=0.0 2024-09-17 05:22:10,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.61 vs. limit=12.0 2024-09-17 05:22:12,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.59 vs. limit=12.0 2024-09-17 05:22:32,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=113764.0, ans=0.2 2024-09-17 05:22:35,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=15.0 2024-09-17 05:22:37,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=113764.0, ans=0.0 2024-09-17 05:22:52,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=113810.66666666667, ans=0.125 2024-09-17 05:23:14,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=113857.33333333333, ans=0.125 2024-09-17 05:23:19,440 INFO [train.py:1198] (1/2) Epoch 7, batch 1150, loss[loss=0.2923, simple_loss=0.3167, pruned_loss=0.1046, ctc_loss=0.2004, cr_loss=0.4626, over 34364.00 frames. ], tot_loss[loss=0.2897, simple_loss=0.3161, pruned_loss=0.1027, ctc_loss=0.1989, cr_loss=0.4521, over 6715577.18 frames. 
], batch size: 91, lr: 1.73e-02, grad_scale: 16.0 2024-09-17 05:23:46,193 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:23:47,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=113950.66666666667, ans=0.2 2024-09-17 05:23:49,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=113950.66666666667, ans=0.1 2024-09-17 05:23:52,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=113997.33333333333, ans=0.125 2024-09-17 05:24:01,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=113997.33333333333, ans=0.1 2024-09-17 05:24:02,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=113997.33333333333, ans=0.0 2024-09-17 05:24:04,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=113997.33333333333, ans=0.0 2024-09-17 05:24:30,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=114090.66666666667, ans=0.0 2024-09-17 05:24:32,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=114090.66666666667, ans=0.125 2024-09-17 05:24:41,650 INFO [train.py:1198] (1/2) Epoch 7, batch 1200, loss[loss=0.2958, simple_loss=0.3242, pruned_loss=0.104, ctc_loss=0.2038, cr_loss=0.4648, over 34566.00 frames. ], tot_loss[loss=0.2913, simple_loss=0.3174, pruned_loss=0.1035, ctc_loss=0.2002, cr_loss=0.4538, over 6708119.30 frames. ], batch size: 99, lr: 1.73e-02, grad_scale: 32.0 2024-09-17 05:24:50,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.41 vs. limit=15.0 2024-09-17 05:24:51,457 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.237e+02 2.759e+02 3.232e+02 3.961e+02 7.936e+02, threshold=6.464e+02, percent-clipped=2.0 2024-09-17 05:25:01,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=114184.0, ans=0.025 2024-09-17 05:26:05,859 INFO [train.py:1198] (1/2) Epoch 7, batch 1250, loss[loss=0.2945, simple_loss=0.3295, pruned_loss=0.1003, ctc_loss=0.2006, cr_loss=0.47, over 34323.00 frames. ], tot_loss[loss=0.2917, simple_loss=0.318, pruned_loss=0.1036, ctc_loss=0.2003, cr_loss=0.4558, over 6741707.86 frames. 
], batch size: 107, lr: 1.73e-02, grad_scale: 32.0 2024-09-17 05:26:12,643 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.595e-02 2024-09-17 05:26:34,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=114417.33333333333, ans=0.125 2024-09-17 05:27:13,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=114557.33333333333, ans=0.0 2024-09-17 05:27:21,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=114557.33333333333, ans=0.125 2024-09-17 05:27:29,855 INFO [train.py:1198] (1/2) Epoch 7, batch 1300, loss[loss=0.3026, simple_loss=0.3308, pruned_loss=0.1062, ctc_loss=0.2115, cr_loss=0.492, over 33015.00 frames. ], tot_loss[loss=0.2907, simple_loss=0.3171, pruned_loss=0.1031, ctc_loss=0.1996, cr_loss=0.4549, over 6745244.99 frames. ], batch size: 130, lr: 1.73e-02, grad_scale: 32.0 2024-09-17 05:27:36,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=114604.0, ans=0.125 2024-09-17 05:27:39,649 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.189e+02 2.823e+02 3.409e+02 4.444e+02 9.555e+02, threshold=6.817e+02, percent-clipped=9.0 2024-09-17 05:27:49,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=114650.66666666667, ans=0.2 2024-09-17 05:27:53,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=114650.66666666667, ans=0.025 2024-09-17 05:27:56,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=114650.66666666667, ans=10.0 2024-09-17 05:28:03,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=114697.33333333333, ans=0.1 2024-09-17 05:28:29,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=114744.0, ans=0.125 2024-09-17 05:28:32,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=114744.0, ans=0.0 2024-09-17 05:28:38,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=15.0 2024-09-17 05:28:46,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=114790.66666666667, ans=0.2 2024-09-17 05:28:52,807 INFO [train.py:1198] (1/2) Epoch 7, batch 1350, loss[loss=0.2764, simple_loss=0.3101, pruned_loss=0.09385, ctc_loss=0.1876, cr_loss=0.4358, over 34548.00 frames. ], tot_loss[loss=0.2899, simple_loss=0.3164, pruned_loss=0.1027, ctc_loss=0.1989, cr_loss=0.4547, over 6764213.70 frames. 
], batch size: 94, lr: 1.72e-02, grad_scale: 32.0 2024-09-17 05:29:12,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=114884.0, ans=0.0 2024-09-17 05:29:22,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=114884.0, ans=0.125 2024-09-17 05:29:32,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=114930.66666666667, ans=0.0 2024-09-17 05:30:05,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=115024.0, ans=0.125 2024-09-17 05:30:06,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=115024.0, ans=0.125 2024-09-17 05:30:09,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=115024.0, ans=0.0 2024-09-17 05:30:16,220 INFO [train.py:1198] (1/2) Epoch 7, batch 1400, loss[loss=0.2488, simple_loss=0.283, pruned_loss=0.0829, ctc_loss=0.1637, cr_loss=0.3985, over 34303.00 frames. ], tot_loss[loss=0.2892, simple_loss=0.316, pruned_loss=0.1023, ctc_loss=0.1981, cr_loss=0.4536, over 6776415.46 frames. ], batch size: 80, lr: 1.72e-02, grad_scale: 32.0 2024-09-17 05:30:16,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=115070.66666666667, ans=0.0 2024-09-17 05:30:26,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=115070.66666666667, ans=0.0 2024-09-17 05:30:28,009 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.832e+02 3.416e+02 4.283e+02 7.365e+02, threshold=6.833e+02, percent-clipped=3.0 2024-09-17 05:30:39,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=115117.33333333333, ans=0.05 2024-09-17 05:30:44,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=115117.33333333333, ans=0.0 2024-09-17 05:30:57,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115164.0, ans=0.1 2024-09-17 05:31:11,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=115210.66666666667, ans=0.125 2024-09-17 05:31:23,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.84 vs. limit=15.0 2024-09-17 05:31:24,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=115257.33333333333, ans=0.125 2024-09-17 05:31:40,147 INFO [train.py:1198] (1/2) Epoch 7, batch 1450, loss[loss=0.3074, simple_loss=0.336, pruned_loss=0.1091, ctc_loss=0.2091, cr_loss=0.4705, over 34417.00 frames. ], tot_loss[loss=0.2897, simple_loss=0.3165, pruned_loss=0.1025, ctc_loss=0.1985, cr_loss=0.4542, over 6773129.31 frames. 
], batch size: 110, lr: 1.72e-02, grad_scale: 32.0 2024-09-17 05:31:55,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=115350.66666666667, ans=0.2 2024-09-17 05:32:05,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=115350.66666666667, ans=0.125 2024-09-17 05:32:41,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.09 vs. limit=10.0 2024-09-17 05:32:44,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=115490.66666666667, ans=0.0 2024-09-17 05:32:49,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=115490.66666666667, ans=0.0 2024-09-17 05:32:56,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=115490.66666666667, ans=0.0 2024-09-17 05:33:04,193 INFO [train.py:1198] (1/2) Epoch 7, batch 1500, loss[loss=0.2916, simple_loss=0.3219, pruned_loss=0.1012, ctc_loss=0.1996, cr_loss=0.4695, over 34471.00 frames. ], tot_loss[loss=0.2898, simple_loss=0.3168, pruned_loss=0.1024, ctc_loss=0.1985, cr_loss=0.4547, over 6773776.37 frames. ], batch size: 100, lr: 1.72e-02, grad_scale: 32.0 2024-09-17 05:33:06,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=115537.33333333333, ans=0.0 2024-09-17 05:33:13,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0 2024-09-17 05:33:14,135 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.283e+02 2.672e+02 3.066e+02 4.119e+02 6.766e+02, threshold=6.131e+02, percent-clipped=0.0 2024-09-17 05:33:19,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=115584.0, ans=0.125 2024-09-17 05:34:28,962 INFO [train.py:1198] (1/2) Epoch 7, batch 1550, loss[loss=0.2935, simple_loss=0.3232, pruned_loss=0.1027, ctc_loss=0.1936, cr_loss=0.4885, over 34445.00 frames. ], tot_loss[loss=0.2904, simple_loss=0.3171, pruned_loss=0.1028, ctc_loss=0.1991, cr_loss=0.4545, over 6745014.97 frames. ], batch size: 105, lr: 1.72e-02, grad_scale: 32.0 2024-09-17 05:34:30,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115770.66666666667, ans=0.1 2024-09-17 05:35:11,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=115864.0, ans=0.0 2024-09-17 05:35:31,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115910.66666666667, ans=0.1 2024-09-17 05:35:33,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.17 vs. 
limit=22.5 2024-09-17 05:35:37,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=115957.33333333333, ans=0.025 2024-09-17 05:35:41,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=115957.33333333333, ans=0.0 2024-09-17 05:35:42,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=115957.33333333333, ans=0.5 2024-09-17 05:35:42,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=115957.33333333333, ans=0.125 2024-09-17 05:35:42,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=115957.33333333333, ans=0.0 2024-09-17 05:35:44,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=115957.33333333333, ans=0.125 2024-09-17 05:35:50,320 INFO [train.py:1198] (1/2) Epoch 7, batch 1600, loss[loss=0.3075, simple_loss=0.3341, pruned_loss=0.1096, ctc_loss=0.2112, cr_loss=0.487, over 34570.00 frames. ], tot_loss[loss=0.2908, simple_loss=0.3171, pruned_loss=0.1031, ctc_loss=0.1997, cr_loss=0.4555, over 6725507.69 frames. ], batch size: 99, lr: 1.72e-02, grad_scale: 32.0 2024-09-17 05:36:00,039 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.208e+02 2.849e+02 3.372e+02 4.117e+02 8.114e+02, threshold=6.744e+02, percent-clipped=3.0 2024-09-17 05:36:10,564 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:36:39,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.97 vs. limit=15.0 2024-09-17 05:36:45,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=116144.0, ans=0.125 2024-09-17 05:36:55,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=116144.0, ans=0.0 2024-09-17 05:37:01,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=116190.66666666667, ans=0.125 2024-09-17 05:37:14,243 INFO [train.py:1198] (1/2) Epoch 7, batch 1650, loss[loss=0.2741, simple_loss=0.313, pruned_loss=0.0905, ctc_loss=0.1824, cr_loss=0.4413, over 34396.00 frames. ], tot_loss[loss=0.2904, simple_loss=0.3169, pruned_loss=0.1029, ctc_loss=0.1995, cr_loss=0.4559, over 6718883.12 frames. ], batch size: 103, lr: 1.72e-02, grad_scale: 32.0 2024-09-17 05:37:16,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116237.33333333333, ans=0.1 2024-09-17 05:37:21,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=116237.33333333333, ans=0.025 2024-09-17 05:37:26,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=116237.33333333333, ans=0.09899494936611666 2024-09-17 05:37:33,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.40 vs. 
limit=22.5 2024-09-17 05:37:39,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=116284.0, ans=0.0 2024-09-17 05:37:39,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=116284.0, ans=0.125 2024-09-17 05:37:40,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=116284.0, ans=0.125 2024-09-17 05:37:57,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.53 vs. limit=6.0 2024-09-17 05:38:06,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=116377.33333333333, ans=0.1 2024-09-17 05:38:07,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=116377.33333333333, ans=0.125 2024-09-17 05:38:14,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=116377.33333333333, ans=0.125 2024-09-17 05:38:15,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=116377.33333333333, ans=0.125 2024-09-17 05:38:18,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.47 vs. limit=10.0 2024-09-17 05:38:22,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=116424.0, ans=0.125 2024-09-17 05:38:27,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=116424.0, ans=0.2 2024-09-17 05:38:30,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=116424.0, ans=0.125 2024-09-17 05:38:36,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=116470.66666666667, ans=0.025 2024-09-17 05:38:38,326 INFO [train.py:1198] (1/2) Epoch 7, batch 1700, loss[loss=0.2468, simple_loss=0.2798, pruned_loss=0.08274, ctc_loss=0.162, cr_loss=0.3966, over 34318.00 frames. ], tot_loss[loss=0.2898, simple_loss=0.3165, pruned_loss=0.1026, ctc_loss=0.1989, cr_loss=0.4553, over 6743944.66 frames. 
], batch size: 80, lr: 1.71e-02, grad_scale: 32.0 2024-09-17 05:38:48,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.230e+02 2.675e+02 3.225e+02 4.147e+02 8.145e+02, threshold=6.450e+02, percent-clipped=5.0 2024-09-17 05:38:54,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=116517.33333333333, ans=0.0 2024-09-17 05:39:22,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=116564.0, ans=0.125 2024-09-17 05:39:29,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=116610.66666666667, ans=0.0 2024-09-17 05:39:59,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=116704.0, ans=0.0 2024-09-17 05:40:00,707 INFO [train.py:1198] (1/2) Epoch 7, batch 1750, loss[loss=0.2441, simple_loss=0.2744, pruned_loss=0.08247, ctc_loss=0.1627, cr_loss=0.4069, over 34179.00 frames. ], tot_loss[loss=0.2892, simple_loss=0.316, pruned_loss=0.1023, ctc_loss=0.1984, cr_loss=0.4546, over 6752595.99 frames. ], batch size: 78, lr: 1.71e-02, grad_scale: 16.0 2024-09-17 05:40:07,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=116704.0, ans=0.1 2024-09-17 05:40:22,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=116750.66666666667, ans=0.0 2024-09-17 05:40:43,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=116797.33333333333, ans=0.125 2024-09-17 05:41:16,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116890.66666666667, ans=0.1 2024-09-17 05:41:23,830 INFO [train.py:1198] (1/2) Epoch 7, batch 1800, loss[loss=0.3003, simple_loss=0.3275, pruned_loss=0.1065, ctc_loss=0.2073, cr_loss=0.4646, over 34687.00 frames. ], tot_loss[loss=0.2896, simple_loss=0.3161, pruned_loss=0.1025, ctc_loss=0.1988, cr_loss=0.4552, over 6755790.58 frames. ], batch size: 97, lr: 1.71e-02, grad_scale: 16.0 2024-09-17 05:41:35,497 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.644e+02 3.062e+02 3.891e+02 9.310e+02, threshold=6.123e+02, percent-clipped=5.0 2024-09-17 05:41:42,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.93 vs. limit=15.0 2024-09-17 05:41:54,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=116984.0, ans=0.125 2024-09-17 05:42:28,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=117077.33333333333, ans=0.0 2024-09-17 05:42:33,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=117124.0, ans=0.02 2024-09-17 05:42:48,345 INFO [train.py:1198] (1/2) Epoch 7, batch 1850, loss[loss=0.309, simple_loss=0.3317, pruned_loss=0.1125, ctc_loss=0.2108, cr_loss=0.4798, over 34440.00 frames. ], tot_loss[loss=0.2889, simple_loss=0.3156, pruned_loss=0.1022, ctc_loss=0.1982, cr_loss=0.4547, over 6763624.73 frames. 
], batch size: 100, lr: 1.71e-02, grad_scale: 16.0 2024-09-17 05:43:09,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=117217.33333333333, ans=0.025 2024-09-17 05:43:22,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=117264.0, ans=0.0 2024-09-17 05:43:24,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=117264.0, ans=0.125 2024-09-17 05:43:30,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=117264.0, ans=0.125 2024-09-17 05:43:37,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=117310.66666666667, ans=0.125 2024-09-17 05:43:45,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=117310.66666666667, ans=0.125 2024-09-17 05:43:56,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.32 vs. limit=22.5 2024-09-17 05:44:10,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=117404.0, ans=0.05 2024-09-17 05:44:11,550 INFO [train.py:1198] (1/2) Epoch 7, batch 1900, loss[loss=0.2831, simple_loss=0.3183, pruned_loss=0.09569, ctc_loss=0.1909, cr_loss=0.4577, over 34388.00 frames. ], tot_loss[loss=0.289, simple_loss=0.316, pruned_loss=0.1021, ctc_loss=0.1981, cr_loss=0.455, over 6773202.90 frames. ], batch size: 103, lr: 1.71e-02, grad_scale: 16.0 2024-09-17 05:44:11,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=117404.0, ans=0.125 2024-09-17 05:44:22,949 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.232e+02 2.835e+02 3.426e+02 4.523e+02 7.778e+02, threshold=6.853e+02, percent-clipped=6.0 2024-09-17 05:44:28,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=117450.66666666667, ans=0.0 2024-09-17 05:44:43,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=117497.33333333333, ans=0.025 2024-09-17 05:45:07,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.61 vs. limit=15.0 2024-09-17 05:45:36,096 INFO [train.py:1198] (1/2) Epoch 7, batch 1950, loss[loss=0.2804, simple_loss=0.3132, pruned_loss=0.09626, ctc_loss=0.1869, cr_loss=0.4422, over 34383.00 frames. ], tot_loss[loss=0.2905, simple_loss=0.3174, pruned_loss=0.1028, ctc_loss=0.1991, cr_loss=0.4567, over 6790318.05 frames. ], batch size: 91, lr: 1.71e-02, grad_scale: 16.0 2024-09-17 05:45:38,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.01 vs. 
limit=15.0 2024-09-17 05:45:44,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=117637.33333333333, ans=0.5 2024-09-17 05:45:54,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=117684.0, ans=0.1 2024-09-17 05:46:43,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=117824.0, ans=0.125 2024-09-17 05:46:58,257 INFO [train.py:1198] (1/2) Epoch 7, batch 2000, loss[loss=0.2613, simple_loss=0.2886, pruned_loss=0.09103, ctc_loss=0.1767, cr_loss=0.4151, over 34172.00 frames. ], tot_loss[loss=0.2915, simple_loss=0.3183, pruned_loss=0.1032, ctc_loss=0.2001, cr_loss=0.4574, over 6764932.05 frames. ], batch size: 78, lr: 1.70e-02, grad_scale: 32.0 2024-09-17 05:47:09,765 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.295e+02 2.685e+02 3.093e+02 4.131e+02 7.354e+02, threshold=6.185e+02, percent-clipped=2.0 2024-09-17 05:47:40,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=117964.0, ans=0.0 2024-09-17 05:47:46,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=117964.0, ans=0.0 2024-09-17 05:47:49,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=118010.66666666667, ans=0.0 2024-09-17 05:48:22,381 INFO [train.py:1198] (1/2) Epoch 7, batch 2050, loss[loss=0.2612, simple_loss=0.2933, pruned_loss=0.08862, ctc_loss=0.1763, cr_loss=0.4114, over 34493.00 frames. ], tot_loss[loss=0.2904, simple_loss=0.317, pruned_loss=0.1029, ctc_loss=0.1994, cr_loss=0.456, over 6756944.14 frames. ], batch size: 82, lr: 1.70e-02, grad_scale: 32.0 2024-09-17 05:48:26,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2024-09-17 05:48:27,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=118104.0, ans=0.0 2024-09-17 05:48:35,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=118104.0, ans=0.125 2024-09-17 05:48:40,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=118150.66666666667, ans=0.1 2024-09-17 05:48:41,029 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:48:49,206 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:49:03,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=118197.33333333333, ans=0.0 2024-09-17 05:49:20,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.39 vs. limit=22.5 2024-09-17 05:49:45,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.18 vs. 
limit=15.0 2024-09-17 05:49:46,537 INFO [train.py:1198] (1/2) Epoch 7, batch 2100, loss[loss=0.2948, simple_loss=0.3232, pruned_loss=0.104, ctc_loss=0.2005, cr_loss=0.4591, over 34530.00 frames. ], tot_loss[loss=0.2886, simple_loss=0.3157, pruned_loss=0.1019, ctc_loss=0.1977, cr_loss=0.4539, over 6769148.81 frames. ], batch size: 94, lr: 1.70e-02, grad_scale: 32.0 2024-09-17 05:49:57,949 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.199e+02 2.738e+02 3.387e+02 4.743e+02 7.569e+02, threshold=6.775e+02, percent-clipped=8.0 2024-09-17 05:50:01,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=118384.0, ans=0.125 2024-09-17 05:50:11,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=118384.0, ans=0.0 2024-09-17 05:50:33,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.11 vs. limit=15.0 2024-09-17 05:50:37,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=118477.33333333333, ans=0.1 2024-09-17 05:50:45,833 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:50:46,407 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.01 vs. limit=15.0 2024-09-17 05:51:03,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.36 vs. limit=6.0 2024-09-17 05:51:09,037 INFO [train.py:1198] (1/2) Epoch 7, batch 2150, loss[loss=0.2774, simple_loss=0.3086, pruned_loss=0.09532, ctc_loss=0.1899, cr_loss=0.4401, over 34364.00 frames. ], tot_loss[loss=0.2875, simple_loss=0.3149, pruned_loss=0.1013, ctc_loss=0.1966, cr_loss=0.453, over 6787823.34 frames. ], batch size: 91, lr: 1.70e-02, grad_scale: 32.0 2024-09-17 05:51:09,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=118570.66666666667, ans=0.125 2024-09-17 05:51:11,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.79 vs. 
limit=15.0 2024-09-17 05:51:11,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=118570.66666666667, ans=15.0 2024-09-17 05:51:31,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=118617.33333333333, ans=0.125 2024-09-17 05:51:58,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=118710.66666666667, ans=0.2 2024-09-17 05:52:07,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=118710.66666666667, ans=0.125 2024-09-17 05:52:10,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=118710.66666666667, ans=0.125 2024-09-17 05:52:13,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=118710.66666666667, ans=0.125 2024-09-17 05:52:13,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=118710.66666666667, ans=0.0 2024-09-17 05:52:32,798 INFO [train.py:1198] (1/2) Epoch 7, batch 2200, loss[loss=0.2992, simple_loss=0.3283, pruned_loss=0.1042, ctc_loss=0.208, cr_loss=0.5024, over 34451.00 frames. ], tot_loss[loss=0.2877, simple_loss=0.3152, pruned_loss=0.1014, ctc_loss=0.1967, cr_loss=0.4531, over 6783892.72 frames. ], batch size: 100, lr: 1.70e-02, grad_scale: 16.0 2024-09-17 05:52:45,982 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.216e+02 2.932e+02 4.001e+02 5.500e+02 1.081e+03, threshold=8.001e+02, percent-clipped=10.0 2024-09-17 05:53:04,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=118897.33333333333, ans=0.0 2024-09-17 05:53:18,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=118897.33333333333, ans=0.125 2024-09-17 05:53:24,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=118944.0, ans=0.1 2024-09-17 05:53:49,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=118990.66666666667, ans=0.125 2024-09-17 05:53:55,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=119037.33333333333, ans=0.5 2024-09-17 05:53:55,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=119037.33333333333, ans=0.2 2024-09-17 05:53:57,273 INFO [train.py:1198] (1/2) Epoch 7, batch 2250, loss[loss=0.2791, simple_loss=0.3139, pruned_loss=0.09513, ctc_loss=0.1866, cr_loss=0.4189, over 34403.00 frames. ], tot_loss[loss=0.2878, simple_loss=0.3152, pruned_loss=0.1014, ctc_loss=0.1966, cr_loss=0.4525, over 6780133.39 frames. ], batch size: 95, lr: 1.70e-02, grad_scale: 16.0 2024-09-17 05:54:00,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. 
limit=15.0 2024-09-17 05:54:17,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=119084.0, ans=0.125 2024-09-17 05:54:19,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=119084.0, ans=0.125 2024-09-17 05:54:26,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.60 vs. limit=15.0 2024-09-17 05:54:32,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=119130.66666666667, ans=0.125 2024-09-17 05:54:42,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.24 vs. limit=6.0 2024-09-17 05:55:11,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=119224.0, ans=0.125 2024-09-17 05:55:11,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=119224.0, ans=0.125 2024-09-17 05:55:16,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=119224.0, ans=0.125 2024-09-17 05:55:21,166 INFO [train.py:1198] (1/2) Epoch 7, batch 2300, loss[loss=0.2597, simple_loss=0.2896, pruned_loss=0.08905, ctc_loss=0.1766, cr_loss=0.4111, over 34240.00 frames. ], tot_loss[loss=0.2865, simple_loss=0.3139, pruned_loss=0.101, ctc_loss=0.1958, cr_loss=0.451, over 6766166.66 frames. ], batch size: 83, lr: 1.69e-02, grad_scale: 16.0 2024-09-17 05:55:34,181 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.938e+02 3.676e+02 5.047e+02 8.946e+02, threshold=7.352e+02, percent-clipped=2.0 2024-09-17 05:56:02,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=119364.0, ans=10.0 2024-09-17 05:56:09,661 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0 2024-09-17 05:56:22,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.47 vs. limit=10.0 2024-09-17 05:56:43,167 INFO [train.py:1198] (1/2) Epoch 7, batch 2350, loss[loss=0.2875, simple_loss=0.3166, pruned_loss=0.1008, ctc_loss=0.1953, cr_loss=0.4428, over 34684.00 frames. ], tot_loss[loss=0.2864, simple_loss=0.3139, pruned_loss=0.1008, ctc_loss=0.1957, cr_loss=0.451, over 6772231.40 frames. 
], batch size: 97, lr: 1.69e-02, grad_scale: 16.0 2024-09-17 05:56:49,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=119504.0, ans=0.0 2024-09-17 05:57:08,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=119550.66666666667, ans=0.125 2024-09-17 05:57:21,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=119597.33333333333, ans=0.0 2024-09-17 05:57:36,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=119644.0, ans=0.025 2024-09-17 05:57:54,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=119690.66666666667, ans=0.125 2024-09-17 05:58:07,047 INFO [train.py:1198] (1/2) Epoch 7, batch 2400, loss[loss=0.2661, simple_loss=0.2967, pruned_loss=0.09092, ctc_loss=0.1806, cr_loss=0.4402, over 34563.00 frames. ], tot_loss[loss=0.2871, simple_loss=0.3147, pruned_loss=0.1011, ctc_loss=0.1962, cr_loss=0.4524, over 6777085.48 frames. ], batch size: 89, lr: 1.69e-02, grad_scale: 32.0 2024-09-17 05:58:20,143 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.720e+02 3.718e+02 4.783e+02 8.658e+02, threshold=7.436e+02, percent-clipped=4.0 2024-09-17 05:58:37,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.98 vs. limit=6.0 2024-09-17 05:58:41,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.24 vs. limit=22.5 2024-09-17 05:58:51,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119830.66666666667, ans=0.1 2024-09-17 05:59:14,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=119924.0, ans=0.2 2024-09-17 05:59:26,483 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 05:59:30,948 INFO [train.py:1198] (1/2) Epoch 7, batch 2450, loss[loss=0.2981, simple_loss=0.3294, pruned_loss=0.1034, ctc_loss=0.2071, cr_loss=0.4659, over 34419.00 frames. ], tot_loss[loss=0.288, simple_loss=0.3156, pruned_loss=0.1015, ctc_loss=0.1969, cr_loss=0.453, over 6751153.25 frames. ], batch size: 95, lr: 1.69e-02, grad_scale: 32.0 2024-09-17 05:59:34,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=119970.66666666667, ans=15.0 2024-09-17 05:59:51,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.94 vs. limit=22.5 2024-09-17 06:00:25,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=120110.66666666667, ans=0.0 2024-09-17 06:00:54,923 INFO [train.py:1198] (1/2) Epoch 7, batch 2500, loss[loss=0.2969, simple_loss=0.3223, pruned_loss=0.1063, ctc_loss=0.2087, cr_loss=0.4301, over 34453.00 frames. ], tot_loss[loss=0.2882, simple_loss=0.3157, pruned_loss=0.1015, ctc_loss=0.1971, cr_loss=0.4538, over 6763586.14 frames. 
], batch size: 100, lr: 1.69e-02, grad_scale: 32.0 2024-09-17 06:01:08,224 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.383e+02 3.055e+02 3.767e+02 4.929e+02 8.187e+02, threshold=7.535e+02, percent-clipped=2.0 2024-09-17 06:01:10,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=120250.66666666667, ans=0.125 2024-09-17 06:01:26,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=120297.33333333333, ans=0.0 2024-09-17 06:01:28,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=120297.33333333333, ans=0.025 2024-09-17 06:01:28,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=120297.33333333333, ans=0.1 2024-09-17 06:01:33,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120297.33333333333, ans=0.1 2024-09-17 06:01:38,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.77 vs. limit=22.5 2024-09-17 06:01:44,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=120344.0, ans=0.0 2024-09-17 06:02:03,007 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:02:05,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5 2024-09-17 06:02:17,808 INFO [train.py:1198] (1/2) Epoch 7, batch 2550, loss[loss=0.2454, simple_loss=0.2715, pruned_loss=0.08517, ctc_loss=0.1662, cr_loss=0.3931, over 34163.00 frames. ], tot_loss[loss=0.288, simple_loss=0.3154, pruned_loss=0.1015, ctc_loss=0.1968, cr_loss=0.4531, over 6767077.33 frames. ], batch size: 78, lr: 1.69e-02, grad_scale: 32.0 2024-09-17 06:02:29,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=120437.33333333333, ans=0.125 2024-09-17 06:02:32,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2024-09-17 06:02:55,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=120530.66666666667, ans=0.1 2024-09-17 06:03:05,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120530.66666666667, ans=0.1 2024-09-17 06:03:07,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=120577.33333333333, ans=0.0 2024-09-17 06:03:18,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=120577.33333333333, ans=0.125 2024-09-17 06:03:41,341 INFO [train.py:1198] (1/2) Epoch 7, batch 2600, loss[loss=0.2844, simple_loss=0.3092, pruned_loss=0.1009, ctc_loss=0.195, cr_loss=0.469, over 34352.00 frames. ], tot_loss[loss=0.2887, simple_loss=0.3161, pruned_loss=0.1019, ctc_loss=0.1974, cr_loss=0.4543, over 6763868.92 frames. 
], batch size: 91, lr: 1.69e-02, grad_scale: 32.0 2024-09-17 06:03:54,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.224e+02 2.799e+02 3.378e+02 4.129e+02 8.903e+02, threshold=6.756e+02, percent-clipped=1.0 2024-09-17 06:03:59,608 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:04:08,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.58 vs. limit=22.5 2024-09-17 06:05:04,867 INFO [train.py:1198] (1/2) Epoch 7, batch 2650, loss[loss=0.2969, simple_loss=0.3242, pruned_loss=0.1042, ctc_loss=0.206, cr_loss=0.4973, over 34241.00 frames. ], tot_loss[loss=0.2889, simple_loss=0.3164, pruned_loss=0.1019, ctc_loss=0.1975, cr_loss=0.4558, over 6771210.20 frames. ], batch size: 117, lr: 1.68e-02, grad_scale: 32.0 2024-09-17 06:05:37,941 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.086e-02 2024-09-17 06:05:42,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=120997.33333333333, ans=0.2 2024-09-17 06:05:44,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.42 vs. limit=22.5 2024-09-17 06:05:52,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.31 vs. limit=22.5 2024-09-17 06:06:08,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=121044.0, ans=0.0 2024-09-17 06:06:17,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=121090.66666666667, ans=0.1 2024-09-17 06:06:22,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=121090.66666666667, ans=0.2 2024-09-17 06:06:23,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=121090.66666666667, ans=0.0 2024-09-17 06:06:28,400 INFO [train.py:1198] (1/2) Epoch 7, batch 2700, loss[loss=0.2938, simple_loss=0.3252, pruned_loss=0.1019, ctc_loss=0.2003, cr_loss=0.4636, over 34620.00 frames. ], tot_loss[loss=0.2888, simple_loss=0.3165, pruned_loss=0.1017, ctc_loss=0.1973, cr_loss=0.455, over 6765887.35 frames. ], batch size: 102, lr: 1.68e-02, grad_scale: 16.0 2024-09-17 06:06:28,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=121137.33333333333, ans=0.0 2024-09-17 06:06:43,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.760e+02 3.231e+02 4.150e+02 7.383e+02, threshold=6.462e+02, percent-clipped=1.0 2024-09-17 06:06:49,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.09 vs. 
limit=15.0 2024-09-17 06:06:58,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=121184.0, ans=0.1 2024-09-17 06:07:03,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=121230.66666666667, ans=0.125 2024-09-17 06:07:03,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=121230.66666666667, ans=0.125 2024-09-17 06:07:25,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=121277.33333333333, ans=0.1 2024-09-17 06:07:31,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=121277.33333333333, ans=0.125 2024-09-17 06:07:48,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=121324.0, ans=0.0 2024-09-17 06:07:51,369 INFO [train.py:1198] (1/2) Epoch 7, batch 2750, loss[loss=0.2855, simple_loss=0.3093, pruned_loss=0.1023, ctc_loss=0.1956, cr_loss=0.4477, over 34622.00 frames. ], tot_loss[loss=0.2873, simple_loss=0.315, pruned_loss=0.1011, ctc_loss=0.1963, cr_loss=0.4529, over 6763447.75 frames. ], batch size: 88, lr: 1.68e-02, grad_scale: 16.0 2024-09-17 06:07:51,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=121370.66666666667, ans=0.125 2024-09-17 06:07:54,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.89 vs. limit=5.0 2024-09-17 06:08:00,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=121370.66666666667, ans=0.07 2024-09-17 06:08:23,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=121417.33333333333, ans=0.125 2024-09-17 06:08:24,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=121464.0, ans=0.125 2024-09-17 06:08:25,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.93 vs. limit=15.0 2024-09-17 06:08:31,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=121464.0, ans=0.1 2024-09-17 06:08:37,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=121464.0, ans=0.125 2024-09-17 06:09:14,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=121604.0, ans=0.0 2024-09-17 06:09:15,741 INFO [train.py:1198] (1/2) Epoch 7, batch 2800, loss[loss=0.335, simple_loss=0.3443, pruned_loss=0.1288, ctc_loss=0.2431, cr_loss=0.4854, over 23705.00 frames. ], tot_loss[loss=0.2875, simple_loss=0.3151, pruned_loss=0.1012, ctc_loss=0.1966, cr_loss=0.4538, over 6741576.82 frames. 
], batch size: 244, lr: 1.68e-02, grad_scale: 32.0 2024-09-17 06:09:20,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=121604.0, ans=0.0 2024-09-17 06:09:25,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=121604.0, ans=0.0 2024-09-17 06:09:30,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.220e+02 2.705e+02 3.267e+02 4.174e+02 7.318e+02, threshold=6.534e+02, percent-clipped=5.0 2024-09-17 06:09:37,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=121650.66666666667, ans=0.0 2024-09-17 06:09:40,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=121650.66666666667, ans=0.025 2024-09-17 06:09:43,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=121650.66666666667, ans=0.2 2024-09-17 06:09:45,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=121650.66666666667, ans=0.125 2024-09-17 06:10:17,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121744.0, ans=0.1 2024-09-17 06:10:38,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=121837.33333333333, ans=0.0 2024-09-17 06:10:39,818 INFO [train.py:1198] (1/2) Epoch 7, batch 2850, loss[loss=0.2792, simple_loss=0.3073, pruned_loss=0.09713, ctc_loss=0.1917, cr_loss=0.4623, over 34487.00 frames. ], tot_loss[loss=0.2888, simple_loss=0.3159, pruned_loss=0.102, ctc_loss=0.1978, cr_loss=0.4554, over 6725087.42 frames. ], batch size: 90, lr: 1.68e-02, grad_scale: 16.0 2024-09-17 06:10:55,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=121884.0, ans=0.1 2024-09-17 06:11:09,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=121884.0, ans=0.125 2024-09-17 06:11:26,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=121930.66666666667, ans=0.125 2024-09-17 06:11:49,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0 2024-09-17 06:12:03,899 INFO [train.py:1198] (1/2) Epoch 7, batch 2900, loss[loss=0.2843, simple_loss=0.3149, pruned_loss=0.09874, ctc_loss=0.1894, cr_loss=0.4595, over 34538.00 frames. ], tot_loss[loss=0.2897, simple_loss=0.317, pruned_loss=0.1022, ctc_loss=0.1982, cr_loss=0.4572, over 6755425.12 frames. 
], batch size: 94, lr: 1.68e-02, grad_scale: 16.0 2024-09-17 06:12:17,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=122070.66666666667, ans=0.125 2024-09-17 06:12:20,377 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 2.787e+02 3.457e+02 4.398e+02 7.328e+02, threshold=6.914e+02, percent-clipped=4.0 2024-09-17 06:12:20,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=122117.33333333333, ans=0.2 2024-09-17 06:12:27,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=122117.33333333333, ans=0.0 2024-09-17 06:12:27,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=122117.33333333333, ans=0.125 2024-09-17 06:12:35,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=122164.0, ans=0.5 2024-09-17 06:12:43,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=122164.0, ans=0.125 2024-09-17 06:13:03,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=122210.66666666667, ans=0.125 2024-09-17 06:13:28,222 INFO [train.py:1198] (1/2) Epoch 7, batch 2950, loss[loss=0.2721, simple_loss=0.2963, pruned_loss=0.09641, ctc_loss=0.1864, cr_loss=0.4449, over 34640.00 frames. ], tot_loss[loss=0.2879, simple_loss=0.3154, pruned_loss=0.1014, ctc_loss=0.1968, cr_loss=0.4542, over 6750177.39 frames. ], batch size: 88, lr: 1.68e-02, grad_scale: 16.0 2024-09-17 06:13:28,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=122304.0, ans=0.0 2024-09-17 06:13:33,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122304.0, ans=0.1 2024-09-17 06:14:03,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=122397.33333333333, ans=0.125 2024-09-17 06:14:03,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=122397.33333333333, ans=0.125 2024-09-17 06:14:12,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=122397.33333333333, ans=10.0 2024-09-17 06:14:15,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.18 vs. limit=15.0 2024-09-17 06:14:22,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=122444.0, ans=0.0 2024-09-17 06:14:30,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=122444.0, ans=0.125 2024-09-17 06:14:30,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=122444.0, ans=10.0 2024-09-17 06:14:50,632 INFO [train.py:1198] (1/2) Epoch 7, batch 3000, loss[loss=0.2906, simple_loss=0.3162, pruned_loss=0.1033, ctc_loss=0.197, cr_loss=0.4714, over 34536.00 frames. 
], tot_loss[loss=0.2873, simple_loss=0.3148, pruned_loss=0.1012, ctc_loss=0.1965, cr_loss=0.4538, over 6750608.05 frames. ], batch size: 94, lr: 1.67e-02, grad_scale: 16.0 2024-09-17 06:14:50,633 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 06:15:07,482 INFO [train.py:1230] (1/2) Epoch 7, validation: loss=0.1642, simple_loss=0.2628, pruned_loss=0.02715, ctc_loss=0.05723, cr_loss=1.5e-14, over 944034.00 frames. 2024-09-17 06:15:07,483 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 06:15:07,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=122537.33333333333, ans=0.05 2024-09-17 06:15:18,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2024-09-17 06:15:24,122 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.417e+02 2.802e+02 3.349e+02 4.460e+02 7.502e+02, threshold=6.698e+02, percent-clipped=2.0 2024-09-17 06:15:26,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=12.0 2024-09-17 06:15:31,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=122584.0, ans=0.125 2024-09-17 06:15:40,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=122630.66666666667, ans=0.125 2024-09-17 06:16:06,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=122677.33333333333, ans=0.025 2024-09-17 06:16:31,074 INFO [train.py:1198] (1/2) Epoch 7, batch 3050, loss[loss=0.2741, simple_loss=0.3019, pruned_loss=0.09523, ctc_loss=0.1898, cr_loss=0.4456, over 34599.00 frames. ], tot_loss[loss=0.2878, simple_loss=0.3152, pruned_loss=0.1014, ctc_loss=0.1968, cr_loss=0.4546, over 6742412.79 frames. ], batch size: 89, lr: 1.67e-02, grad_scale: 16.0 2024-09-17 06:16:45,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=122817.33333333333, ans=0.0 2024-09-17 06:16:50,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=122817.33333333333, ans=0.125 2024-09-17 06:16:56,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=122817.33333333333, ans=0.1 2024-09-17 06:17:11,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=122864.0, ans=0.125 2024-09-17 06:17:17,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=122910.66666666667, ans=0.125 2024-09-17 06:17:32,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=122910.66666666667, ans=0.0 2024-09-17 06:17:47,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.72 vs. 
limit=22.5 2024-09-17 06:17:48,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=122957.33333333333, ans=0.125 2024-09-17 06:17:53,205 INFO [train.py:1198] (1/2) Epoch 7, batch 3100, loss[loss=0.3224, simple_loss=0.346, pruned_loss=0.1164, ctc_loss=0.2268, cr_loss=0.5174, over 34242.00 frames. ], tot_loss[loss=0.2873, simple_loss=0.3147, pruned_loss=0.1012, ctc_loss=0.1963, cr_loss=0.4544, over 6742960.90 frames. ], batch size: 117, lr: 1.67e-02, grad_scale: 16.0 2024-09-17 06:17:55,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=123004.0, ans=0.0 2024-09-17 06:18:09,377 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.194e+02 2.762e+02 3.336e+02 4.315e+02 8.681e+02, threshold=6.673e+02, percent-clipped=4.0 2024-09-17 06:18:14,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=123050.66666666667, ans=0.125 2024-09-17 06:18:22,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=123050.66666666667, ans=0.0 2024-09-17 06:18:34,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2024-09-17 06:18:35,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123097.33333333333, ans=0.1 2024-09-17 06:18:37,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.61 vs. limit=12.0 2024-09-17 06:18:57,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=123190.66666666667, ans=0.025 2024-09-17 06:19:14,199 INFO [train.py:1198] (1/2) Epoch 7, batch 3150, loss[loss=0.3003, simple_loss=0.3283, pruned_loss=0.1057, ctc_loss=0.2069, cr_loss=0.4889, over 33825.00 frames. ], tot_loss[loss=0.2869, simple_loss=0.3146, pruned_loss=0.101, ctc_loss=0.1959, cr_loss=0.4536, over 6748323.53 frames. ], batch size: 122, lr: 1.67e-02, grad_scale: 16.0 2024-09-17 06:19:19,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=123237.33333333333, ans=0.0 2024-09-17 06:19:21,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=123237.33333333333, ans=0.125 2024-09-17 06:19:45,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2024-09-17 06:19:58,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0 2024-09-17 06:20:14,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=123377.33333333333, ans=0.125 2024-09-17 06:20:15,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=123377.33333333333, ans=0.0 2024-09-17 06:20:34,746 INFO [train.py:1198] (1/2) Epoch 7, batch 3200, loss[loss=0.2804, simple_loss=0.3089, pruned_loss=0.09741, ctc_loss=0.1932, cr_loss=0.4615, over 34505.00 frames. 
], tot_loss[loss=0.2863, simple_loss=0.314, pruned_loss=0.1007, ctc_loss=0.1953, cr_loss=0.4526, over 6761525.74 frames. ], batch size: 94, lr: 1.67e-02, grad_scale: 32.0 2024-09-17 06:20:41,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=123470.66666666667, ans=0.125 2024-09-17 06:20:50,774 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.665e+02 3.426e+02 4.426e+02 7.800e+02, threshold=6.852e+02, percent-clipped=3.0 2024-09-17 06:20:51,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=123517.33333333333, ans=0.2 2024-09-17 06:20:55,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=123517.33333333333, ans=0.2 2024-09-17 06:20:57,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=123517.33333333333, ans=0.125 2024-09-17 06:21:04,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=123517.33333333333, ans=0.2 2024-09-17 06:21:09,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=123564.0, ans=0.125 2024-09-17 06:21:22,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=123610.66666666667, ans=0.1 2024-09-17 06:21:25,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=123610.66666666667, ans=0.125 2024-09-17 06:21:48,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2024-09-17 06:21:49,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2024-09-17 06:21:55,532 INFO [train.py:1198] (1/2) Epoch 7, batch 3250, loss[loss=0.2758, simple_loss=0.3095, pruned_loss=0.09338, ctc_loss=0.1825, cr_loss=0.4727, over 34660.00 frames. ], tot_loss[loss=0.2868, simple_loss=0.3147, pruned_loss=0.1008, ctc_loss=0.1956, cr_loss=0.4533, over 6771549.33 frames. ], batch size: 98, lr: 1.67e-02, grad_scale: 32.0 2024-09-17 06:22:23,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=123750.66666666667, ans=0.125 2024-09-17 06:22:31,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=123797.33333333333, ans=0.125 2024-09-17 06:22:50,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=123844.0, ans=0.125 2024-09-17 06:23:17,611 INFO [train.py:1198] (1/2) Epoch 7, batch 3300, loss[loss=0.3004, simple_loss=0.3274, pruned_loss=0.1067, ctc_loss=0.2115, cr_loss=0.4435, over 33126.00 frames. ], tot_loss[loss=0.2851, simple_loss=0.3132, pruned_loss=0.1, ctc_loss=0.1944, cr_loss=0.4515, over 6769192.34 frames. 
], batch size: 130, lr: 1.67e-02, grad_scale: 32.0 2024-09-17 06:23:24,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123937.33333333333, ans=0.1 2024-09-17 06:23:28,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.70 vs. limit=15.0 2024-09-17 06:23:34,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=123984.0, ans=0.0 2024-09-17 06:23:35,299 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.306e+02 2.892e+02 3.540e+02 4.679e+02 9.057e+02, threshold=7.081e+02, percent-clipped=5.0 2024-09-17 06:23:35,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=123984.0, ans=0.0 2024-09-17 06:23:58,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=124030.66666666667, ans=0.0 2024-09-17 06:24:07,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=15.0 2024-09-17 06:24:14,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=124077.33333333333, ans=0.0 2024-09-17 06:24:14,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=124077.33333333333, ans=0.1 2024-09-17 06:24:20,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=124077.33333333333, ans=0.0 2024-09-17 06:24:30,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=124124.0, ans=0.125 2024-09-17 06:24:30,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=124124.0, ans=0.125 2024-09-17 06:24:30,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_na.min_abs, batch_count=124124.0, ans=0.02 2024-09-17 06:24:38,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=124170.66666666667, ans=0.125 2024-09-17 06:24:38,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2024-09-17 06:24:39,843 INFO [train.py:1198] (1/2) Epoch 7, batch 3350, loss[loss=0.2964, simple_loss=0.3279, pruned_loss=0.1034, ctc_loss=0.2017, cr_loss=0.4429, over 33882.00 frames. ], tot_loss[loss=0.2868, simple_loss=0.3145, pruned_loss=0.1009, ctc_loss=0.1959, cr_loss=0.4531, over 6743488.24 frames. ], batch size: 122, lr: 1.66e-02, grad_scale: 32.0 2024-09-17 06:24:44,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=124170.66666666667, ans=0.125 2024-09-17 06:24:53,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.09 vs. 
limit=22.5 2024-09-17 06:24:56,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=124217.33333333333, ans=0.2 2024-09-17 06:25:31,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=124310.66666666667, ans=0.125 2024-09-17 06:25:36,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=124310.66666666667, ans=0.025 2024-09-17 06:25:38,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=124310.66666666667, ans=0.125 2024-09-17 06:26:00,405 INFO [train.py:1198] (1/2) Epoch 7, batch 3400, loss[loss=0.2494, simple_loss=0.2802, pruned_loss=0.08469, ctc_loss=0.1644, cr_loss=0.4087, over 34164.00 frames. ], tot_loss[loss=0.287, simple_loss=0.3146, pruned_loss=0.101, ctc_loss=0.1961, cr_loss=0.4531, over 6734714.14 frames. ], batch size: 78, lr: 1.66e-02, grad_scale: 32.0 2024-09-17 06:26:07,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5 2024-09-17 06:26:12,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.94 vs. limit=15.0 2024-09-17 06:26:16,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.182e+02 2.895e+02 3.410e+02 4.317e+02 7.899e+02, threshold=6.820e+02, percent-clipped=2.0 2024-09-17 06:26:25,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.70 vs. limit=22.5 2024-09-17 06:26:28,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-17 06:26:36,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=124497.33333333333, ans=0.125 2024-09-17 06:26:56,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=124544.0, ans=0.07 2024-09-17 06:27:08,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=124590.66666666667, ans=0.5 2024-09-17 06:27:12,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=124590.66666666667, ans=0.1 2024-09-17 06:27:14,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=124590.66666666667, ans=0.1 2024-09-17 06:27:20,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=124637.33333333333, ans=0.125 2024-09-17 06:27:22,030 INFO [train.py:1198] (1/2) Epoch 7, batch 3450, loss[loss=0.3138, simple_loss=0.3399, pruned_loss=0.1135, ctc_loss=0.213, cr_loss=0.4547, over 32925.00 frames. ], tot_loss[loss=0.2874, simple_loss=0.315, pruned_loss=0.1012, ctc_loss=0.1964, cr_loss=0.4532, over 6746860.88 frames. 
], batch size: 130, lr: 1.66e-02, grad_scale: 32.0 2024-09-17 06:27:22,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=124637.33333333333, ans=0.125 2024-09-17 06:27:36,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=124684.0, ans=0.125 2024-09-17 06:28:00,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=124730.66666666667, ans=0.125 2024-09-17 06:28:06,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=12.0 2024-09-17 06:28:07,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=124730.66666666667, ans=0.125 2024-09-17 06:28:17,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=15.0 2024-09-17 06:28:31,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=124824.0, ans=0.125 2024-09-17 06:28:33,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=124824.0, ans=0.125 2024-09-17 06:28:34,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=124824.0, ans=0.125 2024-09-17 06:28:42,420 INFO [train.py:1198] (1/2) Epoch 7, batch 3500, loss[loss=0.2517, simple_loss=0.2836, pruned_loss=0.08469, ctc_loss=0.1688, cr_loss=0.4168, over 34452.00 frames. ], tot_loss[loss=0.2863, simple_loss=0.3141, pruned_loss=0.1007, ctc_loss=0.1955, cr_loss=0.4524, over 6748543.10 frames. ], batch size: 85, lr: 1.66e-02, grad_scale: 32.0 2024-09-17 06:28:58,489 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.176e+02 2.740e+02 3.381e+02 4.307e+02 7.715e+02, threshold=6.763e+02, percent-clipped=2.0 2024-09-17 06:29:01,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=124917.33333333333, ans=0.0 2024-09-17 06:29:05,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=124917.33333333333, ans=0.125 2024-09-17 06:29:08,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=124917.33333333333, ans=0.0 2024-09-17 06:29:34,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=125010.66666666667, ans=0.0 2024-09-17 06:29:35,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=125010.66666666667, ans=0.0 2024-09-17 06:29:55,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=125057.33333333333, ans=0.0 2024-09-17 06:30:02,845 INFO [train.py:1198] (1/2) Epoch 7, batch 3550, loss[loss=0.288, simple_loss=0.3213, pruned_loss=0.09838, ctc_loss=0.1969, cr_loss=0.462, over 34359.00 frames. ], tot_loss[loss=0.287, simple_loss=0.3147, pruned_loss=0.101, ctc_loss=0.1962, cr_loss=0.4539, over 6758358.65 frames. 
], batch size: 103, lr: 1.66e-02, grad_scale: 32.0 2024-09-17 06:30:26,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.66 vs. limit=15.0 2024-09-17 06:30:56,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-09-17 06:31:01,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.86 vs. limit=15.0 2024-09-17 06:31:05,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=125290.66666666667, ans=0.125 2024-09-17 06:31:24,093 INFO [train.py:1198] (1/2) Epoch 7, batch 3600, loss[loss=0.2835, simple_loss=0.3107, pruned_loss=0.1001, ctc_loss=0.1925, cr_loss=0.4377, over 34483.00 frames. ], tot_loss[loss=0.2875, simple_loss=0.3151, pruned_loss=0.1012, ctc_loss=0.1965, cr_loss=0.455, over 6768109.05 frames. ], batch size: 90, lr: 1.66e-02, grad_scale: 32.0 2024-09-17 06:31:28,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.74 vs. limit=15.0 2024-09-17 06:31:32,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.23 vs. limit=15.0 2024-09-17 06:31:40,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.140e+02 2.669e+02 3.098e+02 4.239e+02 8.486e+02, threshold=6.196e+02, percent-clipped=2.0 2024-09-17 06:31:40,576 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:31:58,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=125430.66666666667, ans=0.0 2024-09-17 06:32:03,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125430.66666666667, ans=0.1 2024-09-17 06:32:19,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=125477.33333333333, ans=0.0 2024-09-17 06:32:24,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=125477.33333333333, ans=0.125 2024-09-17 06:32:44,897 INFO [train.py:1198] (1/2) Epoch 7, batch 3650, loss[loss=0.3048, simple_loss=0.3324, pruned_loss=0.1084, ctc_loss=0.2104, cr_loss=0.4583, over 34453.00 frames. ], tot_loss[loss=0.2861, simple_loss=0.314, pruned_loss=0.1005, ctc_loss=0.1954, cr_loss=0.4529, over 6771090.02 frames. ], batch size: 110, lr: 1.66e-02, grad_scale: 32.0 2024-09-17 06:34:00,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=125757.33333333333, ans=0.2 2024-09-17 06:34:05,005 INFO [train.py:1198] (1/2) Epoch 7, batch 3700, loss[loss=0.2945, simple_loss=0.3215, pruned_loss=0.1046, ctc_loss=0.2014, cr_loss=0.453, over 34613.00 frames. ], tot_loss[loss=0.2856, simple_loss=0.3137, pruned_loss=0.1002, ctc_loss=0.1949, cr_loss=0.4528, over 6785493.03 frames. 
], batch size: 102, lr: 1.65e-02, grad_scale: 32.0 2024-09-17 06:34:21,197 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.720e+02 3.270e+02 4.005e+02 6.978e+02, threshold=6.541e+02, percent-clipped=3.0 2024-09-17 06:34:31,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=125850.66666666667, ans=0.125 2024-09-17 06:34:38,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.72 vs. limit=22.5 2024-09-17 06:34:48,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=125897.33333333333, ans=0.125 2024-09-17 06:34:52,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=125944.0, ans=0.125 2024-09-17 06:34:56,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=125944.0, ans=0.125 2024-09-17 06:35:14,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=125990.66666666667, ans=0.025 2024-09-17 06:35:27,278 INFO [train.py:1198] (1/2) Epoch 7, batch 3750, loss[loss=0.2875, simple_loss=0.3218, pruned_loss=0.09935, ctc_loss=0.1877, cr_loss=0.4227, over 34355.00 frames. ], tot_loss[loss=0.2891, simple_loss=0.317, pruned_loss=0.1017, ctc_loss=0.1975, cr_loss=0.4567, over 6786008.83 frames. ], batch size: 113, lr: 1.65e-02, grad_scale: 32.0 2024-09-17 06:35:46,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.54 vs. limit=15.0 2024-09-17 06:35:50,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=126084.0, ans=0.125 2024-09-17 06:36:03,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.66 vs. limit=15.0 2024-09-17 06:36:14,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=126177.33333333333, ans=0.1 2024-09-17 06:36:14,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=126177.33333333333, ans=0.0 2024-09-17 06:36:43,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=8.19 vs. limit=12.0 2024-09-17 06:36:47,242 INFO [train.py:1198] (1/2) Epoch 7, batch 3800, loss[loss=0.3235, simple_loss=0.3378, pruned_loss=0.1213, ctc_loss=0.232, cr_loss=0.503, over 30440.00 frames. ], tot_loss[loss=0.2933, simple_loss=0.3202, pruned_loss=0.1039, ctc_loss=0.2015, cr_loss=0.4607, over 6673415.23 frames. 
], batch size: 176, lr: 1.65e-02, grad_scale: 32.0 2024-09-17 06:37:00,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=126270.66666666667, ans=0.125 2024-09-17 06:37:04,298 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.285e+02 2.587e+02 2.962e+02 3.552e+02 5.693e+02, threshold=5.924e+02, percent-clipped=0.0 2024-09-17 06:37:11,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=126317.33333333333, ans=0.0 2024-09-17 06:37:16,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.43 vs. limit=5.0 2024-09-17 06:37:18,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=126317.33333333333, ans=0.125 2024-09-17 06:37:30,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=11.38 vs. limit=12.0 2024-09-17 06:37:38,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.42 vs. limit=22.5 2024-09-17 06:37:46,425 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:38:01,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=126457.33333333333, ans=0.0 2024-09-17 06:38:09,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=126504.0, ans=10.0 2024-09-17 06:38:10,616 INFO [train.py:1198] (1/2) Epoch 7, batch 3850, loss[loss=0.3382, simple_loss=0.3418, pruned_loss=0.1323, ctc_loss=0.251, cr_loss=0.4953, over 23838.00 frames. ], tot_loss[loss=0.3013, simple_loss=0.3248, pruned_loss=0.1086, ctc_loss=0.2107, cr_loss=0.4633, over 6252470.28 frames. ], batch size: 244, lr: 1.65e-02, grad_scale: 32.0 2024-09-17 06:38:21,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=126504.0, ans=0.0 2024-09-17 06:38:24,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=126504.0, ans=0.07 2024-09-17 06:38:31,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126550.66666666667, ans=0.1 2024-09-17 06:38:49,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=126597.33333333333, ans=0.04949747468305833 2024-09-17 06:39:40,624 INFO [train.py:1198] (1/2) Epoch 8, batch 0, loss[loss=0.269, simple_loss=0.2988, pruned_loss=0.09291, ctc_loss=0.1815, cr_loss=0.428, over 34501.00 frames. ], tot_loss[loss=0.269, simple_loss=0.2988, pruned_loss=0.09291, ctc_loss=0.1815, cr_loss=0.428, over 34501.00 frames. ], batch size: 85, lr: 1.55e-02, grad_scale: 32.0 2024-09-17 06:39:40,624 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 06:39:59,644 INFO [train.py:1230] (1/2) Epoch 8, validation: loss=0.168, simple_loss=0.2667, pruned_loss=0.02872, ctc_loss=0.05881, cr_loss=1.518e-14, over 944034.00 frames. 
2024-09-17 06:39:59,644 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 06:40:16,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=126672.0, ans=0.125 2024-09-17 06:40:28,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=126672.0, ans=0.0 2024-09-17 06:40:47,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=126765.33333333333, ans=0.0 2024-09-17 06:40:55,691 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.319e+02 2.836e+02 3.283e+02 3.751e+02 7.205e+02, threshold=6.565e+02, percent-clipped=2.0 2024-09-17 06:41:21,861 INFO [train.py:1198] (1/2) Epoch 8, batch 50, loss[loss=0.2523, simple_loss=0.2828, pruned_loss=0.08565, ctc_loss=0.1706, cr_loss=0.4078, over 34505.00 frames. ], tot_loss[loss=0.2898, simple_loss=0.3172, pruned_loss=0.1022, ctc_loss=0.199, cr_loss=0.4583, over 1481791.69 frames. ], batch size: 82, lr: 1.55e-02, grad_scale: 32.0 2024-09-17 06:41:23,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=126858.66666666667, ans=0.125 2024-09-17 06:41:34,003 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2024-09-17 06:42:01,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.91 vs. limit=15.0 2024-09-17 06:42:19,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.66 vs. limit=15.0 2024-09-17 06:42:22,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0 2024-09-17 06:42:32,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=127045.33333333333, ans=0.125 2024-09-17 06:42:44,907 INFO [train.py:1198] (1/2) Epoch 8, batch 100, loss[loss=0.2662, simple_loss=0.2978, pruned_loss=0.09064, ctc_loss=0.179, cr_loss=0.4372, over 34578.00 frames. ], tot_loss[loss=0.2913, simple_loss=0.3187, pruned_loss=0.1028, ctc_loss=0.1994, cr_loss=0.4599, over 2628792.37 frames. ], batch size: 89, lr: 1.55e-02, grad_scale: 32.0 2024-09-17 06:43:33,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. 
limit=6.0 2024-09-17 06:43:42,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=127232.0, ans=15.0 2024-09-17 06:43:44,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.833e+02 3.341e+02 4.593e+02 8.975e+02, threshold=6.682e+02, percent-clipped=5.0 2024-09-17 06:43:46,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=127232.0, ans=0.1 2024-09-17 06:43:48,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=127232.0, ans=0.04949747468305833 2024-09-17 06:44:10,800 INFO [train.py:1198] (1/2) Epoch 8, batch 150, loss[loss=0.2579, simple_loss=0.2895, pruned_loss=0.08745, ctc_loss=0.1753, cr_loss=0.4087, over 34474.00 frames. ], tot_loss[loss=0.2871, simple_loss=0.3155, pruned_loss=0.1007, ctc_loss=0.1957, cr_loss=0.4548, over 3555243.50 frames. ], batch size: 82, lr: 1.55e-02, grad_scale: 32.0 2024-09-17 06:44:44,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.14 vs. limit=10.0 2024-09-17 06:45:08,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=127465.33333333333, ans=0.125 2024-09-17 06:45:32,957 INFO [train.py:1198] (1/2) Epoch 8, batch 200, loss[loss=0.3038, simple_loss=0.3263, pruned_loss=0.1102, ctc_loss=0.2099, cr_loss=0.4739, over 31907.00 frames. ], tot_loss[loss=0.2841, simple_loss=0.3132, pruned_loss=0.09914, ctc_loss=0.1933, cr_loss=0.4525, over 4268971.34 frames. ], batch size: 145, lr: 1.55e-02, grad_scale: 32.0 2024-09-17 06:45:43,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=127558.66666666667, ans=0.125 2024-09-17 06:45:51,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=127605.33333333333, ans=0.1 2024-09-17 06:46:25,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.39 vs. limit=15.0 2024-09-17 06:46:29,180 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.593e+02 3.007e+02 3.899e+02 8.545e+02, threshold=6.014e+02, percent-clipped=2.0 2024-09-17 06:46:43,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=127745.33333333333, ans=0.125 2024-09-17 06:46:57,860 INFO [train.py:1198] (1/2) Epoch 8, batch 250, loss[loss=0.2916, simple_loss=0.3224, pruned_loss=0.1013, ctc_loss=0.1959, cr_loss=0.4763, over 34287.00 frames. ], tot_loss[loss=0.2834, simple_loss=0.3126, pruned_loss=0.09885, ctc_loss=0.1926, cr_loss=0.4514, over 4831234.55 frames. ], batch size: 117, lr: 1.55e-02, grad_scale: 32.0 2024-09-17 06:47:06,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=127792.0, ans=0.0 2024-09-17 06:47:18,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.96 vs. 
limit=22.5 2024-09-17 06:47:21,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=127838.66666666667, ans=0.07 2024-09-17 06:48:19,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=127978.66666666667, ans=0.125 2024-09-17 06:48:22,648 INFO [train.py:1198] (1/2) Epoch 8, batch 300, loss[loss=0.3087, simple_loss=0.3327, pruned_loss=0.1117, ctc_loss=0.213, cr_loss=0.4649, over 34333.00 frames. ], tot_loss[loss=0.2832, simple_loss=0.3121, pruned_loss=0.0989, ctc_loss=0.1925, cr_loss=0.4508, over 5260079.30 frames. ], batch size: 107, lr: 1.55e-02, grad_scale: 16.0 2024-09-17 06:48:26,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.74 vs. limit=15.0 2024-09-17 06:48:28,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=128025.33333333333, ans=0.125 2024-09-17 06:48:34,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=128025.33333333333, ans=0.0 2024-09-17 06:48:51,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=128072.0, ans=0.0 2024-09-17 06:49:06,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2024-09-17 06:49:07,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=128118.66666666667, ans=0.125 2024-09-17 06:49:08,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.06 vs. limit=15.0 2024-09-17 06:49:20,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.102e+02 2.742e+02 3.212e+02 4.166e+02 7.074e+02, threshold=6.424e+02, percent-clipped=7.0 2024-09-17 06:49:22,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=128165.33333333333, ans=0.2 2024-09-17 06:49:41,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=128212.0, ans=0.0 2024-09-17 06:49:42,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.08 vs. limit=22.5 2024-09-17 06:49:44,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.86 vs. limit=22.5 2024-09-17 06:49:44,790 INFO [train.py:1198] (1/2) Epoch 8, batch 350, loss[loss=0.2618, simple_loss=0.29, pruned_loss=0.08977, ctc_loss=0.1814, cr_loss=0.4436, over 34666.00 frames. ], tot_loss[loss=0.2833, simple_loss=0.3124, pruned_loss=0.09885, ctc_loss=0.1924, cr_loss=0.4512, over 5597091.45 frames. 
], batch size: 84, lr: 1.54e-02, grad_scale: 16.0 2024-09-17 06:49:53,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=128258.66666666667, ans=0.0 2024-09-17 06:50:08,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=128305.33333333333, ans=0.125 2024-09-17 06:50:17,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=128352.0, ans=0.125 2024-09-17 06:50:50,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.70 vs. limit=22.5 2024-09-17 06:50:55,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=15.0 2024-09-17 06:51:08,533 INFO [train.py:1198] (1/2) Epoch 8, batch 400, loss[loss=0.267, simple_loss=0.308, pruned_loss=0.08752, ctc_loss=0.1731, cr_loss=0.4092, over 34421.00 frames. ], tot_loss[loss=0.2827, simple_loss=0.3119, pruned_loss=0.09854, ctc_loss=0.192, cr_loss=0.4507, over 5863806.61 frames. ], batch size: 95, lr: 1.54e-02, grad_scale: 32.0 2024-09-17 06:51:28,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.82 vs. limit=15.0 2024-09-17 06:51:47,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=128585.33333333333, ans=0.0 2024-09-17 06:51:51,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=15.0 2024-09-17 06:52:02,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=128632.0, ans=0.1 2024-09-17 06:52:08,486 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.716e+02 3.436e+02 4.335e+02 7.552e+02, threshold=6.871e+02, percent-clipped=3.0 2024-09-17 06:52:17,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=128678.66666666667, ans=0.0 2024-09-17 06:52:33,488 INFO [train.py:1198] (1/2) Epoch 8, batch 450, loss[loss=0.2853, simple_loss=0.3228, pruned_loss=0.09607, ctc_loss=0.1872, cr_loss=0.4567, over 34716.00 frames. ], tot_loss[loss=0.2824, simple_loss=0.3118, pruned_loss=0.0984, ctc_loss=0.1916, cr_loss=0.4501, over 6054151.29 frames. ], batch size: 97, lr: 1.54e-02, grad_scale: 32.0 2024-09-17 06:53:08,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=128818.66666666667, ans=0.1 2024-09-17 06:53:20,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.61 vs. 
limit=15.0 2024-09-17 06:53:26,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=128865.33333333333, ans=0.0 2024-09-17 06:53:54,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=128958.66666666667, ans=0.2 2024-09-17 06:53:56,252 INFO [train.py:1198] (1/2) Epoch 8, batch 500, loss[loss=0.2987, simple_loss=0.3252, pruned_loss=0.106, ctc_loss=0.2045, cr_loss=0.4834, over 34448.00 frames. ], tot_loss[loss=0.2808, simple_loss=0.3102, pruned_loss=0.09768, ctc_loss=0.1902, cr_loss=0.4484, over 6219980.96 frames. ], batch size: 110, lr: 1.54e-02, grad_scale: 32.0 2024-09-17 06:54:01,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=128958.66666666667, ans=0.1 2024-09-17 06:54:27,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=129005.33333333333, ans=0.125 2024-09-17 06:54:57,009 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.613e+02 3.294e+02 4.509e+02 9.073e+02, threshold=6.587e+02, percent-clipped=7.0 2024-09-17 06:54:58,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=129098.66666666667, ans=0.0 2024-09-17 06:55:03,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129145.33333333333, ans=0.1 2024-09-17 06:55:21,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=129192.0, ans=0.125 2024-09-17 06:55:22,638 INFO [train.py:1198] (1/2) Epoch 8, batch 550, loss[loss=0.2822, simple_loss=0.3173, pruned_loss=0.09573, ctc_loss=0.1898, cr_loss=0.4426, over 33922.00 frames. ], tot_loss[loss=0.2806, simple_loss=0.3101, pruned_loss=0.09754, ctc_loss=0.1902, cr_loss=0.4489, over 6330147.18 frames. ], batch size: 122, lr: 1.54e-02, grad_scale: 16.0 2024-09-17 06:55:39,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=129238.66666666667, ans=0.125 2024-09-17 06:55:41,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=129238.66666666667, ans=0.1 2024-09-17 06:55:42,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=129238.66666666667, ans=0.0 2024-09-17 06:55:56,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=129285.33333333333, ans=0.125 2024-09-17 06:56:25,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=129332.0, ans=0.125 2024-09-17 06:56:27,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=129378.66666666667, ans=0.0 2024-09-17 06:56:28,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=129378.66666666667, ans=0.0 2024-09-17 06:56:29,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. 
limit=6.0 2024-09-17 06:56:38,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=129378.66666666667, ans=0.2 2024-09-17 06:56:38,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=129378.66666666667, ans=0.07 2024-09-17 06:56:45,036 INFO [train.py:1198] (1/2) Epoch 8, batch 600, loss[loss=0.3005, simple_loss=0.3337, pruned_loss=0.1041, ctc_loss=0.2075, cr_loss=0.4413, over 34196.00 frames. ], tot_loss[loss=0.2807, simple_loss=0.3103, pruned_loss=0.09754, ctc_loss=0.1901, cr_loss=0.4488, over 6432792.71 frames. ], batch size: 117, lr: 1.54e-02, grad_scale: 16.0 2024-09-17 06:56:45,458 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:57:07,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.79 vs. limit=12.0 2024-09-17 06:57:29,810 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 06:57:32,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=129565.33333333333, ans=0.025 2024-09-17 06:57:39,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=129565.33333333333, ans=0.2 2024-09-17 06:57:43,945 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.757e+02 3.404e+02 4.208e+02 9.208e+02, threshold=6.808e+02, percent-clipped=5.0 2024-09-17 06:58:08,824 INFO [train.py:1198] (1/2) Epoch 8, batch 650, loss[loss=0.2705, simple_loss=0.303, pruned_loss=0.09223, ctc_loss=0.1799, cr_loss=0.4395, over 34543.00 frames. ], tot_loss[loss=0.2798, simple_loss=0.3096, pruned_loss=0.09707, ctc_loss=0.1892, cr_loss=0.4479, over 6523396.40 frames. ], batch size: 94, lr: 1.54e-02, grad_scale: 16.0 2024-09-17 06:58:09,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=129658.66666666667, ans=0.0 2024-09-17 06:58:16,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.24 vs. limit=12.0 2024-09-17 06:58:32,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=129705.33333333333, ans=0.0 2024-09-17 06:58:33,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=129705.33333333333, ans=0.025 2024-09-17 06:58:52,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=129752.0, ans=0.0 2024-09-17 06:58:57,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=129798.66666666667, ans=0.125 2024-09-17 06:59:33,612 INFO [train.py:1198] (1/2) Epoch 8, batch 700, loss[loss=0.2595, simple_loss=0.291, pruned_loss=0.08777, ctc_loss=0.1739, cr_loss=0.4434, over 34603.00 frames. ], tot_loss[loss=0.28, simple_loss=0.31, pruned_loss=0.09704, ctc_loss=0.1892, cr_loss=0.4486, over 6579671.32 frames. 
], batch size: 89, lr: 1.53e-02, grad_scale: 16.0 2024-09-17 06:59:36,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.81 vs. limit=12.0 2024-09-17 06:59:43,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=129892.0, ans=0.125 2024-09-17 06:59:54,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=129938.66666666667, ans=0.0 2024-09-17 06:59:56,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=129938.66666666667, ans=0.2 2024-09-17 06:59:58,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=129938.66666666667, ans=0.0 2024-09-17 07:00:14,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=129985.33333333333, ans=0.125 2024-09-17 07:00:19,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=129985.33333333333, ans=0.125 2024-09-17 07:00:28,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=130032.0, ans=0.1 2024-09-17 07:00:32,639 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.795e+02 3.443e+02 5.123e+02 8.970e+02, threshold=6.886e+02, percent-clipped=8.0 2024-09-17 07:00:36,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=130032.0, ans=0.125 2024-09-17 07:00:51,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.34 vs. limit=15.0 2024-09-17 07:00:55,569 INFO [train.py:1198] (1/2) Epoch 8, batch 750, loss[loss=0.2894, simple_loss=0.3195, pruned_loss=0.1014, ctc_loss=0.1921, cr_loss=0.4539, over 34423.00 frames. ], tot_loss[loss=0.2797, simple_loss=0.3098, pruned_loss=0.09697, ctc_loss=0.1891, cr_loss=0.4481, over 6625524.46 frames. ], batch size: 95, lr: 1.53e-02, grad_scale: 16.0 2024-09-17 07:00:59,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=130125.33333333333, ans=0.125 2024-09-17 07:01:12,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=130172.0, ans=0.07 2024-09-17 07:01:25,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=130172.0, ans=0.1 2024-09-17 07:02:19,418 INFO [train.py:1198] (1/2) Epoch 8, batch 800, loss[loss=0.2482, simple_loss=0.2848, pruned_loss=0.08171, ctc_loss=0.1607, cr_loss=0.4009, over 34496.00 frames. ], tot_loss[loss=0.2804, simple_loss=0.3103, pruned_loss=0.09734, ctc_loss=0.1896, cr_loss=0.4493, over 6661562.02 frames. 
], batch size: 85, lr: 1.53e-02, grad_scale: 32.0 2024-09-17 07:02:57,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=130452.0, ans=0.2 2024-09-17 07:03:14,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=130498.66666666667, ans=0.0 2024-09-17 07:03:20,652 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.281e+02 2.786e+02 3.206e+02 4.354e+02 7.549e+02, threshold=6.412e+02, percent-clipped=4.0 2024-09-17 07:03:22,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=130498.66666666667, ans=0.0 2024-09-17 07:03:27,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.05 vs. limit=15.0 2024-09-17 07:03:30,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=130545.33333333333, ans=0.1 2024-09-17 07:03:43,463 INFO [train.py:1198] (1/2) Epoch 8, batch 850, loss[loss=0.2942, simple_loss=0.3293, pruned_loss=0.1003, ctc_loss=0.1989, cr_loss=0.4683, over 34348.00 frames. ], tot_loss[loss=0.2799, simple_loss=0.3098, pruned_loss=0.09709, ctc_loss=0.1891, cr_loss=0.4482, over 6692162.79 frames. ], batch size: 103, lr: 1.53e-02, grad_scale: 32.0 2024-09-17 07:03:45,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=130592.0, ans=0.125 2024-09-17 07:03:58,419 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:04:25,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=130685.33333333333, ans=0.2 2024-09-17 07:04:27,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=130685.33333333333, ans=0.025 2024-09-17 07:04:47,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=130732.0, ans=0.125 2024-09-17 07:05:11,737 INFO [train.py:1198] (1/2) Epoch 8, batch 900, loss[loss=0.2471, simple_loss=0.283, pruned_loss=0.08161, ctc_loss=0.1628, cr_loss=0.3842, over 34474.00 frames. ], tot_loss[loss=0.2808, simple_loss=0.3105, pruned_loss=0.09756, ctc_loss=0.19, cr_loss=0.4497, over 6698994.57 frames. ], batch size: 85, lr: 1.53e-02, grad_scale: 16.0 2024-09-17 07:05:25,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=130825.33333333333, ans=0.0 2024-09-17 07:05:28,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=130872.0, ans=0.07 2024-09-17 07:05:35,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=130872.0, ans=0.5 2024-09-17 07:05:40,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=130872.0, ans=0.0 2024-09-17 07:06:03,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.33 vs. 
limit=15.0 2024-09-17 07:06:05,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=130965.33333333333, ans=0.0 2024-09-17 07:06:14,832 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.108e+02 2.654e+02 3.234e+02 3.917e+02 7.730e+02, threshold=6.468e+02, percent-clipped=1.0 2024-09-17 07:06:23,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131012.0, ans=0.1 2024-09-17 07:06:34,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=131058.66666666667, ans=0.2 2024-09-17 07:06:36,069 INFO [train.py:1198] (1/2) Epoch 8, batch 950, loss[loss=0.2541, simple_loss=0.2884, pruned_loss=0.08476, ctc_loss=0.1702, cr_loss=0.4076, over 34685.00 frames. ], tot_loss[loss=0.2808, simple_loss=0.3105, pruned_loss=0.09755, ctc_loss=0.19, cr_loss=0.4493, over 6702340.28 frames. ], batch size: 87, lr: 1.53e-02, grad_scale: 16.0 2024-09-17 07:07:08,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=131105.33333333334, ans=0.0 2024-09-17 07:07:23,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=131152.0, ans=0.125 2024-09-17 07:07:44,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=131245.33333333334, ans=0.125 2024-09-17 07:08:00,401 INFO [train.py:1198] (1/2) Epoch 8, batch 1000, loss[loss=0.2692, simple_loss=0.3009, pruned_loss=0.09163, ctc_loss=0.1836, cr_loss=0.4369, over 34438.00 frames. ], tot_loss[loss=0.2822, simple_loss=0.3116, pruned_loss=0.09822, ctc_loss=0.1911, cr_loss=0.4506, over 6694800.01 frames. ], batch size: 90, lr: 1.53e-02, grad_scale: 16.0 2024-09-17 07:08:00,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=131292.0, ans=0.125 2024-09-17 07:08:01,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-17 07:08:34,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131385.33333333334, ans=0.1 2024-09-17 07:09:01,662 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.208e+02 2.662e+02 3.249e+02 3.859e+02 5.975e+02, threshold=6.498e+02, percent-clipped=0.0 2024-09-17 07:09:01,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=131432.0, ans=0.125 2024-09-17 07:09:18,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131478.66666666666, ans=0.1 2024-09-17 07:09:22,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2024-09-17 07:09:24,717 INFO [train.py:1198] (1/2) Epoch 8, batch 1050, loss[loss=0.2738, simple_loss=0.314, pruned_loss=0.09042, ctc_loss=0.1772, cr_loss=0.4341, over 34559.00 frames. ], tot_loss[loss=0.2814, simple_loss=0.3108, pruned_loss=0.09795, ctc_loss=0.1907, cr_loss=0.4505, over 6704722.58 frames. 
], batch size: 99, lr: 1.53e-02, grad_scale: 16.0 2024-09-17 07:09:25,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=131525.33333333334, ans=0.2 2024-09-17 07:09:33,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=131525.33333333334, ans=0.125 2024-09-17 07:09:39,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=131572.0, ans=0.0 2024-09-17 07:09:41,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=131572.0, ans=0.0 2024-09-17 07:09:52,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=131572.0, ans=0.0 2024-09-17 07:10:00,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=131618.66666666666, ans=0.0 2024-09-17 07:10:05,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=131618.66666666666, ans=10.0 2024-09-17 07:10:22,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2024-09-17 07:10:49,673 INFO [train.py:1198] (1/2) Epoch 8, batch 1100, loss[loss=0.2842, simple_loss=0.3119, pruned_loss=0.1001, ctc_loss=0.1883, cr_loss=0.4691, over 34357.00 frames. ], tot_loss[loss=0.2811, simple_loss=0.3105, pruned_loss=0.09779, ctc_loss=0.1903, cr_loss=0.4505, over 6717227.97 frames. ], batch size: 91, lr: 1.52e-02, grad_scale: 16.0 2024-09-17 07:10:54,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=131758.66666666666, ans=0.125 2024-09-17 07:10:55,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.07 vs. limit=15.0 2024-09-17 07:11:03,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2024-09-17 07:11:11,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=131805.33333333334, ans=0.125 2024-09-17 07:11:19,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=131805.33333333334, ans=0.125 2024-09-17 07:11:31,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=131852.0, ans=0.125 2024-09-17 07:11:39,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-09-17 07:11:50,655 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.616e+02 3.141e+02 4.350e+02 9.153e+02, threshold=6.282e+02, percent-clipped=3.0 2024-09-17 07:12:12,171 INFO [train.py:1198] (1/2) Epoch 8, batch 1150, loss[loss=0.2633, simple_loss=0.2945, pruned_loss=0.08958, ctc_loss=0.1752, cr_loss=0.4467, over 34351.00 frames. ], tot_loss[loss=0.281, simple_loss=0.3105, pruned_loss=0.0977, ctc_loss=0.1902, cr_loss=0.4502, over 6714483.09 frames. 
], batch size: 91, lr: 1.52e-02, grad_scale: 16.0 2024-09-17 07:12:17,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=131992.0, ans=0.125 2024-09-17 07:12:30,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=132038.66666666666, ans=0.125 2024-09-17 07:13:36,729 INFO [train.py:1198] (1/2) Epoch 8, batch 1200, loss[loss=0.2737, simple_loss=0.3125, pruned_loss=0.09025, ctc_loss=0.183, cr_loss=0.4457, over 34570.00 frames. ], tot_loss[loss=0.2817, simple_loss=0.3114, pruned_loss=0.09795, ctc_loss=0.1908, cr_loss=0.4503, over 6706689.84 frames. ], batch size: 99, lr: 1.52e-02, grad_scale: 32.0 2024-09-17 07:13:36,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=132225.33333333334, ans=0.125 2024-09-17 07:14:31,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.33 vs. limit=6.0 2024-09-17 07:14:40,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.126e+02 2.621e+02 2.913e+02 3.550e+02 1.257e+03, threshold=5.826e+02, percent-clipped=2.0 2024-09-17 07:14:42,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=132365.33333333334, ans=0.0 2024-09-17 07:14:52,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=132412.0, ans=0.1 2024-09-17 07:14:54,147 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:15:01,620 INFO [train.py:1198] (1/2) Epoch 8, batch 1250, loss[loss=0.2971, simple_loss=0.325, pruned_loss=0.1047, ctc_loss=0.2021, cr_loss=0.4803, over 34364.00 frames. ], tot_loss[loss=0.2816, simple_loss=0.3115, pruned_loss=0.09784, ctc_loss=0.1905, cr_loss=0.4511, over 6740510.04 frames. ], batch size: 107, lr: 1.52e-02, grad_scale: 32.0 2024-09-17 07:15:24,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=132505.33333333334, ans=0.125 2024-09-17 07:15:26,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=132505.33333333334, ans=0.2 2024-09-17 07:15:35,140 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:16:09,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132645.33333333334, ans=0.125 2024-09-17 07:16:10,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.42 vs. limit=15.0 2024-09-17 07:16:14,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=132645.33333333334, ans=0.1 2024-09-17 07:16:22,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.97 vs. limit=10.0 2024-09-17 07:16:24,025 INFO [train.py:1198] (1/2) Epoch 8, batch 1300, loss[loss=0.2962, simple_loss=0.3302, pruned_loss=0.102, ctc_loss=0.1985, cr_loss=0.4628, over 33088.00 frames. 
], tot_loss[loss=0.2804, simple_loss=0.3104, pruned_loss=0.09728, ctc_loss=0.1894, cr_loss=0.4489, over 6745605.30 frames. ], batch size: 130, lr: 1.52e-02, grad_scale: 32.0 2024-09-17 07:16:32,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=132692.0, ans=0.125 2024-09-17 07:16:45,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=132738.66666666666, ans=0.125 2024-09-17 07:17:12,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.41 vs. limit=15.0 2024-09-17 07:17:15,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=132832.0, ans=0.2 2024-09-17 07:17:22,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=132832.0, ans=0.125 2024-09-17 07:17:26,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.592e+02 3.253e+02 4.242e+02 8.460e+02, threshold=6.506e+02, percent-clipped=9.0 2024-09-17 07:17:28,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=132832.0, ans=0.125 2024-09-17 07:17:33,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=132878.66666666666, ans=0.0 2024-09-17 07:17:48,002 INFO [train.py:1198] (1/2) Epoch 8, batch 1350, loss[loss=0.2854, simple_loss=0.3151, pruned_loss=0.09956, ctc_loss=0.1965, cr_loss=0.4347, over 34549.00 frames. ], tot_loss[loss=0.2799, simple_loss=0.31, pruned_loss=0.09701, ctc_loss=0.1891, cr_loss=0.4488, over 6764473.86 frames. ], batch size: 94, lr: 1.52e-02, grad_scale: 32.0 2024-09-17 07:17:51,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=132925.33333333334, ans=0.125 2024-09-17 07:18:03,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=132972.0, ans=0.125 2024-09-17 07:18:05,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.27 vs. limit=15.0 2024-09-17 07:18:05,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.15 vs. limit=15.0 2024-09-17 07:18:16,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=132972.0, ans=0.0 2024-09-17 07:18:44,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=133065.33333333334, ans=0.125 2024-09-17 07:19:10,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=133158.66666666666, ans=0.125 2024-09-17 07:19:12,214 INFO [train.py:1198] (1/2) Epoch 8, batch 1400, loss[loss=0.2466, simple_loss=0.2767, pruned_loss=0.08305, ctc_loss=0.1687, cr_loss=0.4149, over 34279.00 frames. ], tot_loss[loss=0.2798, simple_loss=0.3098, pruned_loss=0.09703, ctc_loss=0.1892, cr_loss=0.4485, over 6776789.25 frames. 
], batch size: 80, lr: 1.52e-02, grad_scale: 32.0 2024-09-17 07:19:13,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.34 vs. limit=10.0 2024-09-17 07:19:19,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=133158.66666666666, ans=0.125 2024-09-17 07:19:20,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=133158.66666666666, ans=0.07 2024-09-17 07:19:22,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=133158.66666666666, ans=0.0 2024-09-17 07:19:25,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133158.66666666666, ans=0.1 2024-09-17 07:19:35,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=133205.33333333334, ans=0.025 2024-09-17 07:19:45,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2024-09-17 07:19:48,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=133252.0, ans=0.125 2024-09-17 07:19:52,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.93 vs. limit=6.0 2024-09-17 07:19:53,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=133252.0, ans=0.125 2024-09-17 07:20:14,464 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.152e+02 2.692e+02 3.161e+02 4.081e+02 7.552e+02, threshold=6.322e+02, percent-clipped=1.0 2024-09-17 07:20:36,068 INFO [train.py:1198] (1/2) Epoch 8, batch 1450, loss[loss=0.2974, simple_loss=0.3293, pruned_loss=0.1035, ctc_loss=0.2004, cr_loss=0.4596, over 34489.00 frames. ], tot_loss[loss=0.281, simple_loss=0.311, pruned_loss=0.09752, ctc_loss=0.1901, cr_loss=0.4499, over 6772610.90 frames. ], batch size: 110, lr: 1.52e-02, grad_scale: 16.0 2024-09-17 07:20:48,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=133392.0, ans=0.125 2024-09-17 07:21:09,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=133485.33333333334, ans=0.125 2024-09-17 07:21:11,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.25 vs. limit=12.0 2024-09-17 07:21:14,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=133485.33333333334, ans=0.125 2024-09-17 07:21:58,647 INFO [train.py:1198] (1/2) Epoch 8, batch 1500, loss[loss=0.2926, simple_loss=0.3237, pruned_loss=0.1016, ctc_loss=0.1965, cr_loss=0.4705, over 34484.00 frames. ], tot_loss[loss=0.2811, simple_loss=0.3113, pruned_loss=0.0975, ctc_loss=0.1901, cr_loss=0.4504, over 6774300.29 frames. 
], batch size: 100, lr: 1.51e-02, grad_scale: 16.0 2024-09-17 07:22:07,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=133625.33333333334, ans=0.0 2024-09-17 07:22:17,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=133672.0, ans=0.2 2024-09-17 07:22:38,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2024-09-17 07:22:54,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133765.33333333334, ans=0.1 2024-09-17 07:23:03,946 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.569e+02 3.083e+02 3.927e+02 8.056e+02, threshold=6.167e+02, percent-clipped=2.0 2024-09-17 07:23:23,513 INFO [train.py:1198] (1/2) Epoch 8, batch 1550, loss[loss=0.2859, simple_loss=0.32, pruned_loss=0.09759, ctc_loss=0.1882, cr_loss=0.4756, over 34415.00 frames. ], tot_loss[loss=0.281, simple_loss=0.3108, pruned_loss=0.09753, ctc_loss=0.1902, cr_loss=0.4504, over 6745870.15 frames. ], batch size: 105, lr: 1.51e-02, grad_scale: 16.0 2024-09-17 07:23:27,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=133858.66666666666, ans=0.0 2024-09-17 07:24:00,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=133952.0, ans=0.07 2024-09-17 07:24:03,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=133952.0, ans=0.2 2024-09-17 07:24:10,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=133952.0, ans=0.0 2024-09-17 07:24:11,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=133952.0, ans=0.125 2024-09-17 07:24:20,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0 2024-09-17 07:24:26,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=133998.66666666666, ans=0.125 2024-09-17 07:24:31,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=134045.33333333334, ans=0.125 2024-09-17 07:24:33,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=134045.33333333334, ans=0.125 2024-09-17 07:24:47,832 INFO [train.py:1198] (1/2) Epoch 8, batch 1600, loss[loss=0.3081, simple_loss=0.3335, pruned_loss=0.1101, ctc_loss=0.2164, cr_loss=0.4808, over 34578.00 frames. ], tot_loss[loss=0.2807, simple_loss=0.3105, pruned_loss=0.09744, ctc_loss=0.1901, cr_loss=0.4499, over 6724927.61 frames. ], batch size: 99, lr: 1.51e-02, grad_scale: 32.0 2024-09-17 07:24:55,041 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.51 vs. 
limit=10.0 2024-09-17 07:25:02,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=134138.66666666666, ans=0.04949747468305833 2024-09-17 07:25:06,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=134138.66666666666, ans=0.1 2024-09-17 07:25:27,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=134185.33333333334, ans=0.2 2024-09-17 07:25:29,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=134185.33333333334, ans=0.025 2024-09-17 07:25:37,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=134232.0, ans=0.125 2024-09-17 07:25:54,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.807e+02 3.464e+02 4.397e+02 1.066e+03, threshold=6.928e+02, percent-clipped=10.0 2024-09-17 07:26:04,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=134278.66666666666, ans=0.125 2024-09-17 07:26:06,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.89 vs. limit=15.0 2024-09-17 07:26:09,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=134278.66666666666, ans=0.025 2024-09-17 07:26:12,485 INFO [train.py:1198] (1/2) Epoch 8, batch 1650, loss[loss=0.2852, simple_loss=0.3198, pruned_loss=0.09688, ctc_loss=0.1939, cr_loss=0.4521, over 34360.00 frames. ], tot_loss[loss=0.2805, simple_loss=0.3104, pruned_loss=0.09727, ctc_loss=0.1898, cr_loss=0.4504, over 6718800.75 frames. ], batch size: 103, lr: 1.51e-02, grad_scale: 16.0 2024-09-17 07:26:27,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=134372.0, ans=0.0 2024-09-17 07:26:44,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=134418.66666666666, ans=0.125 2024-09-17 07:27:09,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=134465.33333333334, ans=0.125 2024-09-17 07:27:34,482 INFO [train.py:1198] (1/2) Epoch 8, batch 1700, loss[loss=0.2382, simple_loss=0.2725, pruned_loss=0.07853, ctc_loss=0.1539, cr_loss=0.4029, over 34274.00 frames. ], tot_loss[loss=0.2796, simple_loss=0.3098, pruned_loss=0.09679, ctc_loss=0.189, cr_loss=0.4492, over 6744184.37 frames. 
], batch size: 80, lr: 1.51e-02, grad_scale: 16.0 2024-09-17 07:27:41,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=134558.66666666666, ans=0.025 2024-09-17 07:28:30,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=134698.66666666666, ans=0.125 2024-09-17 07:28:40,442 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.151e+02 2.649e+02 3.088e+02 3.767e+02 7.516e+02, threshold=6.175e+02, percent-clipped=1.0 2024-09-17 07:28:49,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-09-17 07:28:54,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=134745.33333333334, ans=0.1 2024-09-17 07:28:54,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.20 vs. limit=15.0 2024-09-17 07:28:58,786 INFO [train.py:1198] (1/2) Epoch 8, batch 1750, loss[loss=0.2436, simple_loss=0.2778, pruned_loss=0.08042, ctc_loss=0.1595, cr_loss=0.4153, over 34138.00 frames. ], tot_loss[loss=0.2791, simple_loss=0.3093, pruned_loss=0.09663, ctc_loss=0.1886, cr_loss=0.4488, over 6753224.00 frames. ], batch size: 78, lr: 1.51e-02, grad_scale: 16.0 2024-09-17 07:29:02,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=134792.0, ans=0.0 2024-09-17 07:29:05,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=134792.0, ans=0.025 2024-09-17 07:29:09,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=134792.0, ans=0.125 2024-09-17 07:29:25,376 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:29:27,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.92 vs. limit=15.0 2024-09-17 07:29:48,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=134932.0, ans=0.1 2024-09-17 07:29:53,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=134932.0, ans=0.025 2024-09-17 07:29:57,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=134932.0, ans=0.125 2024-09-17 07:29:57,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=134932.0, ans=0.125 2024-09-17 07:30:10,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=134978.66666666666, ans=0.1 2024-09-17 07:30:10,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=134978.66666666666, ans=0.09899494936611666 2024-09-17 07:30:17,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.88 vs. 
limit=15.0 2024-09-17 07:30:22,828 INFO [train.py:1198] (1/2) Epoch 8, batch 1800, loss[loss=0.2732, simple_loss=0.31, pruned_loss=0.09147, ctc_loss=0.1809, cr_loss=0.4351, over 34709.00 frames. ], tot_loss[loss=0.2792, simple_loss=0.3096, pruned_loss=0.09654, ctc_loss=0.1884, cr_loss=0.4492, over 6755893.38 frames. ], batch size: 97, lr: 1.51e-02, grad_scale: 16.0 2024-09-17 07:30:39,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=135072.0, ans=0.125 2024-09-17 07:30:44,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=135072.0, ans=0.2 2024-09-17 07:31:28,859 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 2.671e+02 3.342e+02 4.452e+02 7.858e+02, threshold=6.684e+02, percent-clipped=7.0 2024-09-17 07:31:39,135 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:31:47,003 INFO [train.py:1198] (1/2) Epoch 8, batch 1850, loss[loss=0.2665, simple_loss=0.3057, pruned_loss=0.08758, ctc_loss=0.1742, cr_loss=0.4309, over 34465.00 frames. ], tot_loss[loss=0.2784, simple_loss=0.3089, pruned_loss=0.09623, ctc_loss=0.1879, cr_loss=0.4489, over 6764047.94 frames. ], batch size: 100, lr: 1.51e-02, grad_scale: 16.0 2024-09-17 07:31:57,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=135258.66666666666, ans=0.125 2024-09-17 07:32:03,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=135305.33333333334, ans=0.125 2024-09-17 07:32:27,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=135352.0, ans=0.125 2024-09-17 07:32:30,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=135352.0, ans=0.2 2024-09-17 07:32:32,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=135352.0, ans=0.125 2024-09-17 07:32:33,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=135352.0, ans=0.09899494936611666 2024-09-17 07:32:34,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.13 vs. limit=15.0 2024-09-17 07:32:47,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=15.0 2024-09-17 07:32:53,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=135445.33333333334, ans=0.1 2024-09-17 07:33:00,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-09-17 07:33:09,154 INFO [train.py:1198] (1/2) Epoch 8, batch 1900, loss[loss=0.2957, simple_loss=0.3343, pruned_loss=0.09942, ctc_loss=0.199, cr_loss=0.4645, over 34385.00 frames. ], tot_loss[loss=0.2793, simple_loss=0.3097, pruned_loss=0.09654, ctc_loss=0.1885, cr_loss=0.4499, over 6772859.58 frames. 
], batch size: 103, lr: 1.50e-02, grad_scale: 16.0 2024-09-17 07:33:12,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=135492.0, ans=0.0 2024-09-17 07:33:20,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=135492.0, ans=0.125 2024-09-17 07:33:24,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.18 vs. limit=22.5 2024-09-17 07:33:56,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=135585.33333333334, ans=0.125 2024-09-17 07:34:10,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=135632.0, ans=0.025 2024-09-17 07:34:15,252 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.155e+02 2.834e+02 3.652e+02 4.552e+02 8.867e+02, threshold=7.303e+02, percent-clipped=8.0 2024-09-17 07:34:33,202 INFO [train.py:1198] (1/2) Epoch 8, batch 1950, loss[loss=0.2868, simple_loss=0.3184, pruned_loss=0.09903, ctc_loss=0.1921, cr_loss=0.4702, over 34376.00 frames. ], tot_loss[loss=0.2805, simple_loss=0.3111, pruned_loss=0.09696, ctc_loss=0.1892, cr_loss=0.4514, over 6789824.31 frames. ], batch size: 91, lr: 1.50e-02, grad_scale: 16.0 2024-09-17 07:35:10,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=135818.66666666666, ans=0.125 2024-09-17 07:35:26,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0 2024-09-17 07:35:34,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=135865.33333333334, ans=0.125 2024-09-17 07:35:41,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=135912.0, ans=0.125 2024-09-17 07:35:44,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=135912.0, ans=0.025 2024-09-17 07:35:57,180 INFO [train.py:1198] (1/2) Epoch 8, batch 2000, loss[loss=0.2508, simple_loss=0.2798, pruned_loss=0.08554, ctc_loss=0.1717, cr_loss=0.4117, over 34169.00 frames. ], tot_loss[loss=0.2811, simple_loss=0.3114, pruned_loss=0.09737, ctc_loss=0.19, cr_loss=0.4518, over 6764864.76 frames. 
], batch size: 78, lr: 1.50e-02, grad_scale: 32.0 2024-09-17 07:35:59,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=135958.66666666666, ans=0.125 2024-09-17 07:36:04,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=135958.66666666666, ans=0.05 2024-09-17 07:36:53,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=136098.66666666666, ans=0.125 2024-09-17 07:37:01,561 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.208e+02 2.744e+02 3.336e+02 4.372e+02 2.026e+03, threshold=6.672e+02, percent-clipped=3.0 2024-09-17 07:37:06,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=136145.33333333334, ans=0.125 2024-09-17 07:37:21,688 INFO [train.py:1198] (1/2) Epoch 8, batch 2050, loss[loss=0.2534, simple_loss=0.2854, pruned_loss=0.08527, ctc_loss=0.1702, cr_loss=0.4197, over 34485.00 frames. ], tot_loss[loss=0.2799, simple_loss=0.3102, pruned_loss=0.09686, ctc_loss=0.1892, cr_loss=0.45, over 6755364.39 frames. ], batch size: 82, lr: 1.50e-02, grad_scale: 32.0 2024-09-17 07:37:26,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.32 vs. limit=15.0 2024-09-17 07:37:38,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=136238.66666666666, ans=0.2 2024-09-17 07:38:08,196 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:38:34,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.88 vs. limit=15.0 2024-09-17 07:38:43,916 INFO [train.py:1198] (1/2) Epoch 8, batch 2100, loss[loss=0.2872, simple_loss=0.3132, pruned_loss=0.1022, ctc_loss=0.195, cr_loss=0.4474, over 34537.00 frames. ], tot_loss[loss=0.2791, simple_loss=0.3096, pruned_loss=0.09652, ctc_loss=0.1884, cr_loss=0.4494, over 6769626.25 frames. ], batch size: 94, lr: 1.50e-02, grad_scale: 32.0 2024-09-17 07:38:44,191 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:38:52,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=136425.33333333334, ans=0.0 2024-09-17 07:39:12,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.87 vs. 
limit=12.0 2024-09-17 07:39:23,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=136518.66666666666, ans=0.0 2024-09-17 07:39:25,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=136518.66666666666, ans=0.125 2024-09-17 07:39:26,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=136518.66666666666, ans=0.125 2024-09-17 07:39:31,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=136518.66666666666, ans=0.2 2024-09-17 07:39:39,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=136565.33333333334, ans=0.125 2024-09-17 07:39:49,145 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.823e+02 3.602e+02 4.930e+02 1.111e+03, threshold=7.205e+02, percent-clipped=8.0 2024-09-17 07:39:55,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2024-09-17 07:40:07,203 INFO [train.py:1198] (1/2) Epoch 8, batch 2150, loss[loss=0.2828, simple_loss=0.3096, pruned_loss=0.09987, ctc_loss=0.1906, cr_loss=0.4527, over 34367.00 frames. ], tot_loss[loss=0.278, simple_loss=0.3088, pruned_loss=0.09591, ctc_loss=0.1873, cr_loss=0.4485, over 6788206.81 frames. ], batch size: 91, lr: 1.50e-02, grad_scale: 32.0 2024-09-17 07:40:25,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=136705.33333333334, ans=0.0 2024-09-17 07:40:27,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=136705.33333333334, ans=0.025 2024-09-17 07:40:30,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=136705.33333333334, ans=0.1 2024-09-17 07:40:45,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=136752.0, ans=0.0 2024-09-17 07:41:00,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=136798.66666666666, ans=0.04949747468305833 2024-09-17 07:41:01,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=15.0 2024-09-17 07:41:18,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=136845.33333333334, ans=0.0 2024-09-17 07:41:27,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=136845.33333333334, ans=0.09899494936611666 2024-09-17 07:41:31,745 INFO [train.py:1198] (1/2) Epoch 8, batch 2200, loss[loss=0.2758, simple_loss=0.3125, pruned_loss=0.0925, ctc_loss=0.1812, cr_loss=0.4467, over 34470.00 frames. ], tot_loss[loss=0.2783, simple_loss=0.309, pruned_loss=0.09609, ctc_loss=0.1876, cr_loss=0.4481, over 6784281.36 frames. 
], batch size: 100, lr: 1.50e-02, grad_scale: 32.0 2024-09-17 07:41:43,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=136892.0, ans=0.0 2024-09-17 07:42:21,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=137032.0, ans=0.125 2024-09-17 07:42:35,971 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.603e+02 3.523e+02 4.551e+02 6.418e+02, threshold=7.047e+02, percent-clipped=0.0 2024-09-17 07:42:44,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=137078.66666666666, ans=0.2 2024-09-17 07:42:52,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=137078.66666666666, ans=0.125 2024-09-17 07:42:55,955 INFO [train.py:1198] (1/2) Epoch 8, batch 2250, loss[loss=0.2762, simple_loss=0.3124, pruned_loss=0.09304, ctc_loss=0.1805, cr_loss=0.4442, over 34419.00 frames. ], tot_loss[loss=0.2781, simple_loss=0.3089, pruned_loss=0.09601, ctc_loss=0.1876, cr_loss=0.4477, over 6781199.34 frames. ], batch size: 95, lr: 1.50e-02, grad_scale: 32.0 2024-09-17 07:42:58,130 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:43:07,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=137125.33333333334, ans=0.125 2024-09-17 07:43:11,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137172.0, ans=0.1 2024-09-17 07:43:39,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=137218.66666666666, ans=0.0 2024-09-17 07:43:42,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137218.66666666666, ans=0.1 2024-09-17 07:43:43,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=137265.33333333334, ans=0.125 2024-09-17 07:44:18,314 INFO [train.py:1198] (1/2) Epoch 8, batch 2300, loss[loss=0.2414, simple_loss=0.2762, pruned_loss=0.07921, ctc_loss=0.1621, cr_loss=0.3943, over 34297.00 frames. ], tot_loss[loss=0.2775, simple_loss=0.3081, pruned_loss=0.09577, ctc_loss=0.1873, cr_loss=0.4468, over 6765842.79 frames. ], batch size: 83, lr: 1.50e-02, grad_scale: 32.0 2024-09-17 07:44:30,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.30 vs. 
limit=22.5 2024-09-17 07:44:58,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=137452.0, ans=0.2 2024-09-17 07:45:20,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=137498.66666666666, ans=0.2 2024-09-17 07:45:23,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=137498.66666666666, ans=0.2 2024-09-17 07:45:24,825 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 2.959e+02 3.767e+02 4.840e+02 7.548e+02, threshold=7.534e+02, percent-clipped=3.0 2024-09-17 07:45:33,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=137545.33333333334, ans=0.07 2024-09-17 07:45:38,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=137545.33333333334, ans=0.125 2024-09-17 07:45:43,030 INFO [train.py:1198] (1/2) Epoch 8, batch 2350, loss[loss=0.2896, simple_loss=0.3215, pruned_loss=0.09878, ctc_loss=0.1988, cr_loss=0.5067, over 34687.00 frames. ], tot_loss[loss=0.2778, simple_loss=0.3083, pruned_loss=0.09594, ctc_loss=0.1875, cr_loss=0.4475, over 6771623.82 frames. ], batch size: 97, lr: 1.49e-02, grad_scale: 32.0 2024-09-17 07:46:02,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0 2024-09-17 07:46:35,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=137732.0, ans=0.125 2024-09-17 07:46:49,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=137778.66666666666, ans=0.0 2024-09-17 07:46:55,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=137778.66666666666, ans=0.2 2024-09-17 07:47:07,166 INFO [train.py:1198] (1/2) Epoch 8, batch 2400, loss[loss=0.2712, simple_loss=0.3033, pruned_loss=0.09295, ctc_loss=0.1806, cr_loss=0.4293, over 34560.00 frames. ], tot_loss[loss=0.2778, simple_loss=0.3085, pruned_loss=0.09584, ctc_loss=0.1873, cr_loss=0.448, over 6775293.12 frames. ], batch size: 89, lr: 1.49e-02, grad_scale: 32.0 2024-09-17 07:47:37,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=137872.0, ans=0.2 2024-09-17 07:47:39,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. 
limit=15.0 2024-09-17 07:47:48,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=137918.66666666666, ans=0.0 2024-09-17 07:48:05,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=137965.33333333334, ans=0.0 2024-09-17 07:48:06,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=137965.33333333334, ans=0.5 2024-09-17 07:48:11,522 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 2.898e+02 3.555e+02 4.689e+02 7.936e+02, threshold=7.109e+02, percent-clipped=1.0 2024-09-17 07:48:29,868 INFO [train.py:1198] (1/2) Epoch 8, batch 2450, loss[loss=0.2855, simple_loss=0.3146, pruned_loss=0.09979, ctc_loss=0.1927, cr_loss=0.4571, over 34425.00 frames. ], tot_loss[loss=0.2792, simple_loss=0.3098, pruned_loss=0.09643, ctc_loss=0.1885, cr_loss=0.4498, over 6751675.37 frames. ], batch size: 95, lr: 1.49e-02, grad_scale: 32.0 2024-09-17 07:48:31,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.52 vs. limit=15.0 2024-09-17 07:48:58,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=138105.33333333334, ans=0.025 2024-09-17 07:49:00,418 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 07:49:20,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=138198.66666666666, ans=0.5 2024-09-17 07:49:33,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=138198.66666666666, ans=0.2 2024-09-17 07:49:38,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=138245.33333333334, ans=0.02 2024-09-17 07:49:54,182 INFO [train.py:1198] (1/2) Epoch 8, batch 2500, loss[loss=0.278, simple_loss=0.315, pruned_loss=0.09194, ctc_loss=0.1885, cr_loss=0.4865, over 34465.00 frames. ], tot_loss[loss=0.2789, simple_loss=0.3096, pruned_loss=0.09631, ctc_loss=0.1881, cr_loss=0.4502, over 6762738.88 frames. ], batch size: 100, lr: 1.49e-02, grad_scale: 32.0 2024-09-17 07:50:09,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=138292.0, ans=0.125 2024-09-17 07:50:27,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=138385.33333333334, ans=0.1 2024-09-17 07:50:45,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138432.0, ans=0.1 2024-09-17 07:51:02,061 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.987e+02 2.614e+02 3.185e+02 4.029e+02 6.455e+02, threshold=6.371e+02, percent-clipped=0.0 2024-09-17 07:51:02,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=138478.66666666666, ans=0.125 2024-09-17 07:51:06,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. 
limit=6.0 2024-09-17 07:51:10,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=138478.66666666666, ans=0.1 2024-09-17 07:51:18,551 INFO [train.py:1198] (1/2) Epoch 8, batch 2550, loss[loss=0.2427, simple_loss=0.2801, pruned_loss=0.07898, ctc_loss=0.1558, cr_loss=0.4029, over 34168.00 frames. ], tot_loss[loss=0.2783, simple_loss=0.3093, pruned_loss=0.09594, ctc_loss=0.1874, cr_loss=0.4495, over 6765740.94 frames. ], batch size: 78, lr: 1.49e-02, grad_scale: 16.0 2024-09-17 07:51:31,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=138525.33333333334, ans=0.2 2024-09-17 07:51:56,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=138618.66666666666, ans=0.125 2024-09-17 07:52:09,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=138665.33333333334, ans=0.125 2024-09-17 07:52:42,950 INFO [train.py:1198] (1/2) Epoch 8, batch 2600, loss[loss=0.2729, simple_loss=0.3072, pruned_loss=0.09238, ctc_loss=0.1802, cr_loss=0.4429, over 34349.00 frames. ], tot_loss[loss=0.2791, simple_loss=0.3099, pruned_loss=0.09633, ctc_loss=0.188, cr_loss=0.4504, over 6761834.01 frames. ], batch size: 91, lr: 1.49e-02, grad_scale: 16.0 2024-09-17 07:52:49,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=138758.66666666666, ans=0.1 2024-09-17 07:53:09,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=138805.33333333334, ans=0.07 2024-09-17 07:53:09,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=138805.33333333334, ans=0.0 2024-09-17 07:53:32,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=138898.66666666666, ans=0.125 2024-09-17 07:53:38,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=138898.66666666666, ans=0.1 2024-09-17 07:53:42,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=138898.66666666666, ans=0.1 2024-09-17 07:53:47,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=138898.66666666666, ans=0.0 2024-09-17 07:53:50,227 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 2.530e+02 2.917e+02 3.750e+02 9.271e+02, threshold=5.834e+02, percent-clipped=3.0 2024-09-17 07:54:01,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=138945.33333333334, ans=0.07 2024-09-17 07:54:06,350 INFO [train.py:1198] (1/2) Epoch 8, batch 2650, loss[loss=0.2865, simple_loss=0.3206, pruned_loss=0.09812, ctc_loss=0.1916, cr_loss=0.4455, over 34272.00 frames. ], tot_loss[loss=0.279, simple_loss=0.3099, pruned_loss=0.09628, ctc_loss=0.188, cr_loss=0.4503, over 6769496.76 frames. 
], batch size: 117, lr: 1.49e-02, grad_scale: 16.0 2024-09-17 07:54:08,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=138992.0, ans=0.0 2024-09-17 07:54:23,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=139038.66666666666, ans=0.125 2024-09-17 07:54:33,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=139038.66666666666, ans=0.2 2024-09-17 07:54:49,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=139085.33333333334, ans=0.125 2024-09-17 07:55:01,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=139132.0, ans=0.1 2024-09-17 07:55:14,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=139178.66666666666, ans=0.04949747468305833 2024-09-17 07:55:28,637 INFO [train.py:1198] (1/2) Epoch 8, batch 2700, loss[loss=0.2768, simple_loss=0.3131, pruned_loss=0.09284, ctc_loss=0.1857, cr_loss=0.4418, over 34625.00 frames. ], tot_loss[loss=0.28, simple_loss=0.3107, pruned_loss=0.09673, ctc_loss=0.1887, cr_loss=0.4508, over 6763378.69 frames. ], batch size: 102, lr: 1.49e-02, grad_scale: 16.0 2024-09-17 07:55:37,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-09-17 07:56:32,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=139365.33333333334, ans=0.2 2024-09-17 07:56:36,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.71 vs. limit=15.0 2024-09-17 07:56:36,816 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.718e+02 3.414e+02 4.299e+02 1.338e+03, threshold=6.828e+02, percent-clipped=6.0 2024-09-17 07:56:47,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2024-09-17 07:56:53,254 INFO [train.py:1198] (1/2) Epoch 8, batch 2750, loss[loss=0.2774, simple_loss=0.308, pruned_loss=0.09552, ctc_loss=0.1874, cr_loss=0.4566, over 34660.00 frames. ], tot_loss[loss=0.2785, simple_loss=0.3093, pruned_loss=0.09608, ctc_loss=0.1876, cr_loss=0.4494, over 6761928.53 frames. ], batch size: 88, lr: 1.48e-02, grad_scale: 16.0 2024-09-17 07:57:02,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=139458.66666666666, ans=0.04949747468305833 2024-09-17 07:57:10,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=139505.33333333334, ans=0.025 2024-09-17 07:57:24,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.68 vs. limit=10.0 2024-09-17 07:57:25,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.14 vs. 
limit=15.0 2024-09-17 07:58:09,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=139645.33333333334, ans=0.0 2024-09-17 07:58:17,726 INFO [train.py:1198] (1/2) Epoch 8, batch 2800, loss[loss=0.3224, simple_loss=0.3354, pruned_loss=0.1217, ctc_loss=0.2371, cr_loss=0.4639, over 23548.00 frames. ], tot_loss[loss=0.2788, simple_loss=0.3094, pruned_loss=0.09627, ctc_loss=0.1879, cr_loss=0.4492, over 6740507.71 frames. ], batch size: 244, lr: 1.48e-02, grad_scale: 32.0 2024-09-17 07:58:18,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=139692.0, ans=0.125 2024-09-17 07:58:35,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=139738.66666666666, ans=0.125 2024-09-17 07:58:41,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.61 vs. limit=22.5 2024-09-17 07:58:50,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139785.33333333334, ans=0.1 2024-09-17 07:59:05,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=139832.0, ans=0.125 2024-09-17 07:59:05,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=139832.0, ans=0.0 2024-09-17 07:59:23,642 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.186e+02 2.714e+02 3.225e+02 3.986e+02 6.961e+02, threshold=6.451e+02, percent-clipped=1.0 2024-09-17 07:59:40,024 INFO [train.py:1198] (1/2) Epoch 8, batch 2850, loss[loss=0.2706, simple_loss=0.3007, pruned_loss=0.09262, ctc_loss=0.185, cr_loss=0.4548, over 34461.00 frames. ], tot_loss[loss=0.2795, simple_loss=0.31, pruned_loss=0.09669, ctc_loss=0.1886, cr_loss=0.4499, over 6723801.35 frames. ], batch size: 90, lr: 1.48e-02, grad_scale: 32.0 2024-09-17 07:59:40,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=139925.33333333334, ans=0.0 2024-09-17 08:00:10,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=139972.0, ans=0.0 2024-09-17 08:00:45,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=140065.33333333334, ans=0.125 2024-09-17 08:00:51,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=140112.0, ans=0.0 2024-09-17 08:01:04,580 INFO [train.py:1198] (1/2) Epoch 8, batch 2900, loss[loss=0.2855, simple_loss=0.3141, pruned_loss=0.09906, ctc_loss=0.2004, cr_loss=0.4663, over 34526.00 frames. ], tot_loss[loss=0.2796, simple_loss=0.3105, pruned_loss=0.09655, ctc_loss=0.1884, cr_loss=0.451, over 6754294.41 frames. 
], batch size: 94, lr: 1.48e-02, grad_scale: 32.0 2024-09-17 08:01:09,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=140158.66666666666, ans=0.125 2024-09-17 08:01:18,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=140158.66666666666, ans=0.125 2024-09-17 08:01:26,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=140205.33333333334, ans=0.5 2024-09-17 08:01:43,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.53 vs. limit=10.0 2024-09-17 08:01:46,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=140252.0, ans=0.0 2024-09-17 08:02:12,215 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.750e+02 3.387e+02 4.087e+02 6.788e+02, threshold=6.774e+02, percent-clipped=1.0 2024-09-17 08:02:12,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=140345.33333333334, ans=0.0 2024-09-17 08:02:13,213 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.68 vs. limit=15.0 2024-09-17 08:02:24,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=140345.33333333334, ans=0.2 2024-09-17 08:02:28,850 INFO [train.py:1198] (1/2) Epoch 8, batch 2950, loss[loss=0.2516, simple_loss=0.2879, pruned_loss=0.08351, ctc_loss=0.1622, cr_loss=0.3966, over 34651.00 frames. ], tot_loss[loss=0.2779, simple_loss=0.3088, pruned_loss=0.09582, ctc_loss=0.1872, cr_loss=0.448, over 6749386.00 frames. ], batch size: 88, lr: 1.48e-02, grad_scale: 32.0 2024-09-17 08:02:29,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=140392.0, ans=0.1 2024-09-17 08:03:12,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=140485.33333333334, ans=0.125 2024-09-17 08:03:17,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=140532.0, ans=0.125 2024-09-17 08:03:18,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=140532.0, ans=0.125 2024-09-17 08:03:20,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=140532.0, ans=0.125 2024-09-17 08:03:33,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=140578.66666666666, ans=0.07 2024-09-17 08:03:35,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=140578.66666666666, ans=0.125 2024-09-17 08:03:53,846 INFO [train.py:1198] (1/2) Epoch 8, batch 3000, loss[loss=0.2746, simple_loss=0.3079, pruned_loss=0.09358, ctc_loss=0.1836, cr_loss=0.4354, over 34544.00 frames. ], tot_loss[loss=0.2775, simple_loss=0.3085, pruned_loss=0.09558, ctc_loss=0.1869, cr_loss=0.4479, over 6748589.84 frames. 
], batch size: 94, lr: 1.48e-02, grad_scale: 32.0 2024-09-17 08:03:53,846 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 08:04:10,646 INFO [train.py:1230] (1/2) Epoch 8, validation: loss=0.1614, simple_loss=0.2598, pruned_loss=0.02616, ctc_loss=0.05349, cr_loss=1.507e-14, over 944034.00 frames. 2024-09-17 08:04:10,646 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 08:05:02,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140765.33333333334, ans=0.125 2024-09-17 08:05:10,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=140765.33333333334, ans=0.125 2024-09-17 08:05:15,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=140812.0, ans=0.0 2024-09-17 08:05:16,828 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.128e+02 2.646e+02 3.060e+02 4.187e+02 7.657e+02, threshold=6.120e+02, percent-clipped=1.0 2024-09-17 08:05:28,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=140812.0, ans=0.125 2024-09-17 08:05:33,188 INFO [train.py:1198] (1/2) Epoch 8, batch 3050, loss[loss=0.2778, simple_loss=0.311, pruned_loss=0.09509, ctc_loss=0.1831, cr_loss=0.4456, over 34591.00 frames. ], tot_loss[loss=0.2784, simple_loss=0.3094, pruned_loss=0.096, ctc_loss=0.1877, cr_loss=0.4489, over 6741338.21 frames. ], batch size: 89, lr: 1.48e-02, grad_scale: 32.0 2024-09-17 08:05:35,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=140858.66666666666, ans=0.2 2024-09-17 08:05:39,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=140858.66666666666, ans=0.2 2024-09-17 08:05:41,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=140858.66666666666, ans=0.125 2024-09-17 08:05:59,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=140905.33333333334, ans=0.0 2024-09-17 08:06:15,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.83 vs. limit=15.0 2024-09-17 08:06:19,436 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.457e-02 2024-09-17 08:06:37,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.39 vs. limit=10.0 2024-09-17 08:06:48,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141045.33333333334, ans=0.1 2024-09-17 08:06:50,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=22.5 2024-09-17 08:06:52,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=141092.0, ans=0.125 2024-09-17 08:06:54,246 INFO [train.py:1198] (1/2) Epoch 8, batch 3100, loss[loss=0.3059, simple_loss=0.3365, pruned_loss=0.1062, ctc_loss=0.2115, cr_loss=0.5101, over 34222.00 frames. 
], tot_loss[loss=0.2784, simple_loss=0.3093, pruned_loss=0.09598, ctc_loss=0.1878, cr_loss=0.4486, over 6740700.92 frames. ], batch size: 117, lr: 1.48e-02, grad_scale: 32.0 2024-09-17 08:07:02,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=141092.0, ans=0.125 2024-09-17 08:07:10,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=141138.66666666666, ans=0.0 2024-09-17 08:07:13,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141138.66666666666, ans=0.1 2024-09-17 08:07:46,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0 2024-09-17 08:07:59,034 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.180e+02 2.580e+02 3.297e+02 4.464e+02 7.301e+02, threshold=6.594e+02, percent-clipped=7.0 2024-09-17 08:07:59,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=141278.66666666666, ans=0.2 2024-09-17 08:08:13,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=141325.33333333334, ans=0.125 2024-09-17 08:08:15,327 INFO [train.py:1198] (1/2) Epoch 8, batch 3150, loss[loss=0.2906, simple_loss=0.323, pruned_loss=0.1006, ctc_loss=0.1931, cr_loss=0.4599, over 33840.00 frames. ], tot_loss[loss=0.2781, simple_loss=0.3092, pruned_loss=0.09579, ctc_loss=0.1874, cr_loss=0.4481, over 6746871.58 frames. ], batch size: 122, lr: 1.48e-02, grad_scale: 32.0 2024-09-17 08:08:30,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=141372.0, ans=0.0 2024-09-17 08:08:46,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141418.66666666666, ans=0.1 2024-09-17 08:08:57,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=141418.66666666666, ans=0.125 2024-09-17 08:09:20,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=141512.0, ans=0.0 2024-09-17 08:09:37,826 INFO [train.py:1198] (1/2) Epoch 8, batch 3200, loss[loss=0.2745, simple_loss=0.3046, pruned_loss=0.09502, ctc_loss=0.1847, cr_loss=0.4373, over 34521.00 frames. ], tot_loss[loss=0.2773, simple_loss=0.3084, pruned_loss=0.0955, ctc_loss=0.1868, cr_loss=0.448, over 6760991.29 frames. 
], batch size: 94, lr: 1.47e-02, grad_scale: 32.0 2024-09-17 08:09:46,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=141558.66666666666, ans=0.125 2024-09-17 08:10:07,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=141605.33333333334, ans=0.0 2024-09-17 08:10:09,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=141652.0, ans=0.125 2024-09-17 08:10:09,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=141652.0, ans=0.04949747468305833 2024-09-17 08:10:12,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=141652.0, ans=0.125 2024-09-17 08:10:41,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=141745.33333333334, ans=0.0 2024-09-17 08:10:42,669 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.709e+02 3.134e+02 4.055e+02 7.575e+02, threshold=6.268e+02, percent-clipped=1.0 2024-09-17 08:10:44,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=141745.33333333334, ans=0.125 2024-09-17 08:11:00,310 INFO [train.py:1198] (1/2) Epoch 8, batch 3250, loss[loss=0.3059, simple_loss=0.3315, pruned_loss=0.1092, ctc_loss=0.2119, cr_loss=0.4913, over 34643.00 frames. ], tot_loss[loss=0.2777, simple_loss=0.309, pruned_loss=0.09551, ctc_loss=0.1869, cr_loss=0.4491, over 6770358.11 frames. ], batch size: 98, lr: 1.47e-02, grad_scale: 32.0 2024-09-17 08:11:05,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=141792.0, ans=0.025 2024-09-17 08:11:07,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=141792.0, ans=0.125 2024-09-17 08:11:37,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=141885.33333333334, ans=0.125 2024-09-17 08:11:41,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=141885.33333333334, ans=0.1 2024-09-17 08:11:42,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=141885.33333333334, ans=0.125 2024-09-17 08:11:52,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=141932.0, ans=0.0 2024-09-17 08:11:58,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=141932.0, ans=0.0 2024-09-17 08:12:10,165 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:12:21,099 INFO [train.py:1198] (1/2) Epoch 8, batch 3300, loss[loss=0.3014, simple_loss=0.3292, pruned_loss=0.1064, ctc_loss=0.2056, cr_loss=0.4925, over 33104.00 frames. ], tot_loss[loss=0.2764, simple_loss=0.3077, pruned_loss=0.09498, ctc_loss=0.1859, cr_loss=0.4471, over 6769752.13 frames. 
], batch size: 130, lr: 1.47e-02, grad_scale: 16.0 2024-09-17 08:12:26,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.17 vs. limit=22.5 2024-09-17 08:12:42,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=142072.0, ans=0.0 2024-09-17 08:12:44,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=142072.0, ans=0.125 2024-09-17 08:12:49,669 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=22.5 2024-09-17 08:12:50,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=142072.0, ans=0.125 2024-09-17 08:13:22,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=142165.33333333334, ans=0.125 2024-09-17 08:13:24,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142212.0, ans=0.1 2024-09-17 08:13:25,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=142212.0, ans=0.125 2024-09-17 08:13:27,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.895e+02 3.703e+02 4.654e+02 1.041e+03, threshold=7.406e+02, percent-clipped=11.0 2024-09-17 08:13:41,713 INFO [train.py:1198] (1/2) Epoch 8, batch 3350, loss[loss=0.2899, simple_loss=0.322, pruned_loss=0.1003, ctc_loss=0.1946, cr_loss=0.4551, over 33851.00 frames. ], tot_loss[loss=0.2777, simple_loss=0.3087, pruned_loss=0.09564, ctc_loss=0.1871, cr_loss=0.4479, over 6744219.02 frames. ], batch size: 122, lr: 1.47e-02, grad_scale: 16.0 2024-09-17 08:13:43,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=142258.66666666666, ans=0.125 2024-09-17 08:13:50,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=142258.66666666666, ans=0.05 2024-09-17 08:14:00,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=142305.33333333334, ans=0.07 2024-09-17 08:15:02,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142492.0, ans=0.1 2024-09-17 08:15:03,971 INFO [train.py:1198] (1/2) Epoch 8, batch 3400, loss[loss=0.2515, simple_loss=0.2824, pruned_loss=0.08513, ctc_loss=0.169, cr_loss=0.4145, over 34177.00 frames. ], tot_loss[loss=0.2781, simple_loss=0.3089, pruned_loss=0.09591, ctc_loss=0.1875, cr_loss=0.4487, over 6734886.85 frames. 
], batch size: 78, lr: 1.47e-02, grad_scale: 16.0 2024-09-17 08:15:10,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142492.0, ans=0.1 2024-09-17 08:15:14,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=142492.0, ans=0.0 2024-09-17 08:15:15,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=142492.0, ans=0.1 2024-09-17 08:15:17,295 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 08:15:42,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=142585.33333333334, ans=0.025 2024-09-17 08:16:06,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=142632.0, ans=0.05 2024-09-17 08:16:08,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=142678.66666666666, ans=0.125 2024-09-17 08:16:12,598 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 2.602e+02 3.094e+02 3.895e+02 7.043e+02, threshold=6.187e+02, percent-clipped=0.0 2024-09-17 08:16:25,455 INFO [train.py:1198] (1/2) Epoch 8, batch 3450, loss[loss=0.3037, simple_loss=0.3275, pruned_loss=0.1099, ctc_loss=0.2092, cr_loss=0.459, over 33014.00 frames. ], tot_loss[loss=0.2785, simple_loss=0.3092, pruned_loss=0.09618, ctc_loss=0.1878, cr_loss=0.449, over 6746253.69 frames. ], batch size: 130, lr: 1.47e-02, grad_scale: 8.0 2024-09-17 08:16:53,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=142772.0, ans=0.0 2024-09-17 08:17:02,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.54 vs. limit=22.5 2024-09-17 08:17:17,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=142865.33333333334, ans=0.125 2024-09-17 08:17:30,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=142912.0, ans=0.2 2024-09-17 08:17:46,436 INFO [train.py:1198] (1/2) Epoch 8, batch 3500, loss[loss=0.2463, simple_loss=0.2801, pruned_loss=0.0812, ctc_loss=0.1628, cr_loss=0.437, over 34469.00 frames. ], tot_loss[loss=0.2773, simple_loss=0.3082, pruned_loss=0.09555, ctc_loss=0.1868, cr_loss=0.4481, over 6747866.76 frames. ], batch size: 85, lr: 1.47e-02, grad_scale: 8.0 2024-09-17 08:17:46,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=142958.66666666666, ans=0.125 2024-09-17 08:17:54,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=142958.66666666666, ans=0.025 2024-09-17 08:17:57,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.68 vs. 
2024-09-17 08:18:01,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=143005.33333333334, ans=0.125
2024-09-17 08:18:04,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=143005.33333333334, ans=0.125
2024-09-17 08:18:06,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=143005.33333333334, ans=0.125
2024-09-17 08:18:10,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=143005.33333333334, ans=0.2
2024-09-17 08:18:24,036 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:18:39,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=143098.66666666666, ans=0.0
2024-09-17 08:18:53,912 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.088e+02 2.871e+02 3.360e+02 4.140e+02 7.495e+02, threshold=6.719e+02, percent-clipped=7.0
2024-09-17 08:19:00,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=143145.33333333334, ans=0.2
2024-09-17 08:19:03,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=143145.33333333334, ans=0.125
2024-09-17 08:19:06,642 INFO [train.py:1198] (1/2) Epoch 8, batch 3550, loss[loss=0.2824, simple_loss=0.3154, pruned_loss=0.09619, ctc_loss=0.191, cr_loss=0.4726, over 34364.00 frames. ], tot_loss[loss=0.2775, simple_loss=0.3086, pruned_loss=0.09553, ctc_loss=0.1868, cr_loss=0.4484, over 6757594.15 frames. ], batch size: 103, lr: 1.47e-02, grad_scale: 8.0
2024-09-17 08:19:20,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=143192.0, ans=0.2
2024-09-17 08:19:28,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=143238.66666666666, ans=0.125
2024-09-17 08:19:44,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=143285.33333333334, ans=0.0
2024-09-17 08:19:55,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=143332.0, ans=0.125
2024-09-17 08:20:15,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143378.66666666666, ans=0.1
2024-09-17 08:20:28,356 INFO [train.py:1198] (1/2) Epoch 8, batch 3600, loss[loss=0.2739, simple_loss=0.3052, pruned_loss=0.09476, ctc_loss=0.1804, cr_loss=0.4243, over 34468.00 frames. ], tot_loss[loss=0.278, simple_loss=0.3091, pruned_loss=0.09578, ctc_loss=0.1873, cr_loss=0.4496, over 6767190.86 frames. ], batch size: 90, lr: 1.47e-02, grad_scale: 16.0
2024-09-17 08:20:30,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.50 vs. limit=22.5
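The WARNING [optim.py:487] lines above report five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms, the clipping threshold in force, and the share of recent batches that were clipped. A rough sketch of that bookkeeping follows, assuming the threshold is Clipping_scale times the median of a sliding window of recent norms; the actual optimizer may derive it differently.

from collections import deque
import torch

class GradNormClipper:
    # Sketch: threshold = clipping_scale * median of a sliding window of
    # recent gradient norms (an assumption, not the optimizer's exact rule).
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0
        self.num_steps = 0

    def clip_(self, params) -> None:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        q = sorted(self.norms)
        quartiles = [q[int(f * (len(q) - 1))] for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = self.clipping_scale * quartiles[2]  # 2.0 x the median
        self.num_steps += 1
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        print(f"grad-norm quartiles {quartiles}, threshold={threshold:.3e}, "
              f"percent-clipped={100.0 * self.num_clipped / self.num_steps:.1f}")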
2024-09-17 08:20:45,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=143472.0, ans=0.125
2024-09-17 08:20:46,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.96 vs. limit=22.5
2024-09-17 08:21:22,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=143565.33333333334, ans=0.125
2024-09-17 08:21:35,063 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.758e+02 3.922e+02 5.920e+02 1.019e+03, threshold=7.843e+02, percent-clipped=16.0
2024-09-17 08:21:38,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=143612.0, ans=0.125
2024-09-17 08:21:48,024 INFO [train.py:1198] (1/2) Epoch 8, batch 3650, loss[loss=0.2921, simple_loss=0.3201, pruned_loss=0.1023, ctc_loss=0.1986, cr_loss=0.4935, over 34450.00 frames. ], tot_loss[loss=0.277, simple_loss=0.3082, pruned_loss=0.09528, ctc_loss=0.1863, cr_loss=0.4477, over 6770438.35 frames. ], batch size: 110, lr: 1.46e-02, grad_scale: 16.0
2024-09-17 08:21:54,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=143658.66666666666, ans=0.5
2024-09-17 08:22:04,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=143705.33333333334, ans=0.0
2024-09-17 08:22:14,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=143705.33333333334, ans=0.025
2024-09-17 08:22:19,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=143752.0, ans=0.2
2024-09-17 08:22:27,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=143752.0, ans=0.0
2024-09-17 08:22:32,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=143752.0, ans=0.07
2024-09-17 08:22:51,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=143845.33333333334, ans=0.125
2024-09-17 08:23:09,602 INFO [train.py:1198] (1/2) Epoch 8, batch 3700, loss[loss=0.294, simple_loss=0.3246, pruned_loss=0.1022, ctc_loss=0.1995, cr_loss=0.4749, over 34621.00 frames. ], tot_loss[loss=0.2768, simple_loss=0.3082, pruned_loss=0.09515, ctc_loss=0.1859, cr_loss=0.4472, over 6784734.42 frames. ], batch size: 102, lr: 1.46e-02, grad_scale: 16.0
2024-09-17 08:23:11,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=143892.0, ans=0.0
2024-09-17 08:23:20,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.54 vs. limit=22.5
2024-09-17 08:23:38,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=143938.66666666666, ans=0.1
2024-09-17 08:23:50,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=143985.33333333334, ans=0.0
2024-09-17 08:24:03,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=144032.0, ans=0.0
2024-09-17 08:24:17,855 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.169e+02 2.534e+02 2.766e+02 3.362e+02 6.921e+02, threshold=5.532e+02, percent-clipped=0.0
2024-09-17 08:24:30,679 INFO [train.py:1198] (1/2) Epoch 8, batch 3750, loss[loss=0.2875, simple_loss=0.3244, pruned_loss=0.09712, ctc_loss=0.1883, cr_loss=0.4662, over 34310.00 frames. ], tot_loss[loss=0.2807, simple_loss=0.3118, pruned_loss=0.09682, ctc_loss=0.1891, cr_loss=0.4527, over 6785811.55 frames. ], batch size: 113, lr: 1.46e-02, grad_scale: 16.0
2024-09-17 08:24:35,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=144125.33333333334, ans=0.025
2024-09-17 08:25:03,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=144218.66666666666, ans=0.09899494936611666
2024-09-17 08:25:09,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=144218.66666666666, ans=0.0
2024-09-17 08:25:09,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.55 vs. limit=15.0
2024-09-17 08:25:15,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=144218.66666666666, ans=0.0
2024-09-17 08:25:19,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=144265.33333333334, ans=0.2
2024-09-17 08:25:36,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.28 vs. limit=22.5
2024-09-17 08:25:49,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=144312.0, ans=0.125
2024-09-17 08:25:52,103 INFO [train.py:1198] (1/2) Epoch 8, batch 3800, loss[loss=0.3101, simple_loss=0.3254, pruned_loss=0.1155, ctc_loss=0.2188, cr_loss=0.4976, over 29729.00 frames. ], tot_loss[loss=0.2848, simple_loss=0.3149, pruned_loss=0.09898, ctc_loss=0.193, cr_loss=0.4565, over 6675313.90 frames. ], batch size: 175, lr: 1.46e-02, grad_scale: 16.0
2024-09-17 08:25:57,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=144358.66666666666, ans=0.2
2024-09-17 08:26:21,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0
2024-09-17 08:26:25,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.07 vs. limit=15.0
2024-09-17 08:26:44,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=144498.66666666666, ans=0.04949747468305833
2024-09-17 08:26:57,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=144545.33333333334, ans=0.2
2024-09-17 08:26:57,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=144545.33333333334, ans=0.0
2024-09-17 08:27:01,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=144545.33333333334, ans=0.0
2024-09-17 08:27:02,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.303e+02 2.641e+02 2.890e+02 3.205e+02 9.818e+02, threshold=5.781e+02, percent-clipped=1.0
2024-09-17 08:27:16,412 INFO [train.py:1198] (1/2) Epoch 8, batch 3850, loss[loss=0.3224, simple_loss=0.3405, pruned_loss=0.1192, ctc_loss=0.2383, cr_loss=0.4585, over 23577.00 frames. ], tot_loss[loss=0.2923, simple_loss=0.3194, pruned_loss=0.1033, ctc_loss=0.2015, cr_loss=0.4592, over 6253229.60 frames. ], batch size: 244, lr: 1.46e-02, grad_scale: 16.0
2024-09-17 08:27:44,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.54 vs. limit=6.0
2024-09-17 08:27:50,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=144685.33333333334, ans=0.125
2024-09-17 08:27:52,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0
2024-09-17 08:28:47,578 INFO [train.py:1198] (1/2) Epoch 9, batch 0, loss[loss=0.2561, simple_loss=0.2895, pruned_loss=0.08605, ctc_loss=0.1664, cr_loss=0.4338, over 34473.00 frames. ], tot_loss[loss=0.2561, simple_loss=0.2895, pruned_loss=0.08605, ctc_loss=0.1664, cr_loss=0.4338, over 34473.00 frames. ], batch size: 85, lr: 1.38e-02, grad_scale: 32.0
2024-09-17 08:28:47,579 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 08:29:04,362 INFO [train.py:1230] (1/2) Epoch 9, validation: loss=0.1643, simple_loss=0.2631, pruned_loss=0.02715, ctc_loss=0.05569, cr_loss=1.54e-14, over 944034.00 frames.
2024-09-17 08:29:04,363 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-17 08:29:22,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=144760.0, ans=0.1
2024-09-17 08:29:31,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=144760.0, ans=0.025
2024-09-17 08:29:31,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=144760.0, ans=0.0
2024-09-17 08:29:32,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=144760.0, ans=0.025
2024-09-17 08:29:43,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.28 vs. limit=12.0
2024-09-17 08:29:50,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.43 vs. limit=10.0
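The "Computing validation loss" / "Epoch 9, validation: ..." pair above is a periodic pass over the dev sets at the start of the epoch; the reported values are frame-weighted averages, hence "over 944034.00 frames.". A minimal sketch of such a pass, with model, valid_dl, and compute_loss as assumed names for illustration:

import torch

def validate(model, valid_dl, compute_loss):
    # Frame-weighted validation pass; returns per-component averages.
    model.eval()
    totals = {}
    tot_frames = 0.0
    with torch.no_grad():
        for batch in valid_dl:
            losses, num_frames = compute_loss(model, batch)  # dict of scalar tensors
            for name, value in losses.items():
                totals[name] = totals.get(name, 0.0) + value.item() * num_frames
            tot_frames += num_frames
    model.train()
    # e.g. {"loss": ..., "simple_loss": ..., ...} averaged over tot_frames
    return {name: value / tot_frames for name, value in totals.items()}, tot_frames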
2024-09-17 08:30:00,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=144853.33333333334, ans=0.1
2024-09-17 08:30:02,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=144853.33333333334, ans=0.0
2024-09-17 08:30:04,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.64 vs. limit=15.0
2024-09-17 08:30:28,437 INFO [train.py:1198] (1/2) Epoch 9, batch 50, loss[loss=0.2532, simple_loss=0.2852, pruned_loss=0.08558, ctc_loss=0.1701, cr_loss=0.4034, over 34438.00 frames. ], tot_loss[loss=0.2799, simple_loss=0.3106, pruned_loss=0.09664, ctc_loss=0.1893, cr_loss=0.4515, over 1482191.83 frames. ], batch size: 82, lr: 1.38e-02, grad_scale: 16.0
2024-09-17 08:30:30,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=144946.66666666666, ans=0.2
2024-09-17 08:30:43,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=144993.33333333334, ans=0.2
2024-09-17 08:30:43,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=144993.33333333334, ans=0.125
2024-09-17 08:30:56,667 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.220e+02 2.666e+02 3.207e+02 4.206e+02 7.924e+02, threshold=6.414e+02, percent-clipped=7.0
2024-09-17 08:30:56,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=144993.33333333334, ans=0.0
2024-09-17 08:31:08,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=145040.0, ans=0.125
2024-09-17 08:31:20,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=145086.66666666666, ans=0.0
2024-09-17 08:31:21,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=145086.66666666666, ans=0.0
2024-09-17 08:31:37,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=145133.33333333334, ans=0.2
2024-09-17 08:31:46,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=145133.33333333334, ans=0.0
2024-09-17 08:31:53,041 INFO [train.py:1198] (1/2) Epoch 9, batch 100, loss[loss=0.2684, simple_loss=0.3013, pruned_loss=0.09062, ctc_loss=0.1802, cr_loss=0.4544, over 34580.00 frames. ], tot_loss[loss=0.28, simple_loss=0.3113, pruned_loss=0.09638, ctc_loss=0.1889, cr_loss=0.4536, over 2629114.17 frames. ], batch size: 89, lr: 1.38e-02, grad_scale: 8.0
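grad_scale in the loss lines is the dynamic fp16 loss scale: it is cut when a step produces non-finite gradients (16.0 to 8.0 above) and grows back after a run of clean steps (later reaching 32.0). A generic PyTorch AMP sketch of that mechanism, not the run's exact training loop:

import torch
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler(init_scale=16.0, growth_interval=2000)

def train_step(model, optimizer, batch, compute_loss):
    # Generic fp16 step: the logged grad_scale corresponds to
    # scaler.get_scale() in this sketch.
    optimizer.zero_grad()
    with autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # step is skipped and the scale halved on inf/NaN
    scaler.update()         # scale doubles after `growth_interval` clean steps
    return loss.detach(), scaler.get_scale()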
2024-09-17 08:32:29,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=145273.33333333334, ans=0.09899494936611666
2024-09-17 08:32:42,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=145320.0, ans=0.125
2024-09-17 08:32:43,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=145320.0, ans=0.125
2024-09-17 08:32:48,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=145320.0, ans=0.0
2024-09-17 08:33:07,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=145366.66666666666, ans=0.125
2024-09-17 08:33:11,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=145366.66666666666, ans=0.125
2024-09-17 08:33:13,986 INFO [train.py:1198] (1/2) Epoch 9, batch 150, loss[loss=0.2556, simple_loss=0.2874, pruned_loss=0.08617, ctc_loss=0.1712, cr_loss=0.4323, over 34480.00 frames. ], tot_loss[loss=0.2763, simple_loss=0.3081, pruned_loss=0.09473, ctc_loss=0.1859, cr_loss=0.448, over 3556454.25 frames. ], batch size: 82, lr: 1.38e-02, grad_scale: 8.0
2024-09-17 08:33:28,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.70 vs. limit=6.0
2024-09-17 08:33:30,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=145460.0, ans=0.125
2024-09-17 08:33:35,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=145460.0, ans=0.0
2024-09-17 08:33:40,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145460.0, ans=0.1
2024-09-17 08:33:43,566 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.163e+02 2.678e+02 3.186e+02 4.014e+02 8.038e+02, threshold=6.372e+02, percent-clipped=3.0
2024-09-17 08:33:49,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.18 vs. limit=15.0
2024-09-17 08:34:08,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=145553.33333333334, ans=0.125
2024-09-17 08:34:08,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=145553.33333333334, ans=0.125
2024-09-17 08:34:12,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=145553.33333333334, ans=0.125
2024-09-17 08:34:20,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=145600.0, ans=0.125
2024-09-17 08:34:20,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=145600.0, ans=0.2
2024-09-17 08:34:38,235 INFO [train.py:1198] (1/2) Epoch 9, batch 200, loss[loss=0.2922, simple_loss=0.3204, pruned_loss=0.1022, ctc_loss=0.2021, cr_loss=0.4834, over 32059.00 frames. ], tot_loss[loss=0.2754, simple_loss=0.3069, pruned_loss=0.09444, ctc_loss=0.1852, cr_loss=0.4477, over 4271959.44 frames. ], batch size: 146, lr: 1.38e-02, grad_scale: 8.0
2024-09-17 08:34:45,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=145646.66666666666, ans=0.0
2024-09-17 08:35:00,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=145693.33333333334, ans=0.125
2024-09-17 08:35:00,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.36 vs. limit=10.0
2024-09-17 08:35:06,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=145693.33333333334, ans=0.0
2024-09-17 08:35:08,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=145693.33333333334, ans=0.125
2024-09-17 08:35:22,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=145740.0, ans=0.0
2024-09-17 08:35:32,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145786.66666666666, ans=0.1
2024-09-17 08:35:37,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=145786.66666666666, ans=0.0
2024-09-17 08:35:43,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=145786.66666666666, ans=0.2
2024-09-17 08:36:03,009 INFO [train.py:1198] (1/2) Epoch 9, batch 250, loss[loss=0.2911, simple_loss=0.3226, pruned_loss=0.1004, ctc_loss=0.1982, cr_loss=0.4775, over 34254.00 frames. ], tot_loss[loss=0.2751, simple_loss=0.3069, pruned_loss=0.09424, ctc_loss=0.1849, cr_loss=0.4476, over 4833081.92 frames. ], batch size: 117, lr: 1.38e-02, grad_scale: 8.0
2024-09-17 08:36:04,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0
2024-09-17 08:36:11,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=145880.0, ans=0.025
2024-09-17 08:36:13,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=145880.0, ans=0.125
2024-09-17 08:36:13,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.96 vs. limit=22.5
2024-09-17 08:36:21,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=145926.66666666666, ans=0.0
2024-09-17 08:36:25,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0
2024-09-17 08:36:26,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=145926.66666666666, ans=0.0
2024-09-17 08:36:32,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 2.683e+02 3.385e+02 4.542e+02 8.415e+02, threshold=6.771e+02, percent-clipped=4.0
2024-09-17 08:36:43,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.93 vs. limit=15.0
2024-09-17 08:36:50,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=146020.0, ans=0.0
2024-09-17 08:36:58,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=12.0
2024-09-17 08:37:06,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146020.0, ans=0.1
2024-09-17 08:37:17,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=146066.66666666666, ans=0.025
2024-09-17 08:37:17,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=146066.66666666666, ans=0.1
2024-09-17 08:37:20,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=146066.66666666666, ans=0.1
2024-09-17 08:37:23,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.06 vs. limit=15.0
2024-09-17 08:37:25,250 INFO [train.py:1198] (1/2) Epoch 9, batch 300, loss[loss=0.2872, simple_loss=0.3206, pruned_loss=0.09828, ctc_loss=0.1943, cr_loss=0.4566, over 34349.00 frames. ], tot_loss[loss=0.2741, simple_loss=0.3062, pruned_loss=0.09367, ctc_loss=0.1839, cr_loss=0.4463, over 5262395.36 frames. ], batch size: 107, lr: 1.38e-02, grad_scale: 8.0
2024-09-17 08:37:39,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.97 vs. limit=15.0
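Each Whitening line compares a measured whiteness metric for a module's activations against a limit (e.g. metric=3.97 vs. limit=15.0 just above). One plausible metric of this kind is the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue, which equals 1.0 for a perfectly white covariance and grows with anisotropy. This is an assumed formulation for illustration, not necessarily the module's exact formula:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels). Returns E[lambda^2] / E[lambda]^2 of the
    # per-group feature covariance (assumption): 1.0 when perfectly "white",
    # larger when the covariance is anisotropic.
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / num_frames        # (num_groups, c, c)
    n = cov.shape[-1]
    mean_sq_eig = (cov * cov).sum(dim=(1, 2)) / n   # tr(C^2) / n == E[lambda^2]
    sq_mean_eig = (cov.diagonal(dim1=1, dim2=2).sum(dim=1) / n) ** 2
    return (mean_sq_eig / sq_mean_eig).mean().item()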
2024-09-17 08:38:02,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=146206.66666666666, ans=0.125
2024-09-17 08:38:03,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=146206.66666666666, ans=0.0
2024-09-17 08:38:07,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=146206.66666666666, ans=0.2
2024-09-17 08:38:49,039 INFO [train.py:1198] (1/2) Epoch 9, batch 350, loss[loss=0.2562, simple_loss=0.2807, pruned_loss=0.09008, ctc_loss=0.1729, cr_loss=0.4258, over 34290.00 frames. ], tot_loss[loss=0.2744, simple_loss=0.3066, pruned_loss=0.09374, ctc_loss=0.1842, cr_loss=0.447, over 5598526.41 frames. ], batch size: 83, lr: 1.37e-02, grad_scale: 8.0
2024-09-17 08:38:49,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=146346.66666666666, ans=0.2
2024-09-17 08:38:50,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=146346.66666666666, ans=0.0
2024-09-17 08:39:20,742 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.752e+02 3.231e+02 3.987e+02 8.289e+02, threshold=6.461e+02, percent-clipped=1.0
2024-09-17 08:39:42,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=146486.66666666666, ans=0.015
2024-09-17 08:39:44,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=22.5
2024-09-17 08:39:59,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=146533.33333333334, ans=0.025
2024-09-17 08:40:12,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=146580.0, ans=0.0
2024-09-17 08:40:13,359 INFO [train.py:1198] (1/2) Epoch 9, batch 400, loss[loss=0.2834, simple_loss=0.3158, pruned_loss=0.0973, ctc_loss=0.1928, cr_loss=0.447, over 34408.00 frames. ], tot_loss[loss=0.2733, simple_loss=0.3058, pruned_loss=0.09315, ctc_loss=0.1832, cr_loss=0.4461, over 5864678.74 frames. ], batch size: 95, lr: 1.37e-02, grad_scale: 16.0
2024-09-17 08:40:30,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=146626.66666666666, ans=0.0
2024-09-17 08:40:30,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=146626.66666666666, ans=0.125
2024-09-17 08:40:31,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=146626.66666666666, ans=0.125
2024-09-17 08:40:33,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=146626.66666666666, ans=0.125
2024-09-17 08:40:33,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=146626.66666666666, ans=0.2
2024-09-17 08:41:00,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0
2024-09-17 08:41:15,125 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:41:18,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=146766.66666666666, ans=0.025
2024-09-17 08:41:35,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=146766.66666666666, ans=0.0
2024-09-17 08:41:37,927 INFO [train.py:1198] (1/2) Epoch 9, batch 450, loss[loss=0.2844, simple_loss=0.3176, pruned_loss=0.09741, ctc_loss=0.1902, cr_loss=0.4568, over 34717.00 frames. ], tot_loss[loss=0.2736, simple_loss=0.306, pruned_loss=0.09332, ctc_loss=0.1834, cr_loss=0.4466, over 6055152.92 frames. ], batch size: 97, lr: 1.37e-02, grad_scale: 16.0
2024-09-17 08:41:48,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=146813.33333333334, ans=0.125
2024-09-17 08:42:07,671 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.111e+02 2.572e+02 3.102e+02 4.196e+02 7.966e+02, threshold=6.205e+02, percent-clipped=5.0
2024-09-17 08:42:14,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=146906.66666666666, ans=0.1
2024-09-17 08:42:15,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.52 vs. limit=15.0
2024-09-17 08:42:42,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=147000.0, ans=0.0
2024-09-17 08:42:46,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=12.0
2024-09-17 08:42:50,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=147000.0, ans=0.025
2024-09-17 08:42:52,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=147000.0, ans=0.025
2024-09-17 08:42:55,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=147000.0, ans=0.125
2024-09-17 08:43:02,513 INFO [train.py:1198] (1/2) Epoch 9, batch 500, loss[loss=0.2959, simple_loss=0.3221, pruned_loss=0.1054, ctc_loss=0.1984, cr_loss=0.4779, over 34445.00 frames. ], tot_loss[loss=0.2729, simple_loss=0.3052, pruned_loss=0.09316, ctc_loss=0.1827, cr_loss=0.4458, over 6221194.81 frames. ], batch size: 110, lr: 1.37e-02, grad_scale: 16.0
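The logged lr decays smoothly both across epochs (1.47e-02 late in epoch 8, 1.38e-02 at the start of epoch 9) and within an epoch (1.37e-02 by batch 350). An Eden-style rule reproduces that shape; the constants below are illustrative, and the run's exact schedule may differ:

def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Smooth decay in both the batch index and the (fractional) epoch;
    # the constants are assumptions, not read from this run.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor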
2024-09-17 08:43:02,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=147046.66666666666, ans=0.0
2024-09-17 08:43:07,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=147046.66666666666, ans=0.125
2024-09-17 08:43:09,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=147046.66666666666, ans=0.125
2024-09-17 08:43:35,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=147140.0, ans=0.025
2024-09-17 08:44:00,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=147186.66666666666, ans=0.125
2024-09-17 08:44:07,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.16 vs. limit=22.5
2024-09-17 08:44:24,540 INFO [train.py:1198] (1/2) Epoch 9, batch 550, loss[loss=0.2977, simple_loss=0.3232, pruned_loss=0.1059, ctc_loss=0.207, cr_loss=0.4711, over 33787.00 frames. ], tot_loss[loss=0.2733, simple_loss=0.3054, pruned_loss=0.09334, ctc_loss=0.1831, cr_loss=0.4459, over 6330519.01 frames. ], batch size: 122, lr: 1.37e-02, grad_scale: 8.0
2024-09-17 08:44:28,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=147280.0, ans=0.0
2024-09-17 08:44:42,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=147326.66666666666, ans=0.0
2024-09-17 08:44:51,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=147326.66666666666, ans=0.0
2024-09-17 08:44:55,727 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.660e+02 3.243e+02 4.046e+02 6.805e+02, threshold=6.485e+02, percent-clipped=3.0
2024-09-17 08:45:11,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=147373.33333333334, ans=0.125
2024-09-17 08:45:49,198 INFO [train.py:1198] (1/2) Epoch 9, batch 600, loss[loss=0.2839, simple_loss=0.3169, pruned_loss=0.09722, ctc_loss=0.1922, cr_loss=0.4482, over 34311.00 frames. ], tot_loss[loss=0.2733, simple_loss=0.3056, pruned_loss=0.09327, ctc_loss=0.183, cr_loss=0.4453, over 6432243.23 frames. ], batch size: 117, lr: 1.37e-02, grad_scale: 8.0
2024-09-17 08:45:49,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=147513.33333333334, ans=0.125
2024-09-17 08:46:11,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.02 vs. limit=10.0
2024-09-17 08:46:30,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=147606.66666666666, ans=0.125
2024-09-17 08:46:31,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=147606.66666666666, ans=0.125
2024-09-17 08:46:37,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.15 vs. limit=12.0
2024-09-17 08:46:48,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=147653.33333333334, ans=0.125
2024-09-17 08:46:52,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=147653.33333333334, ans=0.0
2024-09-17 08:46:55,511 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:47:07,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=147700.0, ans=0.09899494936611666
2024-09-17 08:47:11,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=147746.66666666666, ans=0.07
2024-09-17 08:47:13,334 INFO [train.py:1198] (1/2) Epoch 9, batch 650, loss[loss=0.268, simple_loss=0.3006, pruned_loss=0.091, ctc_loss=0.1781, cr_loss=0.4454, over 34555.00 frames. ], tot_loss[loss=0.2721, simple_loss=0.3047, pruned_loss=0.09268, ctc_loss=0.1821, cr_loss=0.4445, over 6523064.77 frames. ], batch size: 94, lr: 1.37e-02, grad_scale: 8.0
2024-09-17 08:47:25,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0
2024-09-17 08:47:44,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.510e+02 3.000e+02 3.906e+02 7.505e+02, threshold=6.000e+02, percent-clipped=5.0
2024-09-17 08:47:48,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=147840.0, ans=0.125
2024-09-17 08:47:56,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0
2024-09-17 08:48:08,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.60 vs. limit=15.0
2024-09-17 08:48:20,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.87 vs. limit=15.0
2024-09-17 08:48:33,051 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:48:35,834 INFO [train.py:1198] (1/2) Epoch 9, batch 700, loss[loss=0.2568, simple_loss=0.2892, pruned_loss=0.08723, ctc_loss=0.1664, cr_loss=0.4195, over 34610.00 frames. ], tot_loss[loss=0.272, simple_loss=0.3049, pruned_loss=0.09252, ctc_loss=0.1817, cr_loss=0.4442, over 6580440.22 frames. ], batch size: 89, lr: 1.37e-02, grad_scale: 8.0
2024-09-17 08:48:45,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=147980.0, ans=0.07
2024-09-17 08:49:00,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=148026.66666666666, ans=0.125
2024-09-17 08:49:12,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=148073.33333333334, ans=0.125
2024-09-17 08:49:51,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0
2024-09-17 08:49:57,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=148166.66666666666, ans=0.0
2024-09-17 08:49:59,828 INFO [train.py:1198] (1/2) Epoch 9, batch 750, loss[loss=0.2771, simple_loss=0.3099, pruned_loss=0.09437, ctc_loss=0.1862, cr_loss=0.4576, over 34397.00 frames. ], tot_loss[loss=0.2714, simple_loss=0.3043, pruned_loss=0.0923, ctc_loss=0.1812, cr_loss=0.443, over 6622428.78 frames. ], batch size: 95, lr: 1.37e-02, grad_scale: 8.0
2024-09-17 08:50:03,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=148213.33333333334, ans=0.025
2024-09-17 08:50:30,393 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.176e+02 2.901e+02 3.883e+02 5.474e+02 8.898e+02, threshold=7.766e+02, percent-clipped=17.0
2024-09-17 08:50:34,199 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:51:11,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=148400.0, ans=0.125
2024-09-17 08:51:13,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=148400.0, ans=0.0
2024-09-17 08:51:13,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0
2024-09-17 08:51:24,169 INFO [train.py:1198] (1/2) Epoch 9, batch 800, loss[loss=0.2338, simple_loss=0.2726, pruned_loss=0.07454, ctc_loss=0.1482, cr_loss=0.4062, over 34451.00 frames. ], tot_loss[loss=0.2719, simple_loss=0.3046, pruned_loss=0.09257, ctc_loss=0.1816, cr_loss=0.444, over 6658264.87 frames. ], batch size: 85, lr: 1.36e-02, grad_scale: 16.0
2024-09-17 08:51:33,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.23 vs. limit=15.0
2024-09-17 08:51:35,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=148446.66666666666, ans=0.125
2024-09-17 08:51:43,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.00 vs. limit=22.5
2024-09-17 08:51:49,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=148493.33333333334, ans=0.2
2024-09-17 08:51:54,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=148493.33333333334, ans=0.125
2024-09-17 08:52:19,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0
2024-09-17 08:52:30,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=148633.33333333334, ans=0.0
2024-09-17 08:52:38,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=148633.33333333334, ans=0.1
2024-09-17 08:52:46,139 INFO [train.py:1198] (1/2) Epoch 9, batch 850, loss[loss=0.2796, simple_loss=0.316, pruned_loss=0.09413, ctc_loss=0.1854, cr_loss=0.4467, over 34374.00 frames. ], tot_loss[loss=0.2717, simple_loss=0.3045, pruned_loss=0.09246, ctc_loss=0.1813, cr_loss=0.4435, over 6692161.00 frames. ], batch size: 103, lr: 1.36e-02, grad_scale: 16.0
2024-09-17 08:53:19,020 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.680e+02 3.241e+02 4.133e+02 8.120e+02, threshold=6.481e+02, percent-clipped=1.0
2024-09-17 08:53:19,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=148773.33333333334, ans=0.2
2024-09-17 08:53:24,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=148773.33333333334, ans=0.125
2024-09-17 08:53:30,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=148773.33333333334, ans=0.1
2024-09-17 08:53:42,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=148820.0, ans=0.125
2024-09-17 08:53:55,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=148866.66666666666, ans=0.025
2024-09-17 08:54:00,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=148866.66666666666, ans=0.125
2024-09-17 08:54:04,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=148866.66666666666, ans=0.0
2024-09-17 08:54:07,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=148866.66666666666, ans=0.07
2024-09-17 08:54:10,242 INFO [train.py:1198] (1/2) Epoch 9, batch 900, loss[loss=0.2551, simple_loss=0.2874, pruned_loss=0.0863, ctc_loss=0.1697, cr_loss=0.404, over 34465.00 frames. ], tot_loss[loss=0.272, simple_loss=0.3047, pruned_loss=0.0926, ctc_loss=0.1816, cr_loss=0.4436, over 6698908.02 frames. ], batch size: 85, lr: 1.36e-02, grad_scale: 16.0
2024-09-17 08:54:14,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.33 vs. limit=15.0
2024-09-17 08:54:18,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=148913.33333333334, ans=0.0
2024-09-17 08:54:53,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=149006.66666666666, ans=0.0
2024-09-17 08:55:12,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.36 vs. limit=22.5
2024-09-17 08:55:13,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=149053.33333333334, ans=0.0
2024-09-17 08:55:33,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. limit=15.0
2024-09-17 08:55:34,124 INFO [train.py:1198] (1/2) Epoch 9, batch 950, loss[loss=0.2566, simple_loss=0.29, pruned_loss=0.08658, ctc_loss=0.1691, cr_loss=0.4076, over 34689.00 frames. ], tot_loss[loss=0.2724, simple_loss=0.3051, pruned_loss=0.09279, ctc_loss=0.1819, cr_loss=0.4442, over 6701815.08 frames. ], batch size: 87, lr: 1.36e-02, grad_scale: 16.0
2024-09-17 08:55:37,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=149146.66666666666, ans=0.2
2024-09-17 08:55:40,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=149146.66666666666, ans=0.125
2024-09-17 08:55:44,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=149146.66666666666, ans=0.0
2024-09-17 08:56:05,286 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.628e+02 2.983e+02 3.741e+02 9.958e+02, threshold=5.967e+02, percent-clipped=4.0
2024-09-17 08:56:09,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5
2024-09-17 08:56:12,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=149240.0, ans=0.125
2024-09-17 08:56:14,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0
2024-09-17 08:56:22,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0
2024-09-17 08:56:31,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0
2024-09-17 08:56:35,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=149286.66666666666, ans=0.125
2024-09-17 08:56:36,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.31 vs. limit=10.0
2024-09-17 08:56:39,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=149286.66666666666, ans=0.0
2024-09-17 08:56:39,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.20 vs. limit=22.5
2024-09-17 08:57:00,187 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 08:57:04,750 INFO [train.py:1198] (1/2) Epoch 9, batch 1000, loss[loss=0.2651, simple_loss=0.2938, pruned_loss=0.09257, ctc_loss=0.1742, cr_loss=0.4116, over 34496.00 frames. ], tot_loss[loss=0.2734, simple_loss=0.3057, pruned_loss=0.09332, ctc_loss=0.1828, cr_loss=0.4454, over 6694436.86 frames. ], batch size: 90, lr: 1.36e-02, grad_scale: 16.0
2024-09-17 08:57:29,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=149426.66666666666, ans=0.125
2024-09-17 08:57:47,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=149473.33333333334, ans=0.0
2024-09-17 08:57:54,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=149520.0, ans=0.0
2024-09-17 08:58:12,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=149566.66666666666, ans=0.125
2024-09-17 08:58:24,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=149566.66666666666, ans=0.125
2024-09-17 08:58:27,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=149613.33333333334, ans=0.2
2024-09-17 08:58:29,310 INFO [train.py:1198] (1/2) Epoch 9, batch 1050, loss[loss=0.2803, simple_loss=0.3178, pruned_loss=0.09385, ctc_loss=0.1873, cr_loss=0.442, over 34573.00 frames. ], tot_loss[loss=0.2722, simple_loss=0.3048, pruned_loss=0.09277, ctc_loss=0.1818, cr_loss=0.4446, over 6702385.03 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 16.0
2024-09-17 08:58:44,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=149660.0, ans=0.1
2024-09-17 08:58:59,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=149660.0, ans=0.125
2024-09-17 08:59:00,535 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.746e+02 3.330e+02 4.408e+02 8.336e+02, threshold=6.660e+02, percent-clipped=5.0
2024-09-17 08:59:16,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.48 vs. limit=15.0
2024-09-17 08:59:48,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=149800.0, ans=0.05
2024-09-17 08:59:51,784 INFO [train.py:1198] (1/2) Epoch 9, batch 1100, loss[loss=0.2717, simple_loss=0.3032, pruned_loss=0.09258, ctc_loss=0.178, cr_loss=0.4868, over 34336.00 frames. ], tot_loss[loss=0.2718, simple_loss=0.3047, pruned_loss=0.09241, ctc_loss=0.1812, cr_loss=0.4439, over 6715905.62 frames. ], batch size: 91, lr: 1.36e-02, grad_scale: 16.0
2024-09-17 08:59:57,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.68 vs. limit=22.5
2024-09-17 09:00:16,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.85 vs. limit=10.0
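The ScheduledFloat lines that dominate this log print the current value (ans=...) of module hyperparameters (dropout rates, skip rates, balancer probabilities) that are scheduled against batch_count. A piecewise-linear schedule is enough to mimic them; the breakpoints below are made up for illustration and are not this run's actual schedules:

import bisect

class PiecewiseLinear:
    # (batch_count, value) breakpoints, assumed sorted by batch_count.
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, x: float) -> float:
        if x <= self.xs[0]:
            return self.ys[0]
        if x >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, x)
        x0, x1, y0, y1 = self.xs[i - 1], self.xs[i], self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Hypothetical breakpoints: a dropout_p annealing from 0.3 to 0.1 and then
# staying flat, consistent with the ans=0.1 dropout_p entries at these
# batch_count values.
dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(150000.0))  # -> 0.1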
2024-09-17 09:00:21,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=149893.33333333334, ans=0.125
2024-09-17 09:00:30,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=149940.0, ans=0.125
2024-09-17 09:00:38,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=149940.0, ans=0.0
2024-09-17 09:01:08,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0
2024-09-17 09:01:13,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=150033.33333333334, ans=0.025
2024-09-17 09:01:16,141 INFO [train.py:1198] (1/2) Epoch 9, batch 1150, loss[loss=0.2716, simple_loss=0.3043, pruned_loss=0.09209, ctc_loss=0.1815, cr_loss=0.4584, over 34331.00 frames. ], tot_loss[loss=0.2722, simple_loss=0.3049, pruned_loss=0.09271, ctc_loss=0.1816, cr_loss=0.4437, over 6714116.82 frames. ], batch size: 91, lr: 1.36e-02, grad_scale: 16.0
2024-09-17 09:01:31,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=150126.66666666666, ans=0.125
2024-09-17 09:01:44,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150126.66666666666, ans=0.1
2024-09-17 09:01:47,567 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.092e+02 2.669e+02 3.256e+02 4.509e+02 7.801e+02, threshold=6.513e+02, percent-clipped=3.0
2024-09-17 09:01:51,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.38 vs. limit=15.0
2024-09-17 09:01:52,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.00 vs. limit=15.0
2024-09-17 09:02:18,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150220.0, ans=0.1
2024-09-17 09:02:22,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=150266.66666666666, ans=0.125
2024-09-17 09:02:41,192 INFO [train.py:1198] (1/2) Epoch 9, batch 1200, loss[loss=0.2743, simple_loss=0.3136, pruned_loss=0.09074, ctc_loss=0.1808, cr_loss=0.4342, over 34556.00 frames. ], tot_loss[loss=0.273, simple_loss=0.3057, pruned_loss=0.09305, ctc_loss=0.1823, cr_loss=0.4451, over 6706865.03 frames. ], batch size: 99, lr: 1.36e-02, grad_scale: 32.0
2024-09-17 09:02:46,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=150313.33333333334, ans=0.0
2024-09-17 09:02:49,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=150313.33333333334, ans=0.125
2024-09-17 09:03:17,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=150406.66666666666, ans=0.1
2024-09-17 09:03:19,533 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:03:24,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=150406.66666666666, ans=0.0
2024-09-17 09:03:25,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=150406.66666666666, ans=0.125
2024-09-17 09:03:30,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=150453.33333333334, ans=0.0
2024-09-17 09:03:34,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=150453.33333333334, ans=0.025
2024-09-17 09:03:44,173 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:03:50,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=150500.0, ans=0.2
2024-09-17 09:04:05,305 INFO [train.py:1198] (1/2) Epoch 9, batch 1250, loss[loss=0.2861, simple_loss=0.3175, pruned_loss=0.09914, ctc_loss=0.1895, cr_loss=0.4655, over 34350.00 frames. ], tot_loss[loss=0.2737, simple_loss=0.3063, pruned_loss=0.09329, ctc_loss=0.1828, cr_loss=0.4463, over 6741333.23 frames. ], batch size: 107, lr: 1.36e-02, grad_scale: 16.0
2024-09-17 09:04:05,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=150546.66666666666, ans=0.1
2024-09-17 09:04:13,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=150546.66666666666, ans=0.0
2024-09-17 09:04:19,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.95 vs. limit=12.0
2024-09-17 09:04:36,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=150640.0, ans=0.0
2024-09-17 09:04:38,314 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.211e+02 2.617e+02 3.157e+02 4.050e+02 8.692e+02, threshold=6.314e+02, percent-clipped=1.0
2024-09-17 09:04:58,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.14 vs. limit=15.0
limit=15.0 2024-09-17 09:05:01,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=150686.66666666666, ans=0.125 2024-09-17 09:05:08,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=150686.66666666666, ans=0.125 2024-09-17 09:05:13,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=150733.33333333334, ans=0.025 2024-09-17 09:05:15,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=150733.33333333334, ans=0.0 2024-09-17 09:05:20,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=150733.33333333334, ans=0.0 2024-09-17 09:05:21,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=150733.33333333334, ans=0.125 2024-09-17 09:05:24,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=150733.33333333334, ans=0.2 2024-09-17 09:05:27,946 INFO [train.py:1198] (1/2) Epoch 9, batch 1300, loss[loss=0.2772, simple_loss=0.3131, pruned_loss=0.09373, ctc_loss=0.1818, cr_loss=0.4382, over 33222.00 frames. ], tot_loss[loss=0.2726, simple_loss=0.3054, pruned_loss=0.09285, ctc_loss=0.1818, cr_loss=0.4446, over 6743739.80 frames. ], batch size: 130, lr: 1.35e-02, grad_scale: 16.0 2024-09-17 09:05:51,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=150826.66666666666, ans=0.125 2024-09-17 09:06:01,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150873.33333333334, ans=0.1 2024-09-17 09:06:01,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=150873.33333333334, ans=0.2 2024-09-17 09:06:04,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=150873.33333333334, ans=0.2 2024-09-17 09:06:14,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=150873.33333333334, ans=0.1 2024-09-17 09:06:46,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=150966.66666666666, ans=0.0 2024-09-17 09:06:52,393 INFO [train.py:1198] (1/2) Epoch 9, batch 1350, loss[loss=0.2635, simple_loss=0.301, pruned_loss=0.08704, ctc_loss=0.1718, cr_loss=0.4381, over 34540.00 frames. ], tot_loss[loss=0.2725, simple_loss=0.3053, pruned_loss=0.09273, ctc_loss=0.1817, cr_loss=0.4455, over 6762920.43 frames. 
], batch size: 94, lr: 1.35e-02, grad_scale: 16.0 2024-09-17 09:06:52,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=151013.33333333334, ans=0.125 2024-09-17 09:06:52,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=151013.33333333334, ans=0.125 2024-09-17 09:07:24,839 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.121e+02 2.540e+02 3.177e+02 4.589e+02 8.191e+02, threshold=6.355e+02, percent-clipped=6.0 2024-09-17 09:07:26,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=151106.66666666666, ans=0.125 2024-09-17 09:07:40,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.79 vs. limit=15.0 2024-09-17 09:07:56,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=151153.33333333334, ans=0.0 2024-09-17 09:08:02,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.48 vs. limit=15.0 2024-09-17 09:08:16,294 INFO [train.py:1198] (1/2) Epoch 9, batch 1400, loss[loss=0.2292, simple_loss=0.2645, pruned_loss=0.07495, ctc_loss=0.1468, cr_loss=0.3648, over 34285.00 frames. ], tot_loss[loss=0.2719, simple_loss=0.3049, pruned_loss=0.09243, ctc_loss=0.1812, cr_loss=0.4439, over 6774887.29 frames. ], batch size: 80, lr: 1.35e-02, grad_scale: 16.0 2024-09-17 09:08:17,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0 2024-09-17 09:08:26,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=151246.66666666666, ans=0.07 2024-09-17 09:08:28,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=151246.66666666666, ans=0.125 2024-09-17 09:08:39,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=151293.33333333334, ans=0.125 2024-09-17 09:08:41,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=151293.33333333334, ans=0.125 2024-09-17 09:08:41,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151293.33333333334, ans=0.1 2024-09-17 09:08:48,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.30 vs. 
limit=15.0 2024-09-17 09:08:52,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=151340.0, ans=0.025 2024-09-17 09:09:15,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=151386.66666666666, ans=0.1 2024-09-17 09:09:23,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=151433.33333333334, ans=0.125 2024-09-17 09:09:32,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=151433.33333333334, ans=0.0 2024-09-17 09:09:38,364 INFO [train.py:1198] (1/2) Epoch 9, batch 1450, loss[loss=0.286, simple_loss=0.3199, pruned_loss=0.09812, ctc_loss=0.1911, cr_loss=0.4412, over 34436.00 frames. ], tot_loss[loss=0.2724, simple_loss=0.3055, pruned_loss=0.09262, ctc_loss=0.1816, cr_loss=0.4447, over 6771346.40 frames. ], batch size: 110, lr: 1.35e-02, grad_scale: 16.0 2024-09-17 09:09:54,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.61 vs. limit=22.5 2024-09-17 09:10:13,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.575e+02 3.157e+02 3.826e+02 7.223e+02, threshold=6.314e+02, percent-clipped=1.0 2024-09-17 09:10:18,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=151573.33333333334, ans=0.125 2024-09-17 09:11:02,731 INFO [train.py:1198] (1/2) Epoch 9, batch 1500, loss[loss=0.2757, simple_loss=0.3136, pruned_loss=0.09202, ctc_loss=0.1798, cr_loss=0.4464, over 34445.00 frames. ], tot_loss[loss=0.2722, simple_loss=0.3053, pruned_loss=0.09249, ctc_loss=0.1813, cr_loss=0.4444, over 6772685.77 frames. ], batch size: 100, lr: 1.35e-02, grad_scale: 16.0 2024-09-17 09:11:18,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=151760.0, ans=0.0 2024-09-17 09:11:43,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=151806.66666666666, ans=0.125 2024-09-17 09:11:48,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=151806.66666666666, ans=0.07 2024-09-17 09:12:01,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=151853.33333333334, ans=0.0 2024-09-17 09:12:10,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-09-17 09:12:16,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=151900.0, ans=0.025 2024-09-17 09:12:16,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=151900.0, ans=0.0 2024-09-17 09:12:19,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.41 vs. 
limit=22.5 2024-09-17 09:12:22,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=151900.0, ans=0.125 2024-09-17 09:12:27,195 INFO [train.py:1198] (1/2) Epoch 9, batch 1550, loss[loss=0.3164, simple_loss=0.3435, pruned_loss=0.1122, ctc_loss=0.2205, cr_loss=0.5187, over 34434.00 frames. ], tot_loss[loss=0.2726, simple_loss=0.3055, pruned_loss=0.09274, ctc_loss=0.1819, cr_loss=0.4449, over 6744887.14 frames. ], batch size: 105, lr: 1.35e-02, grad_scale: 16.0 2024-09-17 09:12:37,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=151946.66666666666, ans=0.0 2024-09-17 09:12:47,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=151993.33333333334, ans=0.0 2024-09-17 09:12:53,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=151993.33333333334, ans=0.125 2024-09-17 09:12:59,897 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.116e+02 2.747e+02 3.419e+02 4.345e+02 1.436e+03, threshold=6.838e+02, percent-clipped=4.0 2024-09-17 09:13:06,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=152040.0, ans=0.2 2024-09-17 09:13:08,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=152040.0, ans=0.125 2024-09-17 09:13:16,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=152086.66666666666, ans=0.2 2024-09-17 09:13:18,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=152086.66666666666, ans=0.0 2024-09-17 09:13:20,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=152086.66666666666, ans=0.0 2024-09-17 09:13:51,968 INFO [train.py:1198] (1/2) Epoch 9, batch 1600, loss[loss=0.2609, simple_loss=0.3015, pruned_loss=0.08435, ctc_loss=0.1726, cr_loss=0.4254, over 34563.00 frames. ], tot_loss[loss=0.2725, simple_loss=0.3053, pruned_loss=0.0928, ctc_loss=0.182, cr_loss=0.4448, over 6724488.25 frames. ], batch size: 99, lr: 1.35e-02, grad_scale: 32.0 2024-09-17 09:13:54,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=152180.0, ans=0.04949747468305833 2024-09-17 09:13:54,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.36 vs. limit=15.0 2024-09-17 09:13:58,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=152180.0, ans=0.2 2024-09-17 09:14:04,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.52 vs. 
limit=15.0 2024-09-17 09:14:05,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=152180.0, ans=0.125 2024-09-17 09:14:10,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152226.66666666666, ans=0.1 2024-09-17 09:14:16,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=152226.66666666666, ans=0.0 2024-09-17 09:14:20,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=152226.66666666666, ans=15.0 2024-09-17 09:14:23,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=152273.33333333334, ans=0.125 2024-09-17 09:14:48,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=152320.0, ans=0.2 2024-09-17 09:14:50,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0 2024-09-17 09:15:15,598 INFO [train.py:1198] (1/2) Epoch 9, batch 1650, loss[loss=0.289, simple_loss=0.3258, pruned_loss=0.09765, ctc_loss=0.1912, cr_loss=0.4646, over 34369.00 frames. ], tot_loss[loss=0.2726, simple_loss=0.3054, pruned_loss=0.09285, ctc_loss=0.1821, cr_loss=0.4448, over 6717050.17 frames. ], batch size: 103, lr: 1.35e-02, grad_scale: 32.0 2024-09-17 09:15:20,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=152413.33333333334, ans=0.0 2024-09-17 09:15:29,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=152413.33333333334, ans=0.0 2024-09-17 09:15:48,455 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.186e+02 2.562e+02 3.152e+02 3.712e+02 7.797e+02, threshold=6.304e+02, percent-clipped=1.0 2024-09-17 09:15:51,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=152506.66666666666, ans=0.025 2024-09-17 09:16:15,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=152553.33333333334, ans=0.0 2024-09-17 09:16:31,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=152600.0, ans=0.125 2024-09-17 09:16:34,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=152600.0, ans=0.2 2024-09-17 09:16:37,424 INFO [train.py:1198] (1/2) Epoch 9, batch 1700, loss[loss=0.2401, simple_loss=0.2716, pruned_loss=0.08001, ctc_loss=0.1622, cr_loss=0.4028, over 34269.00 frames. ], tot_loss[loss=0.272, simple_loss=0.3049, pruned_loss=0.09251, ctc_loss=0.1814, cr_loss=0.4449, over 6742639.80 frames. 
], batch size: 80, lr: 1.35e-02, grad_scale: 32.0 2024-09-17 09:16:46,076 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:17:02,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=152693.33333333334, ans=0.0 2024-09-17 09:17:07,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152693.33333333334, ans=0.1 2024-09-17 09:17:28,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=152786.66666666666, ans=0.125 2024-09-17 09:17:52,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=152833.33333333334, ans=0.0 2024-09-17 09:17:56,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=152833.33333333334, ans=0.1 2024-09-17 09:18:02,312 INFO [train.py:1198] (1/2) Epoch 9, batch 1750, loss[loss=0.2469, simple_loss=0.2778, pruned_loss=0.08277, ctc_loss=0.1662, cr_loss=0.4327, over 34141.00 frames. ], tot_loss[loss=0.2714, simple_loss=0.3042, pruned_loss=0.09227, ctc_loss=0.1811, cr_loss=0.4447, over 6753383.70 frames. ], batch size: 78, lr: 1.35e-02, grad_scale: 32.0 2024-09-17 09:18:06,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2024-09-17 09:18:20,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=152926.66666666666, ans=0.125 2024-09-17 09:18:30,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=152926.66666666666, ans=0.0 2024-09-17 09:18:35,545 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.193e+02 2.651e+02 3.057e+02 4.083e+02 8.656e+02, threshold=6.114e+02, percent-clipped=4.0 2024-09-17 09:18:48,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=15.0 2024-09-17 09:18:51,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0 2024-09-17 09:19:17,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=153066.66666666666, ans=0.1 2024-09-17 09:19:27,042 INFO [train.py:1198] (1/2) Epoch 9, batch 1800, loss[loss=0.3024, simple_loss=0.334, pruned_loss=0.1057, ctc_loss=0.2031, cr_loss=0.4708, over 34689.00 frames. ], tot_loss[loss=0.2722, simple_loss=0.3049, pruned_loss=0.09262, ctc_loss=0.1817, cr_loss=0.4457, over 6755692.92 frames. 
], batch size: 97, lr: 1.34e-02, grad_scale: 32.0 2024-09-17 09:19:27,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=153113.33333333334, ans=0.125 2024-09-17 09:19:40,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=153113.33333333334, ans=0.1 2024-09-17 09:19:45,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=153160.0, ans=0.125 2024-09-17 09:20:23,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=153253.33333333334, ans=0.125 2024-09-17 09:20:28,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=153253.33333333334, ans=0.025 2024-09-17 09:20:43,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=153300.0, ans=0.95 2024-09-17 09:20:49,661 INFO [train.py:1198] (1/2) Epoch 9, batch 1850, loss[loss=0.2695, simple_loss=0.3071, pruned_loss=0.08914, ctc_loss=0.1792, cr_loss=0.4455, over 34436.00 frames. ], tot_loss[loss=0.2718, simple_loss=0.3046, pruned_loss=0.0925, ctc_loss=0.1815, cr_loss=0.4454, over 6759885.70 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0 2024-09-17 09:21:24,552 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.789e+02 4.018e+02 5.487e+02 1.084e+03, threshold=8.037e+02, percent-clipped=20.0 2024-09-17 09:21:41,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=153486.66666666666, ans=0.125 2024-09-17 09:21:43,261 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:21:48,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=153486.66666666666, ans=0.2 2024-09-17 09:22:11,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5 2024-09-17 09:22:13,862 INFO [train.py:1198] (1/2) Epoch 9, batch 1900, loss[loss=0.2835, simple_loss=0.3183, pruned_loss=0.09617, ctc_loss=0.1887, cr_loss=0.4644, over 34388.00 frames. ], tot_loss[loss=0.2725, simple_loss=0.3053, pruned_loss=0.09268, ctc_loss=0.1819, cr_loss=0.4463, over 6769543.48 frames. 
], batch size: 103, lr: 1.34e-02, grad_scale: 32.0 2024-09-17 09:22:20,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=153580.0, ans=0.125 2024-09-17 09:22:42,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=153626.66666666666, ans=0.125 2024-09-17 09:23:13,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=153720.0, ans=0.2 2024-09-17 09:23:14,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=153720.0, ans=0.0 2024-09-17 09:23:29,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=153766.66666666666, ans=0.0 2024-09-17 09:23:37,662 INFO [train.py:1198] (1/2) Epoch 9, batch 1950, loss[loss=0.2661, simple_loss=0.3006, pruned_loss=0.08931, ctc_loss=0.1767, cr_loss=0.4417, over 34367.00 frames. ], tot_loss[loss=0.2735, simple_loss=0.3064, pruned_loss=0.09305, ctc_loss=0.1825, cr_loss=0.4475, over 6786745.61 frames. ], batch size: 91, lr: 1.34e-02, grad_scale: 32.0 2024-09-17 09:23:41,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=153813.33333333334, ans=0.0 2024-09-17 09:23:47,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=153813.33333333334, ans=0.1 2024-09-17 09:24:10,482 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.698e+02 2.925e+02 3.761e+02 6.458e+02, threshold=5.850e+02, percent-clipped=0.0 2024-09-17 09:24:10,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=153906.66666666666, ans=0.125 2024-09-17 09:24:18,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=153906.66666666666, ans=0.125 2024-09-17 09:24:52,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=154000.0, ans=0.2 2024-09-17 09:25:02,180 INFO [train.py:1198] (1/2) Epoch 9, batch 2000, loss[loss=0.2389, simple_loss=0.2712, pruned_loss=0.07936, ctc_loss=0.1557, cr_loss=0.4161, over 34146.00 frames. ], tot_loss[loss=0.2742, simple_loss=0.3071, pruned_loss=0.09338, ctc_loss=0.183, cr_loss=0.4478, over 6762208.37 frames. 
], batch size: 78, lr: 1.34e-02, grad_scale: 32.0 2024-09-17 09:25:04,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=154046.66666666666, ans=0.0 2024-09-17 09:25:07,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=154046.66666666666, ans=0.0 2024-09-17 09:25:07,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=154046.66666666666, ans=0.125 2024-09-17 09:25:10,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=154046.66666666666, ans=6.0 2024-09-17 09:25:35,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=154140.0, ans=0.1 2024-09-17 09:25:40,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=154140.0, ans=0.2 2024-09-17 09:25:47,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=154140.0, ans=0.125 2024-09-17 09:25:47,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=154140.0, ans=0.0 2024-09-17 09:25:50,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=154186.66666666666, ans=0.025 2024-09-17 09:26:03,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=154186.66666666666, ans=0.0 2024-09-17 09:26:08,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=154233.33333333334, ans=0.025 2024-09-17 09:26:08,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=154233.33333333334, ans=0.0 2024-09-17 09:26:26,549 INFO [train.py:1198] (1/2) Epoch 9, batch 2050, loss[loss=0.2516, simple_loss=0.2862, pruned_loss=0.08388, ctc_loss=0.1645, cr_loss=0.4088, over 34428.00 frames. ], tot_loss[loss=0.2729, simple_loss=0.3058, pruned_loss=0.09284, ctc_loss=0.1822, cr_loss=0.4461, over 6754555.35 frames. ], batch size: 82, lr: 1.34e-02, grad_scale: 32.0 2024-09-17 09:26:30,267 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:26:43,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=154326.66666666666, ans=0.125 2024-09-17 09:26:59,378 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.199e+02 2.769e+02 3.546e+02 4.409e+02 7.557e+02, threshold=7.091e+02, percent-clipped=8.0 2024-09-17 09:27:14,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=154420.0, ans=0.125 2024-09-17 09:27:25,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.09 vs. 
limit=15.0 2024-09-17 09:27:29,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=154420.0, ans=0.125 2024-09-17 09:27:32,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=154466.66666666666, ans=0.0 2024-09-17 09:27:48,535 INFO [train.py:1198] (1/2) Epoch 9, batch 2100, loss[loss=0.2672, simple_loss=0.3029, pruned_loss=0.08953, ctc_loss=0.1738, cr_loss=0.4444, over 34536.00 frames. ], tot_loss[loss=0.2716, simple_loss=0.3046, pruned_loss=0.09224, ctc_loss=0.1812, cr_loss=0.4445, over 6769049.87 frames. ], batch size: 94, lr: 1.34e-02, grad_scale: 32.0 2024-09-17 09:27:48,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154513.33333333334, ans=0.1 2024-09-17 09:27:55,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=154513.33333333334, ans=0.125 2024-09-17 09:28:11,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=154560.0, ans=0.125 2024-09-17 09:28:18,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=154560.0, ans=0.2 2024-09-17 09:29:01,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=154700.0, ans=0.125 2024-09-17 09:29:03,330 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:29:08,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=154700.0, ans=0.5 2024-09-17 09:29:12,708 INFO [train.py:1198] (1/2) Epoch 9, batch 2150, loss[loss=0.2718, simple_loss=0.3047, pruned_loss=0.09292, ctc_loss=0.1786, cr_loss=0.4366, over 34353.00 frames. ], tot_loss[loss=0.2704, simple_loss=0.3036, pruned_loss=0.09173, ctc_loss=0.1801, cr_loss=0.4429, over 6788065.63 frames. 
], batch size: 91, lr: 1.34e-02, grad_scale: 32.0 2024-09-17 09:29:16,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=154746.66666666666, ans=0.125 2024-09-17 09:29:27,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154793.33333333334, ans=0.1 2024-09-17 09:29:29,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=154793.33333333334, ans=0.125 2024-09-17 09:29:41,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=154793.33333333334, ans=0.125 2024-09-17 09:29:45,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.610e+02 3.443e+02 4.717e+02 1.038e+03, threshold=6.886e+02, percent-clipped=9.0 2024-09-17 09:29:49,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=154840.0, ans=0.125 2024-09-17 09:30:26,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=154933.33333333334, ans=0.2 2024-09-17 09:30:37,163 INFO [train.py:1198] (1/2) Epoch 9, batch 2200, loss[loss=0.291, simple_loss=0.3215, pruned_loss=0.1013, ctc_loss=0.194, cr_loss=0.477, over 34455.00 frames. ], tot_loss[loss=0.2715, simple_loss=0.3043, pruned_loss=0.09232, ctc_loss=0.181, cr_loss=0.4449, over 6783354.38 frames. ], batch size: 100, lr: 1.34e-02, grad_scale: 32.0 2024-09-17 09:30:44,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=154980.0, ans=0.125 2024-09-17 09:30:56,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.01 vs. limit=22.5 2024-09-17 09:30:58,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=155026.66666666666, ans=0.0 2024-09-17 09:31:10,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=155073.33333333334, ans=0.125 2024-09-17 09:31:11,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=155073.33333333334, ans=0.1 2024-09-17 09:31:43,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=155166.66666666666, ans=0.025 2024-09-17 09:31:43,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=155166.66666666666, ans=0.0 2024-09-17 09:31:56,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=155166.66666666666, ans=0.1 2024-09-17 09:31:59,468 INFO [train.py:1198] (1/2) Epoch 9, batch 2250, loss[loss=0.2596, simple_loss=0.2974, pruned_loss=0.08524, ctc_loss=0.1695, cr_loss=0.4363, over 34423.00 frames. ], tot_loss[loss=0.2709, simple_loss=0.304, pruned_loss=0.09197, ctc_loss=0.1803, cr_loss=0.4436, over 6780317.81 frames. 
], batch size: 95, lr: 1.34e-02, grad_scale: 16.0 2024-09-17 09:31:59,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=155213.33333333334, ans=0.2 2024-09-17 09:32:19,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=155260.0, ans=0.1 2024-09-17 09:32:34,152 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.531e+02 2.989e+02 3.821e+02 5.982e+02, threshold=5.979e+02, percent-clipped=0.0 2024-09-17 09:32:48,494 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:32:58,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=155353.33333333334, ans=0.125 2024-09-17 09:32:58,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=155353.33333333334, ans=0.125 2024-09-17 09:33:19,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=155400.0, ans=10.0 2024-09-17 09:33:21,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=155400.0, ans=0.125 2024-09-17 09:33:21,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=155400.0, ans=0.125 2024-09-17 09:33:24,318 INFO [train.py:1198] (1/2) Epoch 9, batch 2300, loss[loss=0.2445, simple_loss=0.2804, pruned_loss=0.07993, ctc_loss=0.1632, cr_loss=0.4039, over 34287.00 frames. ], tot_loss[loss=0.2698, simple_loss=0.303, pruned_loss=0.09152, ctc_loss=0.1795, cr_loss=0.4414, over 6765674.74 frames. ], batch size: 83, lr: 1.34e-02, grad_scale: 16.0 2024-09-17 09:33:29,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=155446.66666666666, ans=0.125 2024-09-17 09:33:36,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=155446.66666666666, ans=0.0 2024-09-17 09:33:44,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.38 vs. limit=15.0 2024-09-17 09:33:45,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=155493.33333333334, ans=0.0 2024-09-17 09:34:02,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=155540.0, ans=0.125 2024-09-17 09:34:33,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.89 vs. limit=15.0 2024-09-17 09:34:48,331 INFO [train.py:1198] (1/2) Epoch 9, batch 2350, loss[loss=0.2914, simple_loss=0.3263, pruned_loss=0.09908, ctc_loss=0.1942, cr_loss=0.4891, over 34705.00 frames. ], tot_loss[loss=0.27, simple_loss=0.3032, pruned_loss=0.09161, ctc_loss=0.1798, cr_loss=0.442, over 6772228.15 frames. 
], batch size: 97, lr: 1.33e-02, grad_scale: 16.0 2024-09-17 09:34:56,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=155680.0, ans=0.125 2024-09-17 09:35:12,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.02 vs. limit=15.0 2024-09-17 09:35:22,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.55 vs. limit=6.0 2024-09-17 09:35:22,844 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.727e+02 3.332e+02 4.110e+02 8.022e+02, threshold=6.664e+02, percent-clipped=4.0 2024-09-17 09:35:27,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-17 09:35:41,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=155820.0, ans=0.0 2024-09-17 09:35:51,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=12.0 2024-09-17 09:36:00,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=155866.66666666666, ans=0.125 2024-09-17 09:36:01,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=155866.66666666666, ans=0.125 2024-09-17 09:36:04,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=155866.66666666666, ans=0.125 2024-09-17 09:36:06,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=155866.66666666666, ans=22.5 2024-09-17 09:36:11,055 INFO [train.py:1198] (1/2) Epoch 9, batch 2400, loss[loss=0.2567, simple_loss=0.2922, pruned_loss=0.08505, ctc_loss=0.1682, cr_loss=0.4364, over 34599.00 frames. ], tot_loss[loss=0.2702, simple_loss=0.3036, pruned_loss=0.09161, ctc_loss=0.1798, cr_loss=0.4425, over 6776094.85 frames. ], batch size: 89, lr: 1.33e-02, grad_scale: 32.0 2024-09-17 09:36:17,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=155913.33333333334, ans=0.125 2024-09-17 09:36:24,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=155913.33333333334, ans=0.2 2024-09-17 09:36:45,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.68 vs. 
limit=15.0 2024-09-17 09:36:51,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=156006.66666666666, ans=0.2 2024-09-17 09:37:14,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=156053.33333333334, ans=0.0 2024-09-17 09:37:34,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=156100.0, ans=0.1 2024-09-17 09:37:37,387 INFO [train.py:1198] (1/2) Epoch 9, batch 2450, loss[loss=0.2774, simple_loss=0.3128, pruned_loss=0.09373, ctc_loss=0.1847, cr_loss=0.4386, over 34409.00 frames. ], tot_loss[loss=0.2718, simple_loss=0.305, pruned_loss=0.09231, ctc_loss=0.1811, cr_loss=0.4449, over 6750399.03 frames. ], batch size: 95, lr: 1.33e-02, grad_scale: 16.0 2024-09-17 09:37:55,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=156193.33333333334, ans=0.0 2024-09-17 09:38:02,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=156193.33333333334, ans=0.125 2024-09-17 09:38:13,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.091e+02 2.668e+02 3.472e+02 4.873e+02 8.670e+02, threshold=6.945e+02, percent-clipped=5.0 2024-09-17 09:38:36,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=156286.66666666666, ans=0.125 2024-09-17 09:38:40,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=156286.66666666666, ans=0.125 2024-09-17 09:38:59,674 INFO [train.py:1198] (1/2) Epoch 9, batch 2500, loss[loss=0.2887, simple_loss=0.3247, pruned_loss=0.09753, ctc_loss=0.1914, cr_loss=0.484, over 34448.00 frames. ], tot_loss[loss=0.2719, simple_loss=0.3051, pruned_loss=0.09228, ctc_loss=0.1812, cr_loss=0.4455, over 6761411.24 frames. ], batch size: 100, lr: 1.33e-02, grad_scale: 16.0 2024-09-17 09:39:21,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=156426.66666666666, ans=0.2 2024-09-17 09:39:32,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=156473.33333333334, ans=0.125 2024-09-17 09:39:50,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=156520.0, ans=0.95 2024-09-17 09:39:58,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=156520.0, ans=0.125 2024-09-17 09:40:05,808 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 09:40:13,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=156566.66666666666, ans=0.125 2024-09-17 09:40:23,961 INFO [train.py:1198] (1/2) Epoch 9, batch 2550, loss[loss=0.2351, simple_loss=0.2702, pruned_loss=0.07665, ctc_loss=0.1537, cr_loss=0.4007, over 34173.00 frames. ], tot_loss[loss=0.2718, simple_loss=0.305, pruned_loss=0.09228, ctc_loss=0.1811, cr_loss=0.4452, over 6765648.87 frames. 
], batch size: 78, lr: 1.33e-02, grad_scale: 16.0 2024-09-17 09:40:27,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=156613.33333333334, ans=0.125 2024-09-17 09:40:37,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=156613.33333333334, ans=0.125 2024-09-17 09:40:47,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=156660.0, ans=0.125 2024-09-17 09:40:52,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=156660.0, ans=0.125 2024-09-17 09:41:01,765 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.128e+02 2.634e+02 3.111e+02 4.367e+02 7.593e+02, threshold=6.221e+02, percent-clipped=1.0 2024-09-17 09:41:29,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.71 vs. limit=10.0 2024-09-17 09:41:40,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=156800.0, ans=0.2 2024-09-17 09:41:48,632 INFO [train.py:1198] (1/2) Epoch 9, batch 2600, loss[loss=0.2612, simple_loss=0.2941, pruned_loss=0.08842, ctc_loss=0.1724, cr_loss=0.4261, over 34391.00 frames. ], tot_loss[loss=0.2726, simple_loss=0.3058, pruned_loss=0.09264, ctc_loss=0.1818, cr_loss=0.4463, over 6762383.99 frames. ], batch size: 91, lr: 1.33e-02, grad_scale: 16.0 2024-09-17 09:42:03,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=156893.33333333334, ans=0.025 2024-09-17 09:42:19,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=156940.0, ans=0.2 2024-09-17 09:42:36,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=156986.66666666666, ans=0.0 2024-09-17 09:42:39,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=156986.66666666666, ans=0.0 2024-09-17 09:42:45,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=156986.66666666666, ans=0.125 2024-09-17 09:42:54,206 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.954e-03 2024-09-17 09:42:58,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=157033.33333333334, ans=0.5 2024-09-17 09:43:08,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=157080.0, ans=0.125 2024-09-17 09:43:09,883 INFO [train.py:1198] (1/2) Epoch 9, batch 2650, loss[loss=0.3007, simple_loss=0.3303, pruned_loss=0.1046, ctc_loss=0.2068, cr_loss=0.5088, over 34251.00 frames. ], tot_loss[loss=0.2723, simple_loss=0.3057, pruned_loss=0.09237, ctc_loss=0.1815, cr_loss=0.446, over 6769190.09 frames. ], batch size: 117, lr: 1.33e-02, grad_scale: 16.0 2024-09-17 09:43:12,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.90 vs. 
limit=15.0 2024-09-17 09:43:18,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=157080.0, ans=0.125 2024-09-17 09:43:33,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=157126.66666666666, ans=0.1 2024-09-17 09:43:45,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.544e+02 2.880e+02 3.457e+02 6.991e+02, threshold=5.759e+02, percent-clipped=2.0 2024-09-17 09:44:02,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=157220.0, ans=0.2 2024-09-17 09:44:14,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=157220.0, ans=0.1 2024-09-17 09:44:25,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=157266.66666666666, ans=0.125 2024-09-17 09:44:33,765 INFO [train.py:1198] (1/2) Epoch 9, batch 2700, loss[loss=0.2765, simple_loss=0.3137, pruned_loss=0.09229, ctc_loss=0.1822, cr_loss=0.4595, over 34591.00 frames. ], tot_loss[loss=0.2724, simple_loss=0.3058, pruned_loss=0.09243, ctc_loss=0.1816, cr_loss=0.4471, over 6765567.35 frames. ], batch size: 102, lr: 1.33e-02, grad_scale: 16.0 2024-09-17 09:44:50,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0 2024-09-17 09:44:50,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=157360.0, ans=0.2 2024-09-17 09:44:52,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=157360.0, ans=0.025 2024-09-17 09:45:37,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.40 vs. limit=12.0 2024-09-17 09:45:43,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=157500.0, ans=0.0 2024-09-17 09:45:58,086 INFO [train.py:1198] (1/2) Epoch 9, batch 2750, loss[loss=0.2621, simple_loss=0.2896, pruned_loss=0.09098, ctc_loss=0.1774, cr_loss=0.4314, over 34673.00 frames. ], tot_loss[loss=0.2706, simple_loss=0.3042, pruned_loss=0.09164, ctc_loss=0.18, cr_loss=0.4446, over 6761712.16 frames. 
], batch size: 88, lr: 1.33e-02, grad_scale: 16.0 2024-09-17 09:46:19,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=157593.33333333334, ans=0.125 2024-09-17 09:46:34,177 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.157e+02 2.646e+02 3.263e+02 4.299e+02 8.835e+02, threshold=6.527e+02, percent-clipped=6.0 2024-09-17 09:46:37,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=157640.0, ans=0.1 2024-09-17 09:46:41,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=157640.0, ans=0.2 2024-09-17 09:46:42,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=157640.0, ans=0.07 2024-09-17 09:46:44,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=157640.0, ans=0.125 2024-09-17 09:46:55,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0 2024-09-17 09:47:04,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=157733.33333333334, ans=0.125 2024-09-17 09:47:11,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157733.33333333334, ans=0.1 2024-09-17 09:47:13,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=157733.33333333334, ans=0.0 2024-09-17 09:47:18,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=157733.33333333334, ans=15.0 2024-09-17 09:47:20,990 INFO [train.py:1198] (1/2) Epoch 9, batch 2800, loss[loss=0.3149, simple_loss=0.3337, pruned_loss=0.1165, ctc_loss=0.2259, cr_loss=0.4477, over 23955.00 frames. ], tot_loss[loss=0.271, simple_loss=0.3043, pruned_loss=0.09192, ctc_loss=0.1804, cr_loss=0.4449, over 6740589.33 frames. ], batch size: 245, lr: 1.33e-02, grad_scale: 32.0 2024-09-17 09:47:24,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=157780.0, ans=0.2 2024-09-17 09:47:46,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.93 vs. limit=12.0 2024-09-17 09:47:53,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=157826.66666666666, ans=0.025 2024-09-17 09:48:01,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=157873.33333333334, ans=0.125 2024-09-17 09:48:26,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.87 vs. 
limit=15.0
2024-09-17 09:48:40,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=157966.66666666666, ans=0.125
2024-09-17 09:48:46,916 INFO [train.py:1198] (1/2) Epoch 9, batch 2850, loss[loss=0.2566, simple_loss=0.2904, pruned_loss=0.08639, ctc_loss=0.1681, cr_loss=0.4083, over 34453.00 frames. ], tot_loss[loss=0.2715, simple_loss=0.3048, pruned_loss=0.09218, ctc_loss=0.1809, cr_loss=0.4448, over 6724464.01 frames. ], batch size: 90, lr: 1.32e-02, grad_scale: 32.0
2024-09-17 09:49:16,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=158060.0, ans=0.125
2024-09-17 09:49:21,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=158106.66666666666, ans=0.04949747468305833
2024-09-17 09:49:23,165 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.121e+02 2.632e+02 3.047e+02 3.655e+02 5.995e+02, threshold=6.094e+02, percent-clipped=0.0
2024-09-17 09:49:51,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=158200.0, ans=0.025
2024-09-17 09:49:53,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=158200.0, ans=0.2
2024-09-17 09:50:09,101 INFO [train.py:1198] (1/2) Epoch 9, batch 2900, loss[loss=0.2594, simple_loss=0.3037, pruned_loss=0.08274, ctc_loss=0.1625, cr_loss=0.4279, over 34548.00 frames. ], tot_loss[loss=0.2725, simple_loss=0.3058, pruned_loss=0.09246, ctc_loss=0.1816, cr_loss=0.4464, over 6755228.66 frames. ], batch size: 94, lr: 1.32e-02, grad_scale: 32.0
2024-09-17 09:50:47,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=158340.0, ans=0.2
2024-09-17 09:51:00,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=158386.66666666666, ans=0.0
2024-09-17 09:51:06,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.51 vs. limit=12.0
2024-09-17 09:51:06,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.07 vs. limit=15.0
2024-09-17 09:51:10,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=158386.66666666666, ans=0.0
2024-09-17 09:51:23,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=15.0
2024-09-17 09:51:27,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158433.33333333334, ans=0.1
2024-09-17 09:51:30,413 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:51:31,719 INFO [train.py:1198] (1/2) Epoch 9, batch 2950, loss[loss=0.2601, simple_loss=0.2925, pruned_loss=0.0888, ctc_loss=0.1694, cr_loss=0.4041, over 34646.00 frames. ], tot_loss[loss=0.2707, simple_loss=0.3041, pruned_loss=0.09169, ctc_loss=0.1802, cr_loss=0.4445, over 6750886.20 frames. ], batch size: 88, lr: 1.32e-02, grad_scale: 32.0
2024-09-17 09:51:33,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=158480.0, ans=0.2
2024-09-17 09:52:05,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=158526.66666666666, ans=0.125
2024-09-17 09:52:11,992 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.673e+02 3.145e+02 3.989e+02 7.470e+02, threshold=6.290e+02, percent-clipped=3.0
2024-09-17 09:52:12,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=158573.33333333334, ans=0.0
2024-09-17 09:52:25,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=158620.0, ans=12.0
2024-09-17 09:52:30,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=158620.0, ans=0.2
2024-09-17 09:52:48,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=158666.66666666666, ans=0.125
2024-09-17 09:52:53,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=158666.66666666666, ans=0.0
2024-09-17 09:52:58,502 INFO [train.py:1198] (1/2) Epoch 9, batch 3000, loss[loss=0.2651, simple_loss=0.3029, pruned_loss=0.08761, ctc_loss=0.172, cr_loss=0.4455, over 34519.00 frames. ], tot_loss[loss=0.2702, simple_loss=0.3038, pruned_loss=0.09146, ctc_loss=0.1798, cr_loss=0.4437, over 6751555.30 frames. ], batch size: 94, lr: 1.32e-02, grad_scale: 32.0
2024-09-17 09:52:58,503 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 09:53:15,255 INFO [train.py:1230] (1/2) Epoch 9, validation: loss=0.1601, simple_loss=0.2581, pruned_loss=0.02583, ctc_loss=0.05205, cr_loss=1.532e-14, over 944034.00 frames.
2024-09-17 09:53:15,255 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-17 09:53:15,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=158713.33333333334, ans=0.2
2024-09-17 09:53:22,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=158713.33333333334, ans=0.125
2024-09-17 09:53:24,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=158713.33333333334, ans=0.125
2024-09-17 09:53:38,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=158760.0, ans=0.0
2024-09-17 09:53:39,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.68 vs. limit=15.0
2024-09-17 09:53:55,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=158806.66666666666, ans=0.07
2024-09-17 09:54:16,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=158853.33333333334, ans=0.0
2024-09-17 09:54:24,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=158900.0, ans=0.1
2024-09-17 09:54:25,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=158900.0, ans=0.0
2024-09-17 09:54:29,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=158900.0, ans=0.125
2024-09-17 09:54:37,018 INFO [train.py:1198] (1/2) Epoch 9, batch 3050, loss[loss=0.2473, simple_loss=0.2886, pruned_loss=0.0793, ctc_loss=0.1572, cr_loss=0.3998, over 34588.00 frames. ], tot_loss[loss=0.2714, simple_loss=0.3049, pruned_loss=0.092, ctc_loss=0.1808, cr_loss=0.4447, over 6742166.71 frames. ], batch size: 89, lr: 1.32e-02, grad_scale: 32.0
2024-09-17 09:54:50,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=158946.66666666666, ans=0.125
2024-09-17 09:54:53,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=158993.33333333334, ans=22.5
2024-09-17 09:54:57,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0
2024-09-17 09:55:02,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=158993.33333333334, ans=0.1
2024-09-17 09:55:12,554 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.699e+02 3.238e+02 3.911e+02 9.060e+02, threshold=6.477e+02, percent-clipped=4.0
2024-09-17 09:55:27,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=159086.66666666666, ans=0.0
2024-09-17 09:55:27,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=159086.66666666666, ans=0.0
2024-09-17 09:55:40,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=159133.33333333334, ans=10.0
2024-09-17 09:55:41,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=159133.33333333334, ans=0.0
2024-09-17 09:55:43,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=159133.33333333334, ans=0.0
2024-09-17 09:55:57,613 INFO [train.py:1198] (1/2) Epoch 9, batch 3100, loss[loss=0.2923, simple_loss=0.3226, pruned_loss=0.1016, ctc_loss=0.2002, cr_loss=0.4658, over 34247.00 frames. ], tot_loss[loss=0.2707, simple_loss=0.3042, pruned_loss=0.09171, ctc_loss=0.1803, cr_loss=0.4447, over 6742171.20 frames. ], batch size: 117, lr: 1.32e-02, grad_scale: 16.0
2024-09-17 09:56:23,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=159226.66666666666, ans=0.125
2024-09-17 09:57:10,824 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 09:57:22,033 INFO [train.py:1198] (1/2) Epoch 9, batch 3150, loss[loss=0.2814, simple_loss=0.314, pruned_loss=0.09593, ctc_loss=0.1905, cr_loss=0.4692, over 33770.00 frames. ], tot_loss[loss=0.2702, simple_loss=0.3038, pruned_loss=0.09146, ctc_loss=0.1798, cr_loss=0.4433, over 6748139.19 frames. ], batch size: 122, lr: 1.32e-02, grad_scale: 16.0
2024-09-17 09:57:27,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=159413.33333333334, ans=0.125
2024-09-17 09:57:46,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=159460.0, ans=0.125
2024-09-17 09:57:51,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=159460.0, ans=0.125
2024-09-17 09:57:58,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.620e+02 2.861e+02 3.667e+02 7.264e+02, threshold=5.722e+02, percent-clipped=2.0
2024-09-17 09:58:12,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=159553.33333333334, ans=0.125
2024-09-17 09:58:23,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=159553.33333333334, ans=0.125
2024-09-17 09:58:31,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=159600.0, ans=0.125
2024-09-17 09:58:37,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=159600.0, ans=10.0
2024-09-17 09:58:42,551 INFO [train.py:1198] (1/2) Epoch 9, batch 3200, loss[loss=0.2747, simple_loss=0.3101, pruned_loss=0.0921, ctc_loss=0.1826, cr_loss=0.4651, over 34511.00 frames. ], tot_loss[loss=0.2687, simple_loss=0.3026, pruned_loss=0.09071, ctc_loss=0.1785, cr_loss=0.4416, over 6761833.79 frames. ], batch size: 94, lr: 1.32e-02, grad_scale: 32.0
2024-09-17 09:58:52,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=159646.66666666666, ans=0.125
2024-09-17 09:59:05,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=159693.33333333334, ans=0.125
2024-09-17 09:59:20,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=159740.0, ans=0.1
2024-09-17 09:59:20,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=159740.0, ans=0.2
2024-09-17 10:00:03,696 INFO [train.py:1198] (1/2) Epoch 9, batch 3250, loss[loss=0.2811, simple_loss=0.32, pruned_loss=0.0937, ctc_loss=0.1866, cr_loss=0.4379, over 34673.00 frames. ], tot_loss[loss=0.2694, simple_loss=0.3035, pruned_loss=0.09097, ctc_loss=0.179, cr_loss=0.4426, over 6771206.71 frames. ], batch size: 98, lr: 1.32e-02, grad_scale: 32.0
2024-09-17 10:00:04,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0
2024-09-17 10:00:31,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.30 vs. limit=15.0
2024-09-17 10:00:37,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=159973.33333333334, ans=0.025
2024-09-17 10:00:40,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.148e+02 2.522e+02 3.113e+02 4.458e+02 8.402e+02, threshold=6.227e+02, percent-clipped=11.0
2024-09-17 10:00:44,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=12.0
2024-09-17 10:00:52,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=160020.0, ans=0.0
2024-09-17 10:01:24,335 INFO [train.py:1198] (1/2) Epoch 9, batch 3300, loss[loss=0.2792, simple_loss=0.3171, pruned_loss=0.09341, ctc_loss=0.1826, cr_loss=0.4513, over 33109.00 frames. ], tot_loss[loss=0.2686, simple_loss=0.3026, pruned_loss=0.09064, ctc_loss=0.1783, cr_loss=0.441, over 6769178.05 frames. ], batch size: 130, lr: 1.32e-02, grad_scale: 32.0
2024-09-17 10:01:24,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160113.33333333334, ans=0.1
2024-09-17 10:02:00,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=160206.66666666666, ans=0.0
2024-09-17 10:02:06,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=160206.66666666666, ans=0.125
2024-09-17 10:02:38,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=160300.0, ans=0.125
2024-09-17 10:02:39,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=160300.0, ans=0.0
2024-09-17 10:02:47,637 INFO [train.py:1198] (1/2) Epoch 9, batch 3350, loss[loss=0.2923, simple_loss=0.3205, pruned_loss=0.1024, ctc_loss=0.2033, cr_loss=0.4668, over 33761.00 frames. ], tot_loss[loss=0.2702, simple_loss=0.3039, pruned_loss=0.09144, ctc_loss=0.1798, cr_loss=0.4428, over 6744342.54 frames. ], batch size: 122, lr: 1.32e-02, grad_scale: 32.0
2024-09-17 10:02:52,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=160346.66666666666, ans=0.025
2024-09-17 10:02:53,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.84 vs. limit=22.5
2024-09-17 10:03:09,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.80 vs. limit=12.0
2024-09-17 10:03:12,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=160393.33333333334, ans=0.1
2024-09-17 10:03:24,907 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.533e+02 3.272e+02 4.617e+02 1.046e+03, threshold=6.545e+02, percent-clipped=9.0
2024-09-17 10:03:34,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=160486.66666666666, ans=0.0
2024-09-17 10:03:41,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0
2024-09-17 10:03:44,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=160486.66666666666, ans=0.125
2024-09-17 10:03:49,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=160486.66666666666, ans=0.125
2024-09-17 10:03:59,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=160533.33333333334, ans=0.2
2024-09-17 10:04:01,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.41 vs. limit=15.0
2024-09-17 10:04:05,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=160533.33333333334, ans=0.2
2024-09-17 10:04:08,542 INFO [train.py:1198] (1/2) Epoch 9, batch 3400, loss[loss=0.2366, simple_loss=0.2722, pruned_loss=0.07751, ctc_loss=0.1521, cr_loss=0.3869, over 34121.00 frames. ], tot_loss[loss=0.2701, simple_loss=0.3035, pruned_loss=0.09154, ctc_loss=0.1798, cr_loss=0.4428, over 6733636.33 frames. ], batch size: 78, lr: 1.31e-02, grad_scale: 32.0
2024-09-17 10:04:13,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=160580.0, ans=0.0
2024-09-17 10:04:16,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=160580.0, ans=0.125
2024-09-17 10:04:44,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0
2024-09-17 10:04:47,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=160673.33333333334, ans=0.0
2024-09-17 10:04:49,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=160673.33333333334, ans=0.2
2024-09-17 10:04:57,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=160720.0, ans=0.125
2024-09-17 10:04:57,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.42 vs. limit=22.5
2024-09-17 10:05:24,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160766.66666666666, ans=0.1
2024-09-17 10:05:27,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160813.33333333334, ans=0.1
2024-09-17 10:05:29,072 INFO [train.py:1198] (1/2) Epoch 9, batch 3450, loss[loss=0.2939, simple_loss=0.3267, pruned_loss=0.1014, ctc_loss=0.1996, cr_loss=0.4596, over 33106.00 frames. ], tot_loss[loss=0.2709, simple_loss=0.3043, pruned_loss=0.09178, ctc_loss=0.1802, cr_loss=0.4447, over 6745768.46 frames. ], batch size: 130, lr: 1.31e-02, grad_scale: 16.0
2024-09-17 10:05:34,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=160813.33333333334, ans=0.05
2024-09-17 10:06:07,739 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.501e+02 3.075e+02 4.115e+02 7.266e+02, threshold=6.151e+02, percent-clipped=4.0
2024-09-17 10:06:20,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=160953.33333333334, ans=0.1
2024-09-17 10:06:25,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=160953.33333333334, ans=0.1
2024-09-17 10:06:30,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=160953.33333333334, ans=0.125
2024-09-17 10:06:32,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.34 vs. limit=15.0
2024-09-17 10:06:49,401 INFO [train.py:1198] (1/2) Epoch 9, batch 3500, loss[loss=0.2472, simple_loss=0.2836, pruned_loss=0.08134, ctc_loss=0.1598, cr_loss=0.4068, over 34439.00 frames. ], tot_loss[loss=0.2702, simple_loss=0.3036, pruned_loss=0.09153, ctc_loss=0.1797, cr_loss=0.4441, over 6747637.98 frames. ], batch size: 85, lr: 1.31e-02, grad_scale: 16.0
2024-09-17 10:06:55,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=161046.66666666666, ans=0.0
2024-09-17 10:06:55,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=161046.66666666666, ans=0.0
2024-09-17 10:07:00,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=161046.66666666666, ans=0.125
2024-09-17 10:07:11,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=22.5
2024-09-17 10:07:15,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.78 vs. limit=15.0
2024-09-17 10:07:30,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=161140.0, ans=0.1
2024-09-17 10:07:34,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=161140.0, ans=0.125
2024-09-17 10:07:56,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.28 vs. limit=15.0
2024-09-17 10:08:05,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.97 vs. limit=22.5
2024-09-17 10:08:11,345 INFO [train.py:1198] (1/2) Epoch 9, batch 3550, loss[loss=0.2836, simple_loss=0.3168, pruned_loss=0.09683, ctc_loss=0.1858, cr_loss=0.492, over 34394.00 frames. ], tot_loss[loss=0.27, simple_loss=0.3036, pruned_loss=0.09141, ctc_loss=0.1794, cr_loss=0.4441, over 6758186.99 frames. ], batch size: 103, lr: 1.31e-02, grad_scale: 16.0
2024-09-17 10:08:18,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=161280.0, ans=0.125
2024-09-17 10:08:49,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.488e+02 3.053e+02 4.183e+02 7.438e+02, threshold=6.105e+02, percent-clipped=3.0
2024-09-17 10:09:16,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.61 vs. limit=15.0
2024-09-17 10:09:17,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=161466.66666666666, ans=0.125
2024-09-17 10:09:28,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=161466.66666666666, ans=0.1
2024-09-17 10:09:30,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=161513.33333333334, ans=0.025
2024-09-17 10:09:31,527 INFO [train.py:1198] (1/2) Epoch 9, batch 3600, loss[loss=0.2818, simple_loss=0.3118, pruned_loss=0.09726, ctc_loss=0.1966, cr_loss=0.4476, over 34473.00 frames. ], tot_loss[loss=0.2702, simple_loss=0.3037, pruned_loss=0.09146, ctc_loss=0.1796, cr_loss=0.4447, over 6767394.97 frames. ], batch size: 90, lr: 1.31e-02, grad_scale: 32.0
2024-09-17 10:10:11,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=161606.66666666666, ans=0.125
2024-09-17 10:10:46,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=161700.0, ans=0.125
2024-09-17 10:10:53,502 INFO [train.py:1198] (1/2) Epoch 9, batch 3650, loss[loss=0.2932, simple_loss=0.3237, pruned_loss=0.1023, ctc_loss=0.1987, cr_loss=0.4571, over 34439.00 frames. ], tot_loss[loss=0.2694, simple_loss=0.3031, pruned_loss=0.09104, ctc_loss=0.179, cr_loss=0.4438, over 6769258.16 frames. ], batch size: 110, lr: 1.31e-02, grad_scale: 32.0
2024-09-17 10:11:32,141 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.625e+02 2.960e+02 3.596e+02 7.408e+02, threshold=5.919e+02, percent-clipped=3.0
2024-09-17 10:11:34,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.95 vs. limit=15.0
2024-09-17 10:11:43,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=161886.66666666666, ans=0.125
2024-09-17 10:11:56,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=161933.33333333334, ans=0.0
2024-09-17 10:12:13,450 INFO [train.py:1198] (1/2) Epoch 9, batch 3700, loss[loss=0.2822, simple_loss=0.3153, pruned_loss=0.09625, ctc_loss=0.1902, cr_loss=0.4631, over 34622.00 frames. ], tot_loss[loss=0.2689, simple_loss=0.3031, pruned_loss=0.09067, ctc_loss=0.1784, cr_loss=0.4428, over 6783793.67 frames. ], batch size: 102, lr: 1.31e-02, grad_scale: 32.0
2024-09-17 10:12:18,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=161980.0, ans=0.1
2024-09-17 10:12:33,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=162026.66666666666, ans=0.04949747468305833
2024-09-17 10:12:33,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=21.54 vs. limit=15.0
2024-09-17 10:12:35,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.24 vs. limit=15.0
2024-09-17 10:12:37,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0
2024-09-17 10:12:58,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=162073.33333333334, ans=0.5
2024-09-17 10:13:20,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.62 vs. limit=15.0
2024-09-17 10:13:30,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=162166.66666666666, ans=0.125
2024-09-17 10:13:34,545 INFO [train.py:1198] (1/2) Epoch 9, batch 3750, loss[loss=0.2687, simple_loss=0.3087, pruned_loss=0.08809, ctc_loss=0.1744, cr_loss=0.4412, over 34308.00 frames. ], tot_loss[loss=0.2728, simple_loss=0.3067, pruned_loss=0.09236, ctc_loss=0.1813, cr_loss=0.4477, over 6784765.76 frames. ], batch size: 113, lr: 1.31e-02, grad_scale: 32.0
2024-09-17 10:13:41,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=162213.33333333334, ans=0.0
2024-09-17 10:13:52,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=162260.0, ans=0.0
2024-09-17 10:13:52,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=162260.0, ans=0.125
2024-09-17 10:13:57,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=162260.0, ans=0.0
2024-09-17 10:14:14,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.200e+02 2.464e+02 2.707e+02 3.246e+02 6.773e+02, threshold=5.414e+02, percent-clipped=2.0
2024-09-17 10:14:41,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=162400.0, ans=0.0
2024-09-17 10:14:52,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162400.0, ans=0.1
2024-09-17 10:14:57,012 INFO [train.py:1198] (1/2) Epoch 9, batch 3800, loss[loss=0.3112, simple_loss=0.3323, pruned_loss=0.1134, ctc_loss=0.2179, cr_loss=0.4918, over 29749.00 frames. ], tot_loss[loss=0.2776, simple_loss=0.3102, pruned_loss=0.0949, ctc_loss=0.1857, cr_loss=0.4526, over 6676366.68 frames. ], batch size: 175, lr: 1.31e-02, grad_scale: 32.0
2024-09-17 10:15:10,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162446.66666666666, ans=0.1
2024-09-17 10:15:19,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=162493.33333333334, ans=0.0
2024-09-17 10:15:54,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=162586.66666666666, ans=0.125
2024-09-17 10:15:58,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=11.78 vs. limit=12.0
2024-09-17 10:15:59,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=162586.66666666666, ans=0.1
2024-09-17 10:16:05,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.22 vs. limit=22.5
2024-09-17 10:16:21,075 INFO [train.py:1198] (1/2) Epoch 9, batch 3850, loss[loss=0.315, simple_loss=0.3333, pruned_loss=0.1151, ctc_loss=0.2368, cr_loss=0.4786, over 24644.00 frames. ], tot_loss[loss=0.2848, simple_loss=0.3145, pruned_loss=0.09904, ctc_loss=0.194, cr_loss=0.4561, over 6250865.95 frames. ], batch size: 245, lr: 1.31e-02, grad_scale: 32.0
2024-09-17 10:16:21,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=162680.0, ans=0.0
2024-09-17 10:17:00,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=11.25 vs. limit=12.0
2024-09-17 10:17:01,108 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.236e+02 2.644e+02 2.898e+02 3.246e+02 5.848e+02, threshold=5.796e+02, percent-clipped=1.0
2024-09-17 10:17:53,496 INFO [train.py:1198] (1/2) Epoch 10, batch 0, loss[loss=0.2507, simple_loss=0.2861, pruned_loss=0.08288, ctc_loss=0.1659, cr_loss=0.4092, over 34473.00 frames. ], tot_loss[loss=0.2507, simple_loss=0.2861, pruned_loss=0.08288, ctc_loss=0.1659, cr_loss=0.4092, over 34473.00 frames. ], batch size: 85, lr: 1.24e-02, grad_scale: 32.0
2024-09-17 10:17:53,497 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 10:18:10,286 INFO [train.py:1230] (1/2) Epoch 10, validation: loss=0.1608, simple_loss=0.2599, pruned_loss=0.02556, ctc_loss=0.05286, cr_loss=1.599e-14, over 944034.00 frames.
2024-09-17 10:18:10,286 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-17 10:18:20,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=162806.0, ans=0.0
2024-09-17 10:18:27,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=162852.66666666666, ans=0.2
2024-09-17 10:18:35,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=162852.66666666666, ans=0.0
2024-09-17 10:18:50,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=162899.33333333334, ans=0.125
2024-09-17 10:18:53,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=12.0
2024-09-17 10:19:12,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=162946.0, ans=0.125
2024-09-17 10:19:21,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=162992.66666666666, ans=0.2
2024-09-17 10:19:21,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=162992.66666666666, ans=0.0
2024-09-17 10:19:34,797 INFO [train.py:1198] (1/2) Epoch 10, batch 50, loss[loss=0.241, simple_loss=0.2775, pruned_loss=0.07893, ctc_loss=0.1564, cr_loss=0.3836, over 34464.00 frames. ], tot_loss[loss=0.2719, simple_loss=0.3053, pruned_loss=0.09218, ctc_loss=0.1813, cr_loss=0.446, over 1480743.38 frames. ], batch size: 82, lr: 1.24e-02, grad_scale: 32.0
2024-09-17 10:19:35,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=163039.33333333334, ans=0.0
2024-09-17 10:19:37,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.61 vs. limit=22.5
2024-09-17 10:19:40,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=163039.33333333334, ans=0.0
2024-09-17 10:19:50,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=163086.0, ans=0.0
2024-09-17 10:20:16,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=163132.66666666666, ans=0.0
2024-09-17 10:20:31,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=163179.33333333334, ans=0.125
2024-09-17 10:20:46,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0
2024-09-17 10:20:52,063 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.090e+02 2.537e+02 3.055e+02 3.915e+02 7.541e+02, threshold=6.110e+02, percent-clipped=5.0
2024-09-17 10:20:52,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=163226.0, ans=0.125
2024-09-17 10:20:57,024 INFO [train.py:1198] (1/2) Epoch 10, batch 100, loss[loss=0.2445, simple_loss=0.2815, pruned_loss=0.07987, ctc_loss=0.1582, cr_loss=0.4053, over 34561.00 frames. ], tot_loss[loss=0.2738, simple_loss=0.3072, pruned_loss=0.09296, ctc_loss=0.1825, cr_loss=0.4486, over 2630177.34 frames. ], batch size: 89, lr: 1.24e-02, grad_scale: 32.0
2024-09-17 10:21:40,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=163366.0, ans=0.0
2024-09-17 10:21:40,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=163366.0, ans=0.125
2024-09-17 10:21:44,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.05 vs. limit=15.0
2024-09-17 10:21:53,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=163412.66666666666, ans=0.125
2024-09-17 10:22:04,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=163459.33333333334, ans=0.125
2024-09-17 10:22:15,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=163459.33333333334, ans=0.0
2024-09-17 10:22:20,233 INFO [train.py:1198] (1/2) Epoch 10, batch 150, loss[loss=0.2431, simple_loss=0.2778, pruned_loss=0.08018, ctc_loss=0.1577, cr_loss=0.4134, over 34475.00 frames. ], tot_loss[loss=0.27, simple_loss=0.3043, pruned_loss=0.09103, ctc_loss=0.1792, cr_loss=0.4447, over 3558493.80 frames. ], batch size: 82, lr: 1.24e-02, grad_scale: 32.0
2024-09-17 10:22:45,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=163552.66666666666, ans=0.125
2024-09-17 10:22:48,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.84 vs. limit=6.0
2024-09-17 10:23:02,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=163599.33333333334, ans=0.2
2024-09-17 10:23:02,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.49 vs. limit=10.0
2024-09-17 10:23:10,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=163646.0, ans=0.125
2024-09-17 10:23:26,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=163692.66666666666, ans=0.125
2024-09-17 10:23:39,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.534e+02 2.881e+02 3.727e+02 6.182e+02, threshold=5.762e+02, percent-clipped=2.0
2024-09-17 10:23:39,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=163692.66666666666, ans=0.0
2024-09-17 10:23:44,198 INFO [train.py:1198] (1/2) Epoch 10, batch 200, loss[loss=0.2965, simple_loss=0.3218, pruned_loss=0.1054, ctc_loss=0.2056, cr_loss=0.4818, over 31877.00 frames. ], tot_loss[loss=0.2681, simple_loss=0.3025, pruned_loss=0.09025, ctc_loss=0.1776, cr_loss=0.4426, over 4273107.24 frames. ], batch size: 145, lr: 1.24e-02, grad_scale: 32.0
2024-09-17 10:23:45,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.43 vs. limit=22.5
2024-09-17 10:24:02,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=163786.0, ans=0.125
2024-09-17 10:24:15,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=163832.66666666666, ans=0.0
2024-09-17 10:24:30,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=163832.66666666666, ans=0.0
2024-09-17 10:24:55,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=14.33 vs. limit=15.0
2024-09-17 10:25:08,240 INFO [train.py:1198] (1/2) Epoch 10, batch 250, loss[loss=0.2862, simple_loss=0.3203, pruned_loss=0.09736, ctc_loss=0.1932, cr_loss=0.467, over 34180.00 frames. ], tot_loss[loss=0.2675, simple_loss=0.3023, pruned_loss=0.08984, ctc_loss=0.1771, cr_loss=0.4417, over 4833238.84 frames. ], batch size: 117, lr: 1.24e-02, grad_scale: 32.0
2024-09-17 10:25:26,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=164019.33333333334, ans=0.125
2024-09-17 10:25:31,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=164019.33333333334, ans=0.1
2024-09-17 10:25:53,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.45 vs. limit=10.0
2024-09-17 10:25:54,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164066.0, ans=0.1
2024-09-17 10:25:59,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=164112.66666666666, ans=0.0
2024-09-17 10:26:25,412 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.691e+02 3.465e+02 4.661e+02 9.767e+02, threshold=6.930e+02, percent-clipped=13.0
2024-09-17 10:26:25,824 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.359e-02
2024-09-17 10:26:30,335 INFO [train.py:1198] (1/2) Epoch 10, batch 300, loss[loss=0.3094, simple_loss=0.3369, pruned_loss=0.1108, ctc_loss=0.2098, cr_loss=0.4613, over 34341.00 frames. ], tot_loss[loss=0.267, simple_loss=0.3017, pruned_loss=0.08966, ctc_loss=0.1767, cr_loss=0.4409, over 5263355.40 frames. ], batch size: 107, lr: 1.24e-02, grad_scale: 32.0
2024-09-17 10:26:43,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=164206.0, ans=0.035
2024-09-17 10:26:43,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164206.0, ans=0.1
2024-09-17 10:26:44,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=164206.0, ans=0.125
2024-09-17 10:26:52,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=164252.66666666666, ans=0.125
2024-09-17 10:27:16,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0
2024-09-17 10:27:34,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=164346.0, ans=0.2
2024-09-17 10:27:47,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164392.66666666666, ans=0.1
2024-09-17 10:27:55,590 INFO [train.py:1198] (1/2) Epoch 10, batch 350, loss[loss=0.245, simple_loss=0.2828, pruned_loss=0.07948, ctc_loss=0.1604, cr_loss=0.4027, over 34275.00 frames. ], tot_loss[loss=0.2677, simple_loss=0.3024, pruned_loss=0.08996, ctc_loss=0.1772, cr_loss=0.4435, over 5598200.15 frames. ], batch size: 83, lr: 1.24e-02, grad_scale: 32.0
2024-09-17 10:27:57,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=164439.33333333334, ans=0.0
2024-09-17 10:27:57,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=164439.33333333334, ans=0.125
2024-09-17 10:28:12,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=164486.0, ans=0.0
2024-09-17 10:28:22,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.62 vs. limit=6.0
2024-09-17 10:28:40,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=164532.66666666666, ans=0.125
2024-09-17 10:29:00,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=164579.33333333334, ans=0.125
2024-09-17 10:29:01,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.73 vs. limit=10.0
2024-09-17 10:29:15,979 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.124e+02 2.902e+02 3.466e+02 4.539e+02 6.725e+02, threshold=6.933e+02, percent-clipped=0.0
2024-09-17 10:29:19,272 INFO [train.py:1198] (1/2) Epoch 10, batch 400, loss[loss=0.272, simple_loss=0.3064, pruned_loss=0.09245, ctc_loss=0.1763, cr_loss=0.4347, over 34444.00 frames. ], tot_loss[loss=0.2667, simple_loss=0.3017, pruned_loss=0.08945, ctc_loss=0.1762, cr_loss=0.442, over 5864896.51 frames. ], batch size: 95, lr: 1.24e-02, grad_scale: 32.0
2024-09-17 10:29:19,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=164672.66666666666, ans=0.2
2024-09-17 10:29:44,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=164719.33333333334, ans=0.2
2024-09-17 10:29:45,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=164719.33333333334, ans=0.125
2024-09-17 10:30:09,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164812.66666666666, ans=0.1
2024-09-17 10:30:39,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=164859.33333333334, ans=0.0
2024-09-17 10:30:43,946 INFO [train.py:1198] (1/2) Epoch 10, batch 450, loss[loss=0.277, simple_loss=0.3122, pruned_loss=0.09347, ctc_loss=0.1849, cr_loss=0.4441, over 34714.00 frames. ], tot_loss[loss=0.2673, simple_loss=0.3021, pruned_loss=0.0897, ctc_loss=0.1768, cr_loss=0.4432, over 6052656.18 frames. ], batch size: 97, lr: 1.23e-02, grad_scale: 32.0
2024-09-17 10:31:08,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.42 vs. limit=10.0
2024-09-17 10:31:14,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.40 vs. limit=15.0
2024-09-17 10:31:16,106 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:31:25,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=164999.33333333334, ans=0.125
2024-09-17 10:31:40,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=165046.0, ans=0.1
2024-09-17 10:32:03,306 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.559e+02 2.905e+02 3.484e+02 6.146e+02, threshold=5.810e+02, percent-clipped=0.0
2024-09-17 10:32:06,582 INFO [train.py:1198] (1/2) Epoch 10, batch 500, loss[loss=0.2982, simple_loss=0.3341, pruned_loss=0.1018, ctc_loss=0.1978, cr_loss=0.4783, over 34464.00 frames. ], tot_loss[loss=0.2664, simple_loss=0.3012, pruned_loss=0.08934, ctc_loss=0.1761, cr_loss=0.4421, over 6217572.86 frames. ], batch size: 110, lr: 1.23e-02, grad_scale: 32.0
2024-09-17 10:32:08,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=165139.33333333334, ans=0.125
2024-09-17 10:32:38,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=165186.0, ans=0.1
2024-09-17 10:32:58,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=165279.33333333334, ans=0.125
2024-09-17 10:33:08,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=165279.33333333334, ans=0.05
2024-09-17 10:33:30,744 INFO [train.py:1198] (1/2) Epoch 10, batch 550, loss[loss=0.276, simple_loss=0.3087, pruned_loss=0.09428, ctc_loss=0.1867, cr_loss=0.4347, over 33725.00 frames. ], tot_loss[loss=0.2662, simple_loss=0.3012, pruned_loss=0.08924, ctc_loss=0.1758, cr_loss=0.442, over 6327804.79 frames. ], batch size: 122, lr: 1.23e-02, grad_scale: 16.0
2024-09-17 10:33:44,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0
2024-09-17 10:33:54,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=165419.33333333334, ans=0.125
2024-09-17 10:33:56,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.06 vs. limit=15.0
2024-09-17 10:33:58,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=165419.33333333334, ans=0.125
2024-09-17 10:34:02,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=165466.0, ans=0.0
2024-09-17 10:34:07,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.17 vs. limit=15.0
2024-09-17 10:34:08,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=165466.0, ans=0.2
2024-09-17 10:34:32,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=165512.66666666666, ans=0.0
2024-09-17 10:34:37,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=165559.33333333334, ans=0.2
2024-09-17 10:34:37,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=165559.33333333334, ans=0.0
2024-09-17 10:34:53,491 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.708e+02 3.255e+02 4.332e+02 1.611e+03, threshold=6.509e+02, percent-clipped=7.0
2024-09-17 10:34:55,171 INFO [train.py:1198] (1/2) Epoch 10, batch 600, loss[loss=0.2837, simple_loss=0.3219, pruned_loss=0.09471, ctc_loss=0.1884, cr_loss=0.4582, over 34168.00 frames. ], tot_loss[loss=0.2656, simple_loss=0.3007, pruned_loss=0.08886, ctc_loss=0.1751, cr_loss=0.4413, over 6430064.55 frames. ], batch size: 117, lr: 1.23e-02, grad_scale: 16.0
2024-09-17 10:35:15,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0
2024-09-17 10:35:59,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=165792.66666666666, ans=0.125
2024-09-17 10:36:01,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=165792.66666666666, ans=0.0
2024-09-17 10:36:02,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0
2024-09-17 10:36:07,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=165792.66666666666, ans=0.0
2024-09-17 10:36:17,201 INFO [train.py:1198] (1/2) Epoch 10, batch 650, loss[loss=0.2525, simple_loss=0.2936, pruned_loss=0.08139, ctc_loss=0.1621, cr_loss=0.4076, over 34514.00 frames. ], tot_loss[loss=0.2646, simple_loss=0.3, pruned_loss=0.08836, ctc_loss=0.1742, cr_loss=0.44, over 6521906.67 frames. ], batch size: 94, lr: 1.23e-02, grad_scale: 16.0
2024-09-17 10:36:25,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=165839.33333333334, ans=0.1
2024-09-17 10:36:37,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=165886.0, ans=0.1
2024-09-17 10:36:39,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.27 vs. limit=15.0
2024-09-17 10:36:56,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=165932.66666666666, ans=0.125
2024-09-17 10:37:02,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=165932.66666666666, ans=0.125
2024-09-17 10:37:02,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0
2024-09-17 10:37:18,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=165979.33333333334, ans=0.025
2024-09-17 10:37:22,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.20 vs. limit=12.0
2024-09-17 10:37:39,161 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.707e+02 3.331e+02 4.675e+02 7.681e+02, threshold=6.663e+02, percent-clipped=3.0
2024-09-17 10:37:40,867 INFO [train.py:1198] (1/2) Epoch 10, batch 700, loss[loss=0.2574, simple_loss=0.2906, pruned_loss=0.08647, ctc_loss=0.1705, cr_loss=0.4284, over 34602.00 frames. ], tot_loss[loss=0.2655, simple_loss=0.3007, pruned_loss=0.08884, ctc_loss=0.1751, cr_loss=0.4416, over 6577370.29 frames. ], batch size: 89, lr: 1.23e-02, grad_scale: 16.0
2024-09-17 10:37:52,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=166072.66666666666, ans=0.0
2024-09-17 10:38:15,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.80 vs. limit=22.5
2024-09-17 10:38:27,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.44 vs. limit=15.0
2024-09-17 10:38:33,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=166212.66666666666, ans=0.125
2024-09-17 10:38:51,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=166259.33333333334, ans=0.0
2024-09-17 10:39:05,779 INFO [train.py:1198] (1/2) Epoch 10, batch 750, loss[loss=0.2669, simple_loss=0.304, pruned_loss=0.08852, ctc_loss=0.1755, cr_loss=0.4413, over 34413.00 frames. ], tot_loss[loss=0.2652, simple_loss=0.3004, pruned_loss=0.08867, ctc_loss=0.1749, cr_loss=0.4407, over 6620532.01 frames. ], batch size: 95, lr: 1.23e-02, grad_scale: 16.0
2024-09-17 10:39:29,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=166352.66666666666, ans=0.125
2024-09-17 10:40:20,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=166492.66666666666, ans=0.0
2024-09-17 10:40:28,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.864e+02 3.610e+02 4.972e+02 7.755e+02, threshold=7.220e+02, percent-clipped=6.0
2024-09-17 10:40:29,886 INFO [train.py:1198] (1/2) Epoch 10, batch 800, loss[loss=0.2443, simple_loss=0.281, pruned_loss=0.07887, ctc_loss=0.1624, cr_loss=0.4316, over 34437.00 frames. ], tot_loss[loss=0.2652, simple_loss=0.3005, pruned_loss=0.08872, ctc_loss=0.1748, cr_loss=0.4404, over 6656682.32 frames. ], batch size: 85, lr: 1.23e-02, grad_scale: 32.0
2024-09-17 10:40:30,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=166539.33333333334, ans=0.04949747468305833
2024-09-17 10:40:35,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=166539.33333333334, ans=0.125
2024-09-17 10:40:46,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=166586.0, ans=0.0
2024-09-17 10:41:03,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=166632.66666666666, ans=0.0
2024-09-17 10:41:17,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=166679.33333333334, ans=0.1
2024-09-17 10:41:41,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.04 vs. limit=15.0
2024-09-17 10:41:41,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.94 vs. limit=22.5
2024-09-17 10:41:44,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=166726.0, ans=0.125
2024-09-17 10:41:48,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.83 vs. limit=15.0
2024-09-17 10:41:51,815 INFO [train.py:1198] (1/2) Epoch 10, batch 850, loss[loss=0.2778, simple_loss=0.318, pruned_loss=0.09153, ctc_loss=0.1791, cr_loss=0.4687, over 34386.00 frames. ], tot_loss[loss=0.2644, simple_loss=0.2999, pruned_loss=0.08821, ctc_loss=0.1739, cr_loss=0.4397, over 6690599.58 frames. ], batch size: 103, lr: 1.23e-02, grad_scale: 32.0
2024-09-17 10:41:55,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=166772.66666666666, ans=0.0
2024-09-17 10:42:00,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.57 vs. limit=15.0
2024-09-17 10:42:06,188 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.27 vs. limit=22.5
2024-09-17 10:42:10,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0
2024-09-17 10:42:28,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=166866.0, ans=0.025
2024-09-17 10:42:55,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=15.0
2024-09-17 10:43:00,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=166959.33333333334, ans=0.0
2024-09-17 10:43:03,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=166959.33333333334, ans=0.0
2024-09-17 10:43:16,088 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.548e+02 3.017e+02 4.139e+02 6.292e+02, threshold=6.034e+02, percent-clipped=0.0
2024-09-17 10:43:16,110 INFO [train.py:1198] (1/2) Epoch 10, batch 900, loss[loss=0.2343, simple_loss=0.2716, pruned_loss=0.0754, ctc_loss=0.149, cr_loss=0.4091, over 34534.00 frames. ], tot_loss[loss=0.265, simple_loss=0.3005, pruned_loss=0.08848, ctc_loss=0.1745, cr_loss=0.44, over 6695716.86 frames. ], batch size: 85, lr: 1.23e-02, grad_scale: 16.0
2024-09-17 10:43:25,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=167006.0, ans=0.0
2024-09-17 10:43:30,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=167052.66666666666, ans=0.125
2024-09-17 10:43:32,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.31 vs. limit=12.0
2024-09-17 10:44:09,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167146.0, ans=0.1
2024-09-17 10:44:15,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=167146.0, ans=0.125
2024-09-17 10:44:39,867 INFO [train.py:1198] (1/2) Epoch 10, batch 950, loss[loss=0.2406, simple_loss=0.2804, pruned_loss=0.07648, ctc_loss=0.1577, cr_loss=0.408, over 34693.00 frames. ], tot_loss[loss=0.2649, simple_loss=0.3004, pruned_loss=0.0885, ctc_loss=0.1744, cr_loss=0.4396, over 6701113.88 frames. ], batch size: 87, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 10:44:51,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=167239.33333333334, ans=0.0
2024-09-17 10:45:01,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=167286.0, ans=0.125
2024-09-17 10:45:19,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=167332.66666666666, ans=0.2
2024-09-17 10:45:28,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=167379.33333333334, ans=0.0
2024-09-17 10:45:36,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=167379.33333333334, ans=0.02
2024-09-17 10:45:36,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=167379.33333333334, ans=0.1
2024-09-17 10:45:37,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=167379.33333333334, ans=0.0
2024-09-17 10:45:49,949 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=19.35 vs. limit=15.0
2024-09-17 10:45:57,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167426.0, ans=0.1
2024-09-17 10:46:00,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=167426.0, ans=0.125
2024-09-17 10:46:03,846 INFO [train.py:1198] (1/2) Epoch 10, batch 1000, loss[loss=0.2543, simple_loss=0.2929, pruned_loss=0.0833, ctc_loss=0.1644, cr_loss=0.4072, over 34480.00 frames. ], tot_loss[loss=0.2657, simple_loss=0.3009, pruned_loss=0.08893, ctc_loss=0.175, cr_loss=0.44, over 6695043.98 frames. ], batch size: 90, lr: 1.23e-02, grad_scale: 8.0
2024-09-17 10:46:05,427 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.667e+02 3.418e+02 4.325e+02 7.642e+02, threshold=6.835e+02, percent-clipped=7.0
2024-09-17 10:46:05,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167472.66666666666, ans=0.1
2024-09-17 10:46:20,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.91 vs. limit=22.5
2024-09-17 10:46:32,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=167519.33333333334, ans=0.0
2024-09-17 10:46:45,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=167566.0, ans=0.125
2024-09-17 10:46:47,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=167566.0, ans=0.0
2024-09-17 10:47:25,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=167706.0, ans=0.125
2024-09-17 10:47:25,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=167706.0, ans=0.125
2024-09-17 10:47:26,281 INFO [train.py:1198] (1/2) Epoch 10, batch 1050, loss[loss=0.2803, simple_loss=0.3208, pruned_loss=0.09245, ctc_loss=0.1838, cr_loss=0.454, over 34545.00 frames. ], tot_loss[loss=0.2648, simple_loss=0.3001, pruned_loss=0.08858, ctc_loss=0.1744, cr_loss=0.4391, over 6704017.88 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 8.0
2024-09-17 10:47:41,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=167706.0, ans=0.125
2024-09-17 10:47:56,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=167752.66666666666, ans=0.1
2024-09-17 10:48:09,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=167799.33333333334, ans=0.125
2024-09-17 10:48:35,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=167892.66666666666, ans=0.125
2024-09-17 10:48:50,181 INFO [train.py:1198] (1/2) Epoch 10, batch 1100, loss[loss=0.2574, simple_loss=0.2926, pruned_loss=0.08483, ctc_loss=0.1706, cr_loss=0.4586, over 34340.00 frames. ], tot_loss[loss=0.2649, simple_loss=0.3, pruned_loss=0.08864, ctc_loss=0.1746, cr_loss=0.4394, over 6717107.88 frames. ], batch size: 91, lr: 1.22e-02, grad_scale: 8.0
2024-09-17 10:48:51,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.476e+02 2.978e+02 3.598e+02 7.289e+02, threshold=5.956e+02, percent-clipped=1.0
2024-09-17 10:48:58,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=167939.33333333334, ans=0.0
2024-09-17 10:49:07,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=167986.0, ans=0.0
2024-09-17 10:49:26,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=167986.0, ans=0.0
2024-09-17 10:49:36,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=168032.66666666666, ans=0.0
2024-09-17 10:49:41,917 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 10:49:48,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=168079.33333333334, ans=0.1
2024-09-17 10:49:59,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=168079.33333333334, ans=0.125
2024-09-17 10:50:21,104 INFO [train.py:1198] (1/2) Epoch 10, batch 1150, loss[loss=0.2567, simple_loss=0.2958, pruned_loss=0.08354, ctc_loss=0.1683, cr_loss=0.4236, over 34339.00 frames. ], tot_loss[loss=0.265, simple_loss=0.3, pruned_loss=0.08874, ctc_loss=0.1749, cr_loss=0.4399, over 6715639.53 frames. ], batch size: 91, lr: 1.22e-02, grad_scale: 8.0
2024-09-17 10:50:59,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.36 vs. limit=22.5
2024-09-17 10:51:19,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=168312.66666666666, ans=0.025
2024-09-17 10:51:33,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0
2024-09-17 10:51:45,227 INFO [train.py:1198] (1/2) Epoch 10, batch 1200, loss[loss=0.2736, simple_loss=0.3094, pruned_loss=0.09193, ctc_loss=0.1799, cr_loss=0.4487, over 34574.00 frames. ], tot_loss[loss=0.2665, simple_loss=0.3013, pruned_loss=0.08937, ctc_loss=0.1761, cr_loss=0.4414, over 6708149.41 frames. ], batch size: 99, lr: 1.22e-02, grad_scale: 16.0
2024-09-17 10:51:46,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.638e+02 3.247e+02 4.100e+02 6.552e+02, threshold=6.494e+02, percent-clipped=3.0
2024-09-17 10:51:47,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=168406.0, ans=0.125
2024-09-17 10:52:02,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=168452.66666666666, ans=0.125
2024-09-17 10:52:15,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=168452.66666666666, ans=0.025
2024-09-17 10:52:17,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.15 vs.
limit=15.0 2024-09-17 10:52:20,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=168499.33333333334, ans=0.0 2024-09-17 10:52:40,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=168546.0, ans=0.07 2024-09-17 10:52:41,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=168546.0, ans=0.1 2024-09-17 10:52:54,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=168592.66666666666, ans=0.2 2024-09-17 10:52:54,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=168592.66666666666, ans=0.1 2024-09-17 10:53:07,538 INFO [train.py:1198] (1/2) Epoch 10, batch 1250, loss[loss=0.3051, simple_loss=0.3309, pruned_loss=0.1088, ctc_loss=0.2068, cr_loss=0.5095, over 34326.00 frames. ], tot_loss[loss=0.2667, simple_loss=0.3017, pruned_loss=0.08934, ctc_loss=0.176, cr_loss=0.4426, over 6742041.16 frames. ], batch size: 107, lr: 1.22e-02, grad_scale: 16.0 2024-09-17 10:53:20,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0 2024-09-17 10:53:27,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=168686.0, ans=0.0 2024-09-17 10:54:01,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=168779.33333333334, ans=0.0 2024-09-17 10:54:11,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=168779.33333333334, ans=0.2 2024-09-17 10:54:12,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=168779.33333333334, ans=0.1 2024-09-17 10:54:26,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=168826.0, ans=0.0 2024-09-17 10:54:31,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.58 vs. limit=15.0 2024-09-17 10:54:32,645 INFO [train.py:1198] (1/2) Epoch 10, batch 1300, loss[loss=0.2791, simple_loss=0.3161, pruned_loss=0.09358, ctc_loss=0.1856, cr_loss=0.4433, over 33112.00 frames. ], tot_loss[loss=0.2656, simple_loss=0.3008, pruned_loss=0.08883, ctc_loss=0.175, cr_loss=0.441, over 6744670.22 frames. ], batch size: 130, lr: 1.22e-02, grad_scale: 16.0 2024-09-17 10:54:34,265 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.109e+02 2.739e+02 3.228e+02 4.005e+02 7.856e+02, threshold=6.457e+02, percent-clipped=2.0 2024-09-17 10:55:18,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=168966.0, ans=0.125 2024-09-17 10:55:18,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.02 vs. 
limit=15.0 2024-09-17 10:55:34,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=169012.66666666666, ans=0.5 2024-09-17 10:55:36,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=169012.66666666666, ans=0.125 2024-09-17 10:55:38,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=169012.66666666666, ans=0.125 2024-09-17 10:55:53,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=169059.33333333334, ans=0.125 2024-09-17 10:55:56,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=169106.0, ans=0.125 2024-09-17 10:55:57,726 INFO [train.py:1198] (1/2) Epoch 10, batch 1350, loss[loss=0.2733, simple_loss=0.3121, pruned_loss=0.09047, ctc_loss=0.1796, cr_loss=0.4408, over 34531.00 frames. ], tot_loss[loss=0.2648, simple_loss=0.3001, pruned_loss=0.08849, ctc_loss=0.1746, cr_loss=0.4404, over 6766328.87 frames. ], batch size: 94, lr: 1.22e-02, grad_scale: 16.0 2024-09-17 10:56:00,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0 2024-09-17 10:56:11,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.68 vs. limit=12.0 2024-09-17 10:56:23,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=169152.66666666666, ans=0.1 2024-09-17 10:56:35,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=169199.33333333334, ans=0.125 2024-09-17 10:56:35,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=169199.33333333334, ans=0.0 2024-09-17 10:56:35,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-09-17 10:56:55,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=169246.0, ans=0.125 2024-09-17 10:57:03,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.93 vs. limit=15.0 2024-09-17 10:57:06,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=169292.66666666666, ans=0.125 2024-09-17 10:57:21,539 INFO [train.py:1198] (1/2) Epoch 10, batch 1400, loss[loss=0.2244, simple_loss=0.2643, pruned_loss=0.06956, ctc_loss=0.1446, cr_loss=0.4126, over 34310.00 frames. ], tot_loss[loss=0.2649, simple_loss=0.3002, pruned_loss=0.08851, ctc_loss=0.1746, cr_loss=0.4405, over 6778394.31 frames. 
], batch size: 80, lr: 1.22e-02, grad_scale: 16.0 2024-09-17 10:57:23,167 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.207e+02 2.613e+02 3.034e+02 3.864e+02 6.402e+02, threshold=6.068e+02, percent-clipped=0.0 2024-09-17 10:57:38,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=169386.0, ans=0.125 2024-09-17 10:58:06,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=169432.66666666666, ans=0.125 2024-09-17 10:58:08,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=169432.66666666666, ans=0.5 2024-09-17 10:58:12,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.06 vs. limit=15.0 2024-09-17 10:58:43,924 INFO [train.py:1198] (1/2) Epoch 10, batch 1450, loss[loss=0.2988, simple_loss=0.3276, pruned_loss=0.1049, ctc_loss=0.2069, cr_loss=0.4702, over 34433.00 frames. ], tot_loss[loss=0.2656, simple_loss=0.3009, pruned_loss=0.08882, ctc_loss=0.1753, cr_loss=0.4416, over 6776580.24 frames. ], batch size: 110, lr: 1.22e-02, grad_scale: 16.0 2024-09-17 10:58:55,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=169572.66666666666, ans=0.0 2024-09-17 10:58:55,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=169572.66666666666, ans=0.125 2024-09-17 10:58:55,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=169572.66666666666, ans=0.05 2024-09-17 10:59:08,001 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.89 vs. limit=10.0 2024-09-17 10:59:15,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0 2024-09-17 10:59:24,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.38 vs. limit=22.5 2024-09-17 10:59:29,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2024-09-17 10:59:53,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=169759.33333333334, ans=0.05 2024-09-17 11:00:00,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2024-09-17 11:00:01,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=169759.33333333334, ans=0.125 2024-09-17 11:00:03,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=169759.33333333334, ans=0.05 2024-09-17 11:00:07,742 INFO [train.py:1198] (1/2) Epoch 10, batch 1500, loss[loss=0.2794, simple_loss=0.3155, pruned_loss=0.09378, ctc_loss=0.1813, cr_loss=0.4855, over 34446.00 frames. 
], tot_loss[loss=0.2658, simple_loss=0.3012, pruned_loss=0.08886, ctc_loss=0.1753, cr_loss=0.4416, over 6776629.12 frames. ], batch size: 100, lr: 1.22e-02, grad_scale: 16.0 2024-09-17 11:00:09,392 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.152e+02 2.619e+02 3.135e+02 3.725e+02 6.330e+02, threshold=6.271e+02, percent-clipped=3.0 2024-09-17 11:00:38,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=169852.66666666666, ans=0.0 2024-09-17 11:00:58,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=169946.0, ans=0.2 2024-09-17 11:01:05,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.60 vs. limit=15.0 2024-09-17 11:01:20,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=169992.66666666666, ans=0.125 2024-09-17 11:01:21,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=169992.66666666666, ans=0.2 2024-09-17 11:01:32,735 INFO [train.py:1198] (1/2) Epoch 10, batch 1550, loss[loss=0.3023, simple_loss=0.3282, pruned_loss=0.1077, ctc_loss=0.2072, cr_loss=0.4905, over 34420.00 frames. ], tot_loss[loss=0.2662, simple_loss=0.3011, pruned_loss=0.08922, ctc_loss=0.1759, cr_loss=0.442, over 6747883.06 frames. ], batch size: 105, lr: 1.22e-02, grad_scale: 16.0 2024-09-17 11:01:42,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=170039.33333333334, ans=0.0 2024-09-17 11:01:53,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=170086.0, ans=0.1 2024-09-17 11:01:53,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=170086.0, ans=0.125 2024-09-17 11:02:33,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=170179.33333333334, ans=0.0 2024-09-17 11:02:49,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.84 vs. limit=15.0 2024-09-17 11:02:57,076 INFO [train.py:1198] (1/2) Epoch 10, batch 1600, loss[loss=0.2779, simple_loss=0.3103, pruned_loss=0.09532, ctc_loss=0.1849, cr_loss=0.4449, over 34573.00 frames. ], tot_loss[loss=0.2663, simple_loss=0.301, pruned_loss=0.08934, ctc_loss=0.176, cr_loss=0.4417, over 6726671.83 frames. 
], batch size: 99, lr: 1.22e-02, grad_scale: 32.0 2024-09-17 11:02:58,639 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.618e+02 3.068e+02 3.599e+02 5.850e+02, threshold=6.136e+02, percent-clipped=0.0 2024-09-17 11:03:05,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=170272.66666666666, ans=0.125 2024-09-17 11:03:07,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=170272.66666666666, ans=0.025 2024-09-17 11:03:08,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=170272.66666666666, ans=0.125 2024-09-17 11:03:15,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=170319.33333333334, ans=0.5 2024-09-17 11:04:06,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=170459.33333333334, ans=0.125 2024-09-17 11:04:09,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=170459.33333333334, ans=0.0 2024-09-17 11:04:14,884 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:04:19,244 INFO [train.py:1198] (1/2) Epoch 10, batch 1650, loss[loss=0.2754, simple_loss=0.3153, pruned_loss=0.09122, ctc_loss=0.1759, cr_loss=0.446, over 34384.00 frames. ], tot_loss[loss=0.2659, simple_loss=0.3008, pruned_loss=0.08908, ctc_loss=0.1757, cr_loss=0.441, over 6719048.62 frames. ], batch size: 103, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:04:44,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=170552.66666666666, ans=0.125 2024-09-17 11:04:56,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=170599.33333333334, ans=0.1 2024-09-17 11:04:56,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=170599.33333333334, ans=0.125 2024-09-17 11:05:23,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=170646.0, ans=0.125 2024-09-17 11:05:34,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=170692.66666666666, ans=0.125 2024-09-17 11:05:36,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170692.66666666666, ans=0.1 2024-09-17 11:05:44,029 INFO [train.py:1198] (1/2) Epoch 10, batch 1700, loss[loss=0.2322, simple_loss=0.2672, pruned_loss=0.07542, ctc_loss=0.1531, cr_loss=0.3944, over 34269.00 frames. ], tot_loss[loss=0.2659, simple_loss=0.301, pruned_loss=0.089, ctc_loss=0.1755, cr_loss=0.4414, over 6744025.59 frames. 
], batch size: 80, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:05:45,628 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.435e+02 2.850e+02 3.684e+02 6.642e+02, threshold=5.701e+02, percent-clipped=1.0 2024-09-17 11:05:50,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=170739.33333333334, ans=0.125 2024-09-17 11:05:52,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=170739.33333333334, ans=0.0 2024-09-17 11:06:09,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=170786.0, ans=0.125 2024-09-17 11:06:10,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=170786.0, ans=0.2 2024-09-17 11:06:14,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.90 vs. limit=10.0 2024-09-17 11:06:18,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0 2024-09-17 11:06:30,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=170832.66666666666, ans=0.2 2024-09-17 11:06:40,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-09-17 11:06:42,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=170879.33333333334, ans=0.125 2024-09-17 11:06:46,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=170879.33333333334, ans=0.125 2024-09-17 11:06:50,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=170926.0, ans=0.1 2024-09-17 11:06:52,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=170926.0, ans=0.2 2024-09-17 11:07:06,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=170972.66666666666, ans=0.0 2024-09-17 11:07:08,056 INFO [train.py:1198] (1/2) Epoch 10, batch 1750, loss[loss=0.2326, simple_loss=0.27, pruned_loss=0.07404, ctc_loss=0.1527, cr_loss=0.4125, over 34152.00 frames. ], tot_loss[loss=0.2655, simple_loss=0.3005, pruned_loss=0.08889, ctc_loss=0.1752, cr_loss=0.441, over 6752869.88 frames. 
], batch size: 78, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:07:44,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=171066.0, ans=0.0 2024-09-17 11:08:12,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=171159.33333333334, ans=0.02 2024-09-17 11:08:17,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=171159.33333333334, ans=0.125 2024-09-17 11:08:28,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=171206.0, ans=0.125 2024-09-17 11:08:30,084 INFO [train.py:1198] (1/2) Epoch 10, batch 1800, loss[loss=0.2648, simple_loss=0.3055, pruned_loss=0.08604, ctc_loss=0.1727, cr_loss=0.4371, over 34687.00 frames. ], tot_loss[loss=0.2653, simple_loss=0.3005, pruned_loss=0.08868, ctc_loss=0.1751, cr_loss=0.4406, over 6756534.98 frames. ], batch size: 97, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:08:31,691 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.754e+02 3.684e+02 4.882e+02 9.651e+02, threshold=7.369e+02, percent-clipped=15.0 2024-09-17 11:08:58,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=171252.66666666666, ans=0.2 2024-09-17 11:09:18,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=171299.33333333334, ans=0.125 2024-09-17 11:09:28,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=171346.0, ans=0.09899494936611666 2024-09-17 11:09:54,798 INFO [train.py:1198] (1/2) Epoch 10, batch 1850, loss[loss=0.2786, simple_loss=0.3161, pruned_loss=0.0937, ctc_loss=0.1796, cr_loss=0.4435, over 34452.00 frames. ], tot_loss[loss=0.2651, simple_loss=0.3005, pruned_loss=0.08858, ctc_loss=0.1748, cr_loss=0.441, over 6762112.46 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:10:13,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=171486.0, ans=0.5 2024-09-17 11:10:29,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=171532.66666666666, ans=0.2 2024-09-17 11:10:44,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=171579.33333333334, ans=0.125 2024-09-17 11:10:57,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=171579.33333333334, ans=0.125 2024-09-17 11:11:15,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=171626.0, ans=0.125 2024-09-17 11:11:16,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.73 vs. limit=6.0 2024-09-17 11:11:18,651 INFO [train.py:1198] (1/2) Epoch 10, batch 1900, loss[loss=0.2638, simple_loss=0.3036, pruned_loss=0.08553, ctc_loss=0.1738, cr_loss=0.4553, over 34404.00 frames. ], tot_loss[loss=0.2655, simple_loss=0.3011, pruned_loss=0.08864, ctc_loss=0.1751, cr_loss=0.442, over 6771516.79 frames. 
], batch size: 103, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:11:20,266 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.724e+02 3.328e+02 4.572e+02 7.633e+02, threshold=6.657e+02, percent-clipped=2.0 2024-09-17 11:11:25,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=171672.66666666666, ans=0.125 2024-09-17 11:12:06,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2024-09-17 11:12:06,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.63 vs. limit=15.0 2024-09-17 11:12:14,038 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:12:15,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=171812.66666666666, ans=0.2 2024-09-17 11:12:39,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.22 vs. limit=10.0 2024-09-17 11:12:42,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=171906.0, ans=0.125 2024-09-17 11:12:43,639 INFO [train.py:1198] (1/2) Epoch 10, batch 1950, loss[loss=0.2595, simple_loss=0.2942, pruned_loss=0.08656, ctc_loss=0.1692, cr_loss=0.448, over 34349.00 frames. ], tot_loss[loss=0.2665, simple_loss=0.302, pruned_loss=0.08904, ctc_loss=0.1757, cr_loss=0.4435, over 6788790.21 frames. ], batch size: 91, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:12:54,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=171906.0, ans=0.125 2024-09-17 11:12:56,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=22.5 2024-09-17 11:13:00,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=171952.66666666666, ans=0.0 2024-09-17 11:13:00,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=15.0 2024-09-17 11:13:16,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=171999.33333333334, ans=0.1 2024-09-17 11:13:22,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.00 vs. limit=22.5 2024-09-17 11:13:33,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=172046.0, ans=0.1 2024-09-17 11:13:53,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=172092.66666666666, ans=0.125 2024-09-17 11:14:07,824 INFO [train.py:1198] (1/2) Epoch 10, batch 2000, loss[loss=0.2187, simple_loss=0.2566, pruned_loss=0.06875, ctc_loss=0.1417, cr_loss=0.3737, over 34144.00 frames. 
], tot_loss[loss=0.2671, simple_loss=0.3025, pruned_loss=0.08935, ctc_loss=0.1763, cr_loss=0.4437, over 6764106.39 frames. ], batch size: 78, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:14:08,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=172139.33333333334, ans=0.09899494936611666 2024-09-17 11:14:11,163 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.640e+02 3.035e+02 3.836e+02 6.280e+02, threshold=6.070e+02, percent-clipped=0.0 2024-09-17 11:14:13,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=172139.33333333334, ans=0.0 2024-09-17 11:14:40,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.68 vs. limit=15.0 2024-09-17 11:14:43,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.39 vs. limit=10.0 2024-09-17 11:14:54,597 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:14:56,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.67 vs. limit=15.0 2024-09-17 11:15:11,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.29 vs. limit=22.5 2024-09-17 11:15:20,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=172326.0, ans=0.1 2024-09-17 11:15:24,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=172326.0, ans=0.125 2024-09-17 11:15:30,288 INFO [train.py:1198] (1/2) Epoch 10, batch 2050, loss[loss=0.2313, simple_loss=0.2676, pruned_loss=0.07465, ctc_loss=0.1517, cr_loss=0.3848, over 34502.00 frames. ], tot_loss[loss=0.2658, simple_loss=0.3012, pruned_loss=0.08884, ctc_loss=0.1753, cr_loss=0.4417, over 6754344.96 frames. ], batch size: 82, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:15:57,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=172419.33333333334, ans=0.125 2024-09-17 11:16:33,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=172512.66666666666, ans=0.025 2024-09-17 11:16:54,652 INFO [train.py:1198] (1/2) Epoch 10, batch 2100, loss[loss=0.2576, simple_loss=0.2954, pruned_loss=0.08458, ctc_loss=0.1679, cr_loss=0.4233, over 34531.00 frames. ], tot_loss[loss=0.2645, simple_loss=0.3001, pruned_loss=0.08821, ctc_loss=0.1742, cr_loss=0.4404, over 6768677.46 frames. ], batch size: 94, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:16:57,999 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.558e+02 2.946e+02 3.683e+02 6.599e+02, threshold=5.891e+02, percent-clipped=2.0 2024-09-17 11:17:18,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.63 vs. 
limit=12.0 2024-09-17 11:17:29,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=172699.33333333334, ans=15.0 2024-09-17 11:17:38,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=172699.33333333334, ans=10.0 2024-09-17 11:17:46,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=172746.0, ans=0.2 2024-09-17 11:17:54,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=172746.0, ans=0.0 2024-09-17 11:17:54,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.62 vs. limit=10.0 2024-09-17 11:17:56,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=172746.0, ans=0.0 2024-09-17 11:17:56,272 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:18:11,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.44 vs. limit=15.0 2024-09-17 11:18:17,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=172839.33333333334, ans=0.025 2024-09-17 11:18:19,005 INFO [train.py:1198] (1/2) Epoch 10, batch 2150, loss[loss=0.2724, simple_loss=0.302, pruned_loss=0.09348, ctc_loss=0.1859, cr_loss=0.4672, over 34319.00 frames. ], tot_loss[loss=0.2636, simple_loss=0.2994, pruned_loss=0.08771, ctc_loss=0.1733, cr_loss=0.4398, over 6786476.66 frames. ], batch size: 91, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:18:21,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=172839.33333333334, ans=0.125 2024-09-17 11:18:34,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=172886.0, ans=0.0 2024-09-17 11:18:40,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=172886.0, ans=0.2 2024-09-17 11:19:11,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.21 vs. 
limit=22.5 2024-09-17 11:19:17,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=172979.33333333334, ans=0.0 2024-09-17 11:19:17,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=172979.33333333334, ans=0.0 2024-09-17 11:19:22,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=172979.33333333334, ans=0.0 2024-09-17 11:19:30,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=173026.0, ans=0.05 2024-09-17 11:19:32,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=173026.0, ans=0.1 2024-09-17 11:19:36,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=15.0 2024-09-17 11:19:41,409 INFO [train.py:1198] (1/2) Epoch 10, batch 2200, loss[loss=0.2806, simple_loss=0.3219, pruned_loss=0.09199, ctc_loss=0.182, cr_loss=0.4752, over 34455.00 frames. ], tot_loss[loss=0.2637, simple_loss=0.2994, pruned_loss=0.0878, ctc_loss=0.1734, cr_loss=0.44, over 6780473.10 frames. ], batch size: 100, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:19:44,844 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.821e+02 3.439e+02 4.977e+02 9.853e+02, threshold=6.878e+02, percent-clipped=15.0 2024-09-17 11:20:00,485 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:20:22,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=173166.0, ans=0.0 2024-09-17 11:20:29,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.33 vs. limit=22.5 2024-09-17 11:20:31,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=173212.66666666666, ans=0.125 2024-09-17 11:20:44,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.89 vs. limit=15.0 2024-09-17 11:20:55,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=173259.33333333334, ans=0.0 2024-09-17 11:21:06,482 INFO [train.py:1198] (1/2) Epoch 10, batch 2250, loss[loss=0.2738, simple_loss=0.3094, pruned_loss=0.09232, ctc_loss=0.1792, cr_loss=0.4411, over 34421.00 frames. ], tot_loss[loss=0.2636, simple_loss=0.2994, pruned_loss=0.08779, ctc_loss=0.1734, cr_loss=0.4402, over 6776992.39 frames. ], batch size: 95, lr: 1.21e-02, grad_scale: 32.0 2024-09-17 11:21:17,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.27 vs. 
limit=22.5 2024-09-17 11:21:28,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=173352.66666666666, ans=0.5 2024-09-17 11:22:17,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173492.66666666666, ans=0.1 2024-09-17 11:22:23,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=173492.66666666666, ans=0.2 2024-09-17 11:22:30,164 INFO [train.py:1198] (1/2) Epoch 10, batch 2300, loss[loss=0.2331, simple_loss=0.2727, pruned_loss=0.07424, ctc_loss=0.1489, cr_loss=0.3825, over 34239.00 frames. ], tot_loss[loss=0.2623, simple_loss=0.2981, pruned_loss=0.08725, ctc_loss=0.1724, cr_loss=0.4377, over 6763902.17 frames. ], batch size: 83, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:22:33,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.537e+02 3.087e+02 3.949e+02 7.870e+02, threshold=6.173e+02, percent-clipped=3.0 2024-09-17 11:22:42,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=173539.33333333334, ans=0.0 2024-09-17 11:22:58,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=173586.0, ans=0.125 2024-09-17 11:23:12,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=173632.66666666666, ans=0.2 2024-09-17 11:23:36,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=173726.0, ans=0.0 2024-09-17 11:23:40,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=173726.0, ans=0.125 2024-09-17 11:23:52,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.19 vs. limit=15.0 2024-09-17 11:23:53,041 INFO [train.py:1198] (1/2) Epoch 10, batch 2350, loss[loss=0.2724, simple_loss=0.3101, pruned_loss=0.0904, ctc_loss=0.1787, cr_loss=0.4517, over 34691.00 frames. ], tot_loss[loss=0.2625, simple_loss=0.2983, pruned_loss=0.08731, ctc_loss=0.1724, cr_loss=0.4379, over 6771087.13 frames. ], batch size: 97, lr: 1.20e-02, grad_scale: 16.0 2024-09-17 11:23:57,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=173772.66666666666, ans=0.2 2024-09-17 11:24:10,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=173819.33333333334, ans=0.1 2024-09-17 11:24:13,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173819.33333333334, ans=0.1 2024-09-17 11:24:23,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=173819.33333333334, ans=0.2 2024-09-17 11:24:34,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=173866.0, ans=0.1 2024-09-17 11:24:50,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. 
limit=15.0 2024-09-17 11:25:03,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=173959.33333333334, ans=0.125 2024-09-17 11:25:04,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=173959.33333333334, ans=0.09899494936611666 2024-09-17 11:25:16,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=173959.33333333334, ans=0.0 2024-09-17 11:25:17,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.47 vs. limit=12.0 2024-09-17 11:25:19,364 INFO [train.py:1198] (1/2) Epoch 10, batch 2400, loss[loss=0.2464, simple_loss=0.2855, pruned_loss=0.0793, ctc_loss=0.1576, cr_loss=0.4289, over 34567.00 frames. ], tot_loss[loss=0.2625, simple_loss=0.2984, pruned_loss=0.08724, ctc_loss=0.1722, cr_loss=0.4381, over 6776110.20 frames. ], batch size: 89, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:25:24,215 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.121e+02 2.563e+02 3.073e+02 4.352e+02 8.139e+02, threshold=6.145e+02, percent-clipped=9.0 2024-09-17 11:25:27,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=174006.0, ans=0.2 2024-09-17 11:25:34,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=174052.66666666666, ans=0.125 2024-09-17 11:25:46,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=174052.66666666666, ans=0.125 2024-09-17 11:25:56,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.77 vs. limit=22.5 2024-09-17 11:26:17,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=174146.0, ans=0.0 2024-09-17 11:26:27,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=174192.66666666666, ans=0.1 2024-09-17 11:26:42,198 INFO [train.py:1198] (1/2) Epoch 10, batch 2450, loss[loss=0.2667, simple_loss=0.3097, pruned_loss=0.08589, ctc_loss=0.1719, cr_loss=0.4365, over 34413.00 frames. ], tot_loss[loss=0.2643, simple_loss=0.2999, pruned_loss=0.08811, ctc_loss=0.1738, cr_loss=0.4403, over 6750360.58 frames. ], batch size: 95, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:26:54,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=174239.33333333334, ans=0.0 2024-09-17 11:27:07,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=174286.0, ans=0.125 2024-09-17 11:27:12,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=25.13 vs. 
limit=22.5 2024-09-17 11:27:28,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=174332.66666666666, ans=0.0 2024-09-17 11:27:57,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=174426.0, ans=0.125 2024-09-17 11:28:01,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=22.5 2024-09-17 11:28:05,400 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:28:06,726 INFO [train.py:1198] (1/2) Epoch 10, batch 2500, loss[loss=0.2836, simple_loss=0.3177, pruned_loss=0.0965, ctc_loss=0.1875, cr_loss=0.4732, over 34453.00 frames. ], tot_loss[loss=0.2649, simple_loss=0.3004, pruned_loss=0.08843, ctc_loss=0.1744, cr_loss=0.4413, over 6761994.59 frames. ], batch size: 100, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:28:11,665 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.656e+02 3.454e+02 4.614e+02 6.443e+02, threshold=6.907e+02, percent-clipped=7.0 2024-09-17 11:28:12,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.69 vs. limit=10.0 2024-09-17 11:28:52,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=174566.0, ans=0.0 2024-09-17 11:29:00,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=174612.66666666666, ans=0.2 2024-09-17 11:29:31,706 INFO [train.py:1198] (1/2) Epoch 10, batch 2550, loss[loss=0.2301, simple_loss=0.2622, pruned_loss=0.07548, ctc_loss=0.152, cr_loss=0.4142, over 34129.00 frames. ], tot_loss[loss=0.2643, simple_loss=0.3, pruned_loss=0.08811, ctc_loss=0.1739, cr_loss=0.4412, over 6765553.59 frames. ], batch size: 78, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:29:35,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=174706.0, ans=0.2 2024-09-17 11:29:41,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=174706.0, ans=0.125 2024-09-17 11:29:43,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=174706.0, ans=0.125 2024-09-17 11:30:06,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=174799.33333333334, ans=0.0 2024-09-17 11:30:14,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=174799.33333333334, ans=0.125 2024-09-17 11:30:22,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174846.0, ans=0.1 2024-09-17 11:30:24,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=174846.0, ans=0.2 2024-09-17 11:30:53,844 INFO [train.py:1198] (1/2) Epoch 10, batch 2600, loss[loss=0.2556, simple_loss=0.292, pruned_loss=0.08454, ctc_loss=0.1652, cr_loss=0.4255, over 34363.00 frames. 
], tot_loss[loss=0.2645, simple_loss=0.3002, pruned_loss=0.08818, ctc_loss=0.1741, cr_loss=0.4417, over 6761225.67 frames. ], batch size: 91, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:30:58,751 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.660e+02 2.967e+02 3.897e+02 7.994e+02, threshold=5.933e+02, percent-clipped=4.0 2024-09-17 11:31:07,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=174939.33333333334, ans=0.1 2024-09-17 11:31:17,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=174986.0, ans=0.125 2024-09-17 11:31:25,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0 2024-09-17 11:32:04,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.02 vs. limit=22.5 2024-09-17 11:32:05,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=175126.0, ans=0.125 2024-09-17 11:32:17,971 INFO [train.py:1198] (1/2) Epoch 10, batch 2650, loss[loss=0.271, simple_loss=0.31, pruned_loss=0.08898, ctc_loss=0.1802, cr_loss=0.4484, over 34269.00 frames. ], tot_loss[loss=0.2644, simple_loss=0.3001, pruned_loss=0.0881, ctc_loss=0.174, cr_loss=0.442, over 6768297.48 frames. ], batch size: 117, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:32:21,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=175172.66666666666, ans=0.0 2024-09-17 11:32:34,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=175219.33333333334, ans=0.5 2024-09-17 11:32:44,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=175219.33333333334, ans=0.125 2024-09-17 11:32:56,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=12.0 2024-09-17 11:32:59,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=175266.0, ans=0.0 2024-09-17 11:33:01,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=15.0 2024-09-17 11:33:15,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=175312.66666666666, ans=0.2 2024-09-17 11:33:38,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=175359.33333333334, ans=0.125 2024-09-17 11:33:41,730 INFO [train.py:1198] (1/2) Epoch 10, batch 2700, loss[loss=0.2619, simple_loss=0.3002, pruned_loss=0.08591, ctc_loss=0.1705, cr_loss=0.4417, over 34587.00 frames. ], tot_loss[loss=0.2651, simple_loss=0.3007, pruned_loss=0.08842, ctc_loss=0.1744, cr_loss=0.4424, over 6763314.21 frames. 
], batch size: 102, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:33:46,714 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.088e+02 2.604e+02 2.904e+02 3.565e+02 6.949e+02, threshold=5.809e+02, percent-clipped=2.0 2024-09-17 11:34:56,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=175592.66666666666, ans=0.0 2024-09-17 11:35:04,828 INFO [train.py:1198] (1/2) Epoch 10, batch 2750, loss[loss=0.2577, simple_loss=0.2903, pruned_loss=0.08663, ctc_loss=0.1704, cr_loss=0.4456, over 34639.00 frames. ], tot_loss[loss=0.2634, simple_loss=0.2992, pruned_loss=0.08769, ctc_loss=0.1732, cr_loss=0.4402, over 6760876.13 frames. ], batch size: 88, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:35:10,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175639.33333333334, ans=0.1 2024-09-17 11:35:20,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175639.33333333334, ans=0.1 2024-09-17 11:35:23,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=175686.0, ans=0.1 2024-09-17 11:35:46,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=175732.66666666666, ans=0.035 2024-09-17 11:35:48,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175732.66666666666, ans=0.1 2024-09-17 11:36:11,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=175779.33333333334, ans=0.1 2024-09-17 11:36:11,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=175779.33333333334, ans=0.2 2024-09-17 11:36:20,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5 2024-09-17 11:36:21,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=175826.0, ans=0.0 2024-09-17 11:36:30,896 INFO [train.py:1198] (1/2) Epoch 10, batch 2800, loss[loss=0.3051, simple_loss=0.3252, pruned_loss=0.1119, ctc_loss=0.2172, cr_loss=0.4449, over 23562.00 frames. ], tot_loss[loss=0.2646, simple_loss=0.2999, pruned_loss=0.08833, ctc_loss=0.1744, cr_loss=0.4415, over 6738730.42 frames. 
], batch size: 244, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:36:35,814 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.707e+02 3.205e+02 3.884e+02 7.477e+02, threshold=6.410e+02, percent-clipped=4.0 2024-09-17 11:36:39,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=175872.66666666666, ans=0.2 2024-09-17 11:36:47,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=175919.33333333334, ans=0.05 2024-09-17 11:36:54,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=175919.33333333334, ans=0.125 2024-09-17 11:36:59,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=175919.33333333334, ans=0.125 2024-09-17 11:37:03,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.34 vs. limit=15.0 2024-09-17 11:37:23,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.08 vs. limit=22.5 2024-09-17 11:37:27,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=176012.66666666666, ans=0.5 2024-09-17 11:37:53,623 INFO [train.py:1198] (1/2) Epoch 10, batch 2850, loss[loss=0.2549, simple_loss=0.2898, pruned_loss=0.08496, ctc_loss=0.1653, cr_loss=0.4236, over 34494.00 frames. ], tot_loss[loss=0.2651, simple_loss=0.3004, pruned_loss=0.0886, ctc_loss=0.1749, cr_loss=0.4418, over 6722635.45 frames. ], batch size: 90, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:37:53,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=176106.0, ans=0.07 2024-09-17 11:38:21,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=176152.66666666666, ans=0.0 2024-09-17 11:38:22,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=176152.66666666666, ans=0.125 2024-09-17 11:39:17,806 INFO [train.py:1198] (1/2) Epoch 10, batch 2900, loss[loss=0.244, simple_loss=0.2896, pruned_loss=0.07621, ctc_loss=0.1487, cr_loss=0.4081, over 34550.00 frames. ], tot_loss[loss=0.2655, simple_loss=0.3013, pruned_loss=0.08854, ctc_loss=0.1747, cr_loss=0.4416, over 6753541.15 frames. 
], batch size: 94, lr: 1.20e-02, grad_scale: 32.0 2024-09-17 11:39:22,502 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.910e+02 2.547e+02 2.985e+02 3.775e+02 6.281e+02, threshold=5.969e+02, percent-clipped=0.0 2024-09-17 11:39:34,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=176386.0, ans=0.0 2024-09-17 11:39:37,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=176386.0, ans=0.125 2024-09-17 11:39:42,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=176386.0, ans=0.125 2024-09-17 11:39:51,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=176432.66666666666, ans=0.125 2024-09-17 11:40:09,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0 2024-09-17 11:40:42,313 INFO [train.py:1198] (1/2) Epoch 10, batch 2950, loss[loss=0.2457, simple_loss=0.2819, pruned_loss=0.08093, ctc_loss=0.1566, cr_loss=0.406, over 34622.00 frames. ], tot_loss[loss=0.2642, simple_loss=0.2999, pruned_loss=0.08809, ctc_loss=0.1739, cr_loss=0.4396, over 6749253.81 frames. ], batch size: 88, lr: 1.19e-02, grad_scale: 32.0 2024-09-17 11:40:49,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=176572.66666666666, ans=0.1 2024-09-17 11:41:17,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=176666.0, ans=0.125 2024-09-17 11:41:28,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=176666.0, ans=0.0 2024-09-17 11:41:30,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=176712.66666666666, ans=0.1 2024-09-17 11:41:38,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=176712.66666666666, ans=0.2 2024-09-17 11:41:43,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=176712.66666666666, ans=0.025 2024-09-17 11:41:53,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=176759.33333333334, ans=0.1 2024-09-17 11:41:58,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=176759.33333333334, ans=0.125 2024-09-17 11:42:04,825 INFO [train.py:1198] (1/2) Epoch 10, batch 3000, loss[loss=0.26, simple_loss=0.2955, pruned_loss=0.08669, ctc_loss=0.1706, cr_loss=0.4252, over 34538.00 frames. ], tot_loss[loss=0.2641, simple_loss=0.2999, pruned_loss=0.088, ctc_loss=0.1737, cr_loss=0.4397, over 6750861.44 frames. ], batch size: 94, lr: 1.19e-02, grad_scale: 32.0 2024-09-17 11:42:04,825 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 11:42:21,815 INFO [train.py:1230] (1/2) Epoch 10, validation: loss=0.1562, simple_loss=0.2547, pruned_loss=0.02394, ctc_loss=0.04971, cr_loss=1.53e-14, over 944034.00 frames. 
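A note on the loss fields in the entries above: the reported totals are consistent with a fixed weighted sum of the logged components. For example, at epoch 10, batch 3000, 0.5 * 0.2999 (simple_loss) + 0.088 (pruned_loss) + 0.1 * 0.1737 (ctc_loss) + 0.02 * 0.4397 (cr_loss) comes to roughly 0.2641, the reported tot_loss. The near-zero cr_loss at validation is also what one would expect if the consistency-regularisation term compares two differently time-masked passes of each batch, since no masking is applied during validation. Below is a minimal sketch of that weighting; combine_losses, its signature, and the scale values are inferred from the arithmetic above, not taken from the repository's train.py.

    # Minimal sketch (assumed, not the repo's code): combine the logged loss
    # components into the reported total. The scale values are inferred from
    # the arithmetic above and are illustrative only.
    import torch

    def combine_losses(simple_loss: torch.Tensor,
                       pruned_loss: torch.Tensor,
                       ctc_loss: torch.Tensor,
                       cr_loss: torch.Tensor,
                       simple_scale: float = 0.5,
                       ctc_scale: float = 0.1,
                       cr_scale: float = 0.02) -> torch.Tensor:
        # Each component is already normalised per frame in the log, so the
        # total is a plain weighted sum of the four terms.
        return (simple_scale * simple_loss
                + pruned_loss
                + ctc_scale * ctc_loss
                + cr_scale * cr_loss)

    # Reproduces the epoch 10, batch 3000 figure to within rounding:
    total = combine_losses(torch.tensor(0.2999), torch.tensor(0.088),
                           torch.tensor(0.1737), torch.tensor(0.4397))
    assert abs(total.item() - 0.2641) < 1e-3

The same weighting reproduces the other tot_loss entries in this stretch of the log to the printed precision, which is why the sketch uses those particular scales.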
2024-09-17 11:42:21,815 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 11:42:26,775 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.534e+02 2.895e+02 3.452e+02 5.570e+02, threshold=5.790e+02, percent-clipped=0.0 2024-09-17 11:42:37,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=176806.0, ans=0.1 2024-09-17 11:42:52,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.46 vs. limit=10.0 2024-09-17 11:43:15,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=176946.0, ans=0.1 2024-09-17 11:43:36,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=176992.66666666666, ans=0.95 2024-09-17 11:43:47,459 INFO [train.py:1198] (1/2) Epoch 10, batch 3050, loss[loss=0.2494, simple_loss=0.2885, pruned_loss=0.08057, ctc_loss=0.1621, cr_loss=0.4212, over 34575.00 frames. ], tot_loss[loss=0.2657, simple_loss=0.3012, pruned_loss=0.08877, ctc_loss=0.175, cr_loss=0.4416, over 6741504.99 frames. ], batch size: 89, lr: 1.19e-02, grad_scale: 32.0 2024-09-17 11:43:55,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=177039.33333333334, ans=0.035 2024-09-17 11:44:09,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.40 vs. limit=22.5 2024-09-17 11:44:12,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2024-09-17 11:44:31,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=177132.66666666666, ans=0.125 2024-09-17 11:44:39,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=177179.33333333334, ans=0.125 2024-09-17 11:45:08,334 INFO [train.py:1198] (1/2) Epoch 10, batch 3100, loss[loss=0.2724, simple_loss=0.311, pruned_loss=0.09059, ctc_loss=0.1783, cr_loss=0.4245, over 34289.00 frames. ], tot_loss[loss=0.2653, simple_loss=0.3009, pruned_loss=0.08864, ctc_loss=0.1748, cr_loss=0.4404, over 6740580.29 frames. 
], batch size: 117, lr: 1.19e-02, grad_scale: 32.0 2024-09-17 11:45:13,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.111e+02 2.648e+02 3.096e+02 3.882e+02 6.847e+02, threshold=6.191e+02, percent-clipped=3.0 2024-09-17 11:45:21,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=177272.66666666666, ans=0.1 2024-09-17 11:45:27,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=177319.33333333334, ans=0.125 2024-09-17 11:46:00,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=177412.66666666666, ans=0.0 2024-09-17 11:46:10,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=177412.66666666666, ans=0.025 2024-09-17 11:46:19,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.41 vs. limit=10.0 2024-09-17 11:46:21,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=177459.33333333334, ans=0.125 2024-09-17 11:46:29,790 INFO [train.py:1198] (1/2) Epoch 10, batch 3150, loss[loss=0.2724, simple_loss=0.3119, pruned_loss=0.08902, ctc_loss=0.1811, cr_loss=0.4652, over 33865.00 frames. ], tot_loss[loss=0.265, simple_loss=0.3006, pruned_loss=0.08844, ctc_loss=0.1745, cr_loss=0.4404, over 6747080.70 frames. ], batch size: 122, lr: 1.19e-02, grad_scale: 32.0 2024-09-17 11:46:44,748 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:46:52,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=177552.66666666666, ans=0.125 2024-09-17 11:47:04,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.85 vs. limit=22.5 2024-09-17 11:47:10,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=177599.33333333334, ans=0.0 2024-09-17 11:47:13,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=177599.33333333334, ans=0.0 2024-09-17 11:47:28,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.06 vs. limit=22.5 2024-09-17 11:47:37,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=177692.66666666666, ans=0.125 2024-09-17 11:47:50,783 INFO [train.py:1198] (1/2) Epoch 10, batch 3200, loss[loss=0.2517, simple_loss=0.2909, pruned_loss=0.08143, ctc_loss=0.1631, cr_loss=0.4246, over 34525.00 frames. ], tot_loss[loss=0.2639, simple_loss=0.2997, pruned_loss=0.08797, ctc_loss=0.1735, cr_loss=0.4392, over 6758393.12 frames. 
], batch size: 94, lr: 1.19e-02, grad_scale: 32.0 2024-09-17 11:47:54,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=177739.33333333334, ans=0.125 2024-09-17 11:47:54,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.81 vs. limit=22.5 2024-09-17 11:47:55,632 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.472e+02 2.888e+02 3.659e+02 7.318e+02, threshold=5.776e+02, percent-clipped=1.0 2024-09-17 11:48:17,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=177786.0, ans=0.0 2024-09-17 11:48:33,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.38 vs. limit=22.5 2024-09-17 11:48:38,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=177879.33333333334, ans=0.125 2024-09-17 11:48:49,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=177879.33333333334, ans=0.0 2024-09-17 11:48:54,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=177926.0, ans=0.0 2024-09-17 11:49:12,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-17 11:49:13,315 INFO [train.py:1198] (1/2) Epoch 10, batch 3250, loss[loss=0.2681, simple_loss=0.3055, pruned_loss=0.08823, ctc_loss=0.1809, cr_loss=0.4525, over 34654.00 frames. ], tot_loss[loss=0.2637, simple_loss=0.2996, pruned_loss=0.08776, ctc_loss=0.1731, cr_loss=0.4389, over 6768235.33 frames. 
], batch size: 98, lr: 1.19e-02, grad_scale: 32.0 2024-09-17 11:49:15,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=177972.66666666666, ans=0.05 2024-09-17 11:49:37,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=178019.33333333334, ans=0.125 2024-09-17 11:49:48,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=178066.0, ans=0.125 2024-09-17 11:49:50,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=178066.0, ans=0.125 2024-09-17 11:50:01,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=178112.66666666666, ans=0.125 2024-09-17 11:50:11,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=178112.66666666666, ans=0.2 2024-09-17 11:50:25,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=178159.33333333334, ans=0.125 2024-09-17 11:50:30,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=178159.33333333334, ans=0.0 2024-09-17 11:50:34,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=178206.0, ans=0.0 2024-09-17 11:50:35,432 INFO [train.py:1198] (1/2) Epoch 10, batch 3300, loss[loss=0.269, simple_loss=0.3104, pruned_loss=0.08763, ctc_loss=0.174, cr_loss=0.4418, over 33168.00 frames. ], tot_loss[loss=0.2623, simple_loss=0.2984, pruned_loss=0.0871, ctc_loss=0.1722, cr_loss=0.4374, over 6767737.08 frames. ], batch size: 130, lr: 1.19e-02, grad_scale: 16.0 2024-09-17 11:50:40,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178206.0, ans=0.1 2024-09-17 11:50:42,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.172e+02 2.588e+02 3.362e+02 4.891e+02 8.059e+02, threshold=6.724e+02, percent-clipped=12.0 2024-09-17 11:50:47,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=178206.0, ans=0.125 2024-09-17 11:50:54,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.72 vs. 
limit=15.0 2024-09-17 11:51:08,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=178299.33333333334, ans=0.125 2024-09-17 11:51:23,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=178346.0, ans=0.125 2024-09-17 11:51:23,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=178346.0, ans=0.05 2024-09-17 11:51:31,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=178346.0, ans=0.0 2024-09-17 11:51:41,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=178392.66666666666, ans=0.125 2024-09-17 11:51:54,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=178392.66666666666, ans=0.07 2024-09-17 11:51:56,931 INFO [train.py:1198] (1/2) Epoch 10, batch 3350, loss[loss=0.2663, simple_loss=0.3111, pruned_loss=0.08447, ctc_loss=0.1736, cr_loss=0.4465, over 33852.00 frames. ], tot_loss[loss=0.2634, simple_loss=0.2994, pruned_loss=0.08762, ctc_loss=0.1731, cr_loss=0.439, over 6742571.98 frames. ], batch size: 122, lr: 1.19e-02, grad_scale: 16.0 2024-09-17 11:52:10,854 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2024-09-17 11:52:25,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.94 vs. limit=22.5 2024-09-17 11:52:39,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=178532.66666666666, ans=0.025 2024-09-17 11:52:57,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.35 vs. limit=15.0 2024-09-17 11:53:00,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=178626.0, ans=0.125 2024-09-17 11:53:08,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=178626.0, ans=0.125 2024-09-17 11:53:17,546 INFO [train.py:1198] (1/2) Epoch 10, batch 3400, loss[loss=0.2039, simple_loss=0.2459, pruned_loss=0.06209, ctc_loss=0.1228, cr_loss=0.3301, over 34193.00 frames. ], tot_loss[loss=0.2639, simple_loss=0.2995, pruned_loss=0.08804, ctc_loss=0.1737, cr_loss=0.4392, over 6732518.41 frames. 
], batch size: 78, lr: 1.19e-02, grad_scale: 16.0 2024-09-17 11:53:23,853 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.754e+02 3.242e+02 3.869e+02 7.833e+02, threshold=6.484e+02, percent-clipped=2.0 2024-09-17 11:53:24,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178672.66666666666, ans=0.1 2024-09-17 11:53:25,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=178672.66666666666, ans=0.125 2024-09-17 11:53:29,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178672.66666666666, ans=0.1 2024-09-17 11:53:36,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.68 vs. limit=10.0 2024-09-17 11:54:13,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.49 vs. limit=15.0 2024-09-17 11:54:14,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=178812.66666666666, ans=0.125 2024-09-17 11:54:16,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.07 vs. limit=22.5 2024-09-17 11:54:19,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=22.5 2024-09-17 11:54:35,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=178859.33333333334, ans=0.0 2024-09-17 11:54:39,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=178906.0, ans=0.1 2024-09-17 11:54:40,370 INFO [train.py:1198] (1/2) Epoch 10, batch 3450, loss[loss=0.2814, simple_loss=0.3201, pruned_loss=0.09349, ctc_loss=0.1881, cr_loss=0.4515, over 32994.00 frames. ], tot_loss[loss=0.2641, simple_loss=0.3, pruned_loss=0.08799, ctc_loss=0.1735, cr_loss=0.4395, over 6744827.98 frames. ], batch size: 130, lr: 1.19e-02, grad_scale: 16.0 2024-09-17 11:54:40,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=178906.0, ans=0.0 2024-09-17 11:54:40,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=178906.0, ans=0.125 2024-09-17 11:54:41,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.86 vs. limit=10.0 2024-09-17 11:54:48,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=178906.0, ans=0.0 2024-09-17 11:54:57,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.38 vs. limit=22.5 2024-09-17 11:55:11,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=178999.33333333334, ans=0.0 2024-09-17 11:55:23,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.08 vs. 
limit=22.5 2024-09-17 11:55:41,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=179046.0, ans=0.0 2024-09-17 11:55:53,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=179092.66666666666, ans=0.0 2024-09-17 11:56:00,840 INFO [train.py:1198] (1/2) Epoch 10, batch 3500, loss[loss=0.2388, simple_loss=0.2778, pruned_loss=0.07695, ctc_loss=0.1509, cr_loss=0.3928, over 34459.00 frames. ], tot_loss[loss=0.2634, simple_loss=0.2994, pruned_loss=0.08766, ctc_loss=0.1729, cr_loss=0.4387, over 6747114.80 frames. ], batch size: 85, lr: 1.19e-02, grad_scale: 16.0 2024-09-17 11:56:06,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179139.33333333334, ans=0.1 2024-09-17 11:56:07,400 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.822e+02 3.469e+02 4.721e+02 9.831e+02, threshold=6.938e+02, percent-clipped=11.0 2024-09-17 11:56:19,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=179186.0, ans=0.5 2024-09-17 11:56:24,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=179186.0, ans=0.125 2024-09-17 11:56:33,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=179232.66666666666, ans=0.0 2024-09-17 11:56:43,263 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 11:56:54,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=179279.33333333334, ans=0.1 2024-09-17 11:57:09,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=179326.0, ans=0.125 2024-09-17 11:57:09,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=179326.0, ans=0.0 2024-09-17 11:57:21,663 INFO [train.py:1198] (1/2) Epoch 10, batch 3550, loss[loss=0.268, simple_loss=0.3064, pruned_loss=0.08786, ctc_loss=0.1757, cr_loss=0.4667, over 34387.00 frames. ], tot_loss[loss=0.2634, simple_loss=0.2995, pruned_loss=0.08761, ctc_loss=0.1728, cr_loss=0.4389, over 6757934.91 frames. 
], batch size: 103, lr: 1.19e-02, grad_scale: 16.0 2024-09-17 11:57:31,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=179372.66666666666, ans=0.0 2024-09-17 11:57:42,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=179419.33333333334, ans=0.125 2024-09-17 11:57:44,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=179419.33333333334, ans=0.0 2024-09-17 11:57:53,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=179466.0, ans=0.125 2024-09-17 11:58:19,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=179512.66666666666, ans=0.2 2024-09-17 11:58:23,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=179512.66666666666, ans=0.0 2024-09-17 11:58:29,863 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.61 vs. limit=15.0 2024-09-17 11:58:31,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=179559.33333333334, ans=0.2 2024-09-17 11:58:32,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=179559.33333333334, ans=0.125 2024-09-17 11:58:39,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.78 vs. limit=10.0 2024-09-17 11:58:43,715 INFO [train.py:1198] (1/2) Epoch 10, batch 3600, loss[loss=0.2396, simple_loss=0.2783, pruned_loss=0.07704, ctc_loss=0.1526, cr_loss=0.409, over 34470.00 frames. ], tot_loss[loss=0.2644, simple_loss=0.3004, pruned_loss=0.08806, ctc_loss=0.1735, cr_loss=0.4403, over 6767020.85 frames. ], batch size: 90, lr: 1.18e-02, grad_scale: 32.0 2024-09-17 11:58:49,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.96 vs. limit=22.5 2024-09-17 11:58:50,275 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.737e+02 3.077e+02 4.342e+02 7.227e+02, threshold=6.154e+02, percent-clipped=2.0 2024-09-17 11:58:55,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2024-09-17 11:59:00,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.43 vs. 
limit=22.5 2024-09-17 11:59:09,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=179652.66666666666, ans=0.125 2024-09-17 11:59:32,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=179746.0, ans=0.125 2024-09-17 11:59:36,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=179746.0, ans=0.125 2024-09-17 11:59:51,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=179792.66666666666, ans=0.07 2024-09-17 12:00:04,309 INFO [train.py:1198] (1/2) Epoch 10, batch 3650, loss[loss=0.2872, simple_loss=0.3233, pruned_loss=0.09726, ctc_loss=0.1881, cr_loss=0.4753, over 34460.00 frames. ], tot_loss[loss=0.2634, simple_loss=0.2995, pruned_loss=0.08764, ctc_loss=0.1727, cr_loss=0.4391, over 6768116.36 frames. ], batch size: 110, lr: 1.18e-02, grad_scale: 32.0 2024-09-17 12:00:06,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=179839.33333333334, ans=0.2 2024-09-17 12:00:10,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=179839.33333333334, ans=0.125 2024-09-17 12:00:27,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=179886.0, ans=0.125 2024-09-17 12:00:32,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=15.0 2024-09-17 12:00:43,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=179932.66666666666, ans=0.0 2024-09-17 12:00:44,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=179932.66666666666, ans=0.0 2024-09-17 12:00:44,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=179932.66666666666, ans=0.125 2024-09-17 12:01:00,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=179979.33333333334, ans=0.1 2024-09-17 12:01:24,453 INFO [train.py:1198] (1/2) Epoch 10, batch 3700, loss[loss=0.2588, simple_loss=0.3015, pruned_loss=0.08274, ctc_loss=0.1675, cr_loss=0.4284, over 34643.00 frames. ], tot_loss[loss=0.2623, simple_loss=0.2988, pruned_loss=0.08696, ctc_loss=0.1717, cr_loss=0.4381, over 6782338.62 frames. ], batch size: 102, lr: 1.18e-02, grad_scale: 16.0 2024-09-17 12:01:25,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=180072.66666666666, ans=15.0 2024-09-17 12:01:31,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=180072.66666666666, ans=0.0 2024-09-17 12:01:32,487 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.501e+02 3.219e+02 4.085e+02 1.010e+03, threshold=6.439e+02, percent-clipped=4.0 2024-09-17 12:01:36,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.39 vs. 
limit=6.0 2024-09-17 12:01:59,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=180166.0, ans=0.0 2024-09-17 12:02:07,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=180166.0, ans=0.125 2024-09-17 12:02:09,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=22.5 2024-09-17 12:02:21,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180212.66666666666, ans=0.1 2024-09-17 12:02:28,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=180212.66666666666, ans=0.125 2024-09-17 12:02:28,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=180212.66666666666, ans=0.125 2024-09-17 12:02:44,422 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:02:47,363 INFO [train.py:1198] (1/2) Epoch 10, batch 3750, loss[loss=0.2887, simple_loss=0.3198, pruned_loss=0.1001, ctc_loss=0.1939, cr_loss=0.4682, over 34384.00 frames. ], tot_loss[loss=0.2657, simple_loss=0.302, pruned_loss=0.08843, ctc_loss=0.1743, cr_loss=0.4429, over 6784339.42 frames. ], batch size: 113, lr: 1.18e-02, grad_scale: 16.0 2024-09-17 12:03:05,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=180352.66666666666, ans=0.125 2024-09-17 12:03:05,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=180352.66666666666, ans=0.0 2024-09-17 12:03:05,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=180352.66666666666, ans=0.0 2024-09-17 12:03:12,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=180352.66666666666, ans=0.025 2024-09-17 12:03:31,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=180399.33333333334, ans=0.2 2024-09-17 12:03:41,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.77 vs. limit=12.0 2024-09-17 12:03:54,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=180492.66666666666, ans=0.125 2024-09-17 12:03:54,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=180492.66666666666, ans=0.025 2024-09-17 12:03:57,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.67 vs. limit=10.0 2024-09-17 12:04:08,738 INFO [train.py:1198] (1/2) Epoch 10, batch 3800, loss[loss=0.3063, simple_loss=0.3287, pruned_loss=0.1102, ctc_loss=0.2139, cr_loss=0.5161, over 29618.00 frames. ], tot_loss[loss=0.2701, simple_loss=0.3054, pruned_loss=0.09064, ctc_loss=0.1783, cr_loss=0.4474, over 6675453.81 frames. 
], batch size: 175, lr: 1.18e-02, grad_scale: 16.0 2024-09-17 12:04:17,020 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.464e+02 2.797e+02 3.245e+02 6.034e+02, threshold=5.594e+02, percent-clipped=0.0 2024-09-17 12:04:19,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=180539.33333333334, ans=0.2 2024-09-17 12:04:25,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=16.08 vs. limit=15.0 2024-09-17 12:04:28,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-09-17 12:04:34,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=180586.0, ans=0.0 2024-09-17 12:04:35,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.45 vs. limit=10.0 2024-09-17 12:04:39,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=180586.0, ans=0.0 2024-09-17 12:04:41,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=180632.66666666666, ans=0.0 2024-09-17 12:04:43,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=180632.66666666666, ans=0.125 2024-09-17 12:05:19,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=180726.0, ans=0.125 2024-09-17 12:05:32,582 INFO [train.py:1198] (1/2) Epoch 10, batch 3850, loss[loss=0.3237, simple_loss=0.3399, pruned_loss=0.1208, ctc_loss=0.232, cr_loss=0.4881, over 24546.00 frames. ], tot_loss[loss=0.2772, simple_loss=0.3097, pruned_loss=0.09468, ctc_loss=0.1863, cr_loss=0.4513, over 6252301.17 frames. ], batch size: 244, lr: 1.18e-02, grad_scale: 16.0 2024-09-17 12:07:13,526 INFO [train.py:1198] (1/2) Epoch 11, batch 0, loss[loss=0.2352, simple_loss=0.2768, pruned_loss=0.07446, ctc_loss=0.1471, cr_loss=0.3808, over 34473.00 frames. ], tot_loss[loss=0.2352, simple_loss=0.2768, pruned_loss=0.07446, ctc_loss=0.1471, cr_loss=0.3808, over 34473.00 frames. ], batch size: 85, lr: 1.13e-02, grad_scale: 32.0 2024-09-17 12:07:13,527 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 12:07:25,280 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.0703, 5.2651, 5.2628, 3.1007], device='cuda:1') 2024-09-17 12:07:30,378 INFO [train.py:1230] (1/2) Epoch 11, validation: loss=0.1585, simple_loss=0.2577, pruned_loss=0.02463, ctc_loss=0.05011, cr_loss=1.617e-14, over 944034.00 frames. 
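On the recurring optim.py WARNING lines: the five grad-norm figures read as the min/Q1/median/Q3/max of gradient norms over a recent window of steps, and the reported threshold tracks Clipping_scale times the median (e.g. 2 x 2.895e+02 = 5.790e+02 in the epoch 10, batch 3000 warning earlier), with percent-clipped giving the share of recent steps whose norm exceeded that threshold. The sketch below mirrors that bookkeeping; it is inferred from the log output, not taken from the repository's optim.py, and the class and method names are made up for illustration.

    # Minimal sketch (assumed): track gradient norms, derive the clipping
    # threshold as clipping_scale * median over a recent window, and count
    # how often it is exceeded -- matching the fields in the WARNING lines.
    import torch

    class GradNormTracker:
        def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
            self.clipping_scale = clipping_scale
            self.window = window
            self.norms: list[float] = []
            self.num_clipped = 0
            self.num_seen = 0

        def update(self, params) -> float:
            # Global L2 norm over all gradients, as clip_grad_norm_ computes it.
            sq = sum(((p.grad.detach() ** 2).sum()
                      for p in params if p.grad is not None),
                     torch.tensor(0.0))
            norm = float(torch.sqrt(sq))
            self.norms = (self.norms + [norm])[-self.window:]
            q = torch.quantile(torch.tensor(self.norms),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * float(q[2])  # scale * median
            self.num_seen += 1
            if norm > threshold:
                self.num_clipped += 1
            return threshold

        def percent_clipped(self) -> float:
            # Cumulative share of steps clipped; the log likely resets this
            # between warnings, which this sketch does not model.
            return 100.0 * self.num_clipped / max(1, self.num_seen)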
2024-09-17 12:07:30,379 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 12:07:33,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=180898.66666666666, ans=0.125 2024-09-17 12:07:50,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=180945.33333333334, ans=0.125 2024-09-17 12:07:57,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=180945.33333333334, ans=0.125 2024-09-17 12:07:59,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=180945.33333333334, ans=0.125 2024-09-17 12:08:16,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.297e+02 2.780e+02 3.250e+02 4.177e+02 8.625e+02, threshold=6.499e+02, percent-clipped=4.0 2024-09-17 12:08:43,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2024-09-17 12:08:50,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=181085.33333333334, ans=0.0 2024-09-17 12:08:53,526 INFO [train.py:1198] (1/2) Epoch 11, batch 50, loss[loss=0.2443, simple_loss=0.2787, pruned_loss=0.08117, ctc_loss=0.1599, cr_loss=0.39, over 34528.00 frames. ], tot_loss[loss=0.2643, simple_loss=0.3009, pruned_loss=0.08761, ctc_loss=0.1739, cr_loss=0.4396, over 1481379.97 frames. ], batch size: 82, lr: 1.13e-02, grad_scale: 32.0 2024-09-17 12:09:02,069 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:09:48,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=181272.0, ans=0.0 2024-09-17 12:10:05,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=181318.66666666666, ans=0.025 2024-09-17 12:10:18,102 INFO [train.py:1198] (1/2) Epoch 11, batch 100, loss[loss=0.2493, simple_loss=0.2858, pruned_loss=0.08151, ctc_loss=0.1626, cr_loss=0.4317, over 34598.00 frames. ], tot_loss[loss=0.2664, simple_loss=0.3028, pruned_loss=0.0886, ctc_loss=0.1753, cr_loss=0.4442, over 2628384.77 frames. ], batch size: 89, lr: 1.13e-02, grad_scale: 32.0 2024-09-17 12:10:22,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.23 vs. 
limit=15.0 2024-09-17 12:10:31,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=181365.33333333334, ans=0.2 2024-09-17 12:10:36,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=181412.0, ans=0.0 2024-09-17 12:10:55,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=181458.66666666666, ans=0.125 2024-09-17 12:11:06,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.536e+02 2.920e+02 3.621e+02 6.760e+02, threshold=5.841e+02, percent-clipped=1.0 2024-09-17 12:11:29,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=181552.0, ans=0.0 2024-09-17 12:11:38,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=181552.0, ans=0.2 2024-09-17 12:11:41,892 INFO [train.py:1198] (1/2) Epoch 11, batch 150, loss[loss=0.2302, simple_loss=0.2675, pruned_loss=0.0733, ctc_loss=0.1489, cr_loss=0.4121, over 34491.00 frames. ], tot_loss[loss=0.2636, simple_loss=0.3003, pruned_loss=0.08732, ctc_loss=0.173, cr_loss=0.4405, over 3556030.47 frames. ], batch size: 82, lr: 1.13e-02, grad_scale: 32.0 2024-09-17 12:12:13,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=181692.0, ans=0.125 2024-09-17 12:12:16,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=181692.0, ans=0.2 2024-09-17 12:12:18,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=181692.0, ans=0.125 2024-09-17 12:12:42,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=181738.66666666666, ans=0.125 2024-09-17 12:12:46,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.67 vs. limit=15.0 2024-09-17 12:13:05,699 INFO [train.py:1198] (1/2) Epoch 11, batch 200, loss[loss=0.2847, simple_loss=0.3166, pruned_loss=0.0977, ctc_loss=0.1912, cr_loss=0.478, over 31581.00 frames. ], tot_loss[loss=0.2618, simple_loss=0.2986, pruned_loss=0.08656, ctc_loss=0.1719, cr_loss=0.439, over 4270510.50 frames. 
], batch size: 145, lr: 1.12e-02, grad_scale: 32.0 2024-09-17 12:13:07,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=181832.0, ans=0.07 2024-09-17 12:13:15,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=181832.0, ans=0.1 2024-09-17 12:13:51,527 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 2.517e+02 2.861e+02 3.719e+02 6.862e+02, threshold=5.722e+02, percent-clipped=5.0 2024-09-17 12:13:51,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=181925.33333333334, ans=0.125 2024-09-17 12:14:00,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=181972.0, ans=0.125 2024-09-17 12:14:10,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=182018.66666666666, ans=0.125 2024-09-17 12:14:28,296 INFO [train.py:1198] (1/2) Epoch 11, batch 250, loss[loss=0.2594, simple_loss=0.3025, pruned_loss=0.08304, ctc_loss=0.1675, cr_loss=0.4209, over 34260.00 frames. ], tot_loss[loss=0.2613, simple_loss=0.2984, pruned_loss=0.08621, ctc_loss=0.1709, cr_loss=0.4382, over 4833485.08 frames. ], batch size: 117, lr: 1.12e-02, grad_scale: 32.0 2024-09-17 12:14:44,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=182065.33333333334, ans=0.125 2024-09-17 12:14:51,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=182112.0, ans=0.1 2024-09-17 12:15:14,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=182158.66666666666, ans=0.125 2024-09-17 12:15:18,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=182205.33333333334, ans=0.0 2024-09-17 12:15:36,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=182252.0, ans=0.05 2024-09-17 12:15:48,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=182252.0, ans=0.025 2024-09-17 12:15:52,973 INFO [train.py:1198] (1/2) Epoch 11, batch 300, loss[loss=0.2782, simple_loss=0.3165, pruned_loss=0.09242, ctc_loss=0.1823, cr_loss=0.4647, over 34358.00 frames. ], tot_loss[loss=0.2606, simple_loss=0.2978, pruned_loss=0.08587, ctc_loss=0.1703, cr_loss=0.4375, over 5261447.41 frames. 
], batch size: 107, lr: 1.12e-02, grad_scale: 16.0 2024-09-17 12:16:01,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=182298.66666666666, ans=0.125 2024-09-17 12:16:21,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=182345.33333333334, ans=0.0 2024-09-17 12:16:42,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.533e+02 2.908e+02 4.042e+02 7.792e+02, threshold=5.816e+02, percent-clipped=8.0 2024-09-17 12:16:47,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=182438.66666666666, ans=0.0 2024-09-17 12:17:17,359 INFO [train.py:1198] (1/2) Epoch 11, batch 350, loss[loss=0.2438, simple_loss=0.276, pruned_loss=0.0814, ctc_loss=0.1594, cr_loss=0.4214, over 34302.00 frames. ], tot_loss[loss=0.2617, simple_loss=0.2987, pruned_loss=0.08647, ctc_loss=0.1712, cr_loss=0.4394, over 5597114.00 frames. ], batch size: 83, lr: 1.12e-02, grad_scale: 16.0 2024-09-17 12:17:39,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=182578.66666666666, ans=0.125 2024-09-17 12:18:26,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=182718.66666666666, ans=0.125 2024-09-17 12:18:41,715 INFO [train.py:1198] (1/2) Epoch 11, batch 400, loss[loss=0.278, simple_loss=0.3097, pruned_loss=0.09515, ctc_loss=0.1853, cr_loss=0.4716, over 34398.00 frames. ], tot_loss[loss=0.2611, simple_loss=0.2981, pruned_loss=0.08618, ctc_loss=0.1709, cr_loss=0.4388, over 5864041.85 frames. ], batch size: 95, lr: 1.12e-02, grad_scale: 32.0 2024-09-17 12:18:42,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=182765.33333333334, ans=0.125 2024-09-17 12:18:45,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=182765.33333333334, ans=0.025 2024-09-17 12:18:46,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182765.33333333334, ans=0.1 2024-09-17 12:19:23,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=182858.66666666666, ans=0.125 2024-09-17 12:19:29,281 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.178e+02 2.555e+02 3.265e+02 4.021e+02 7.716e+02, threshold=6.530e+02, percent-clipped=6.0 2024-09-17 12:19:58,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=182952.0, ans=0.0 2024-09-17 12:20:04,543 INFO [train.py:1198] (1/2) Epoch 11, batch 450, loss[loss=0.2653, simple_loss=0.3021, pruned_loss=0.08781, ctc_loss=0.1757, cr_loss=0.4445, over 34697.00 frames. ], tot_loss[loss=0.2612, simple_loss=0.298, pruned_loss=0.08634, ctc_loss=0.1708, cr_loss=0.4386, over 6051969.59 frames. ], batch size: 97, lr: 1.12e-02, grad_scale: 32.0 2024-09-17 12:20:15,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.53 vs. 
limit=22.5 2024-09-17 12:20:23,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=183045.33333333334, ans=0.1 2024-09-17 12:20:33,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=183045.33333333334, ans=0.025 2024-09-17 12:20:43,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=183092.0, ans=0.2 2024-09-17 12:20:51,557 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 12:20:55,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.25 vs. limit=10.0 2024-09-17 12:21:15,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.40 vs. limit=10.0 2024-09-17 12:21:28,943 INFO [train.py:1198] (1/2) Epoch 11, batch 500, loss[loss=0.2689, simple_loss=0.3087, pruned_loss=0.08876, ctc_loss=0.1727, cr_loss=0.4246, over 34477.00 frames. ], tot_loss[loss=0.2602, simple_loss=0.2971, pruned_loss=0.08589, ctc_loss=0.1699, cr_loss=0.4373, over 6218996.53 frames. ], batch size: 110, lr: 1.12e-02, grad_scale: 32.0 2024-09-17 12:21:34,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=183232.0, ans=0.1 2024-09-17 12:21:37,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=183232.0, ans=0.025 2024-09-17 12:21:39,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.09 vs. limit=10.0 2024-09-17 12:21:50,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=183278.66666666666, ans=0.125 2024-09-17 12:21:52,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=183278.66666666666, ans=0.1 2024-09-17 12:22:02,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=183325.33333333334, ans=0.125 2024-09-17 12:22:05,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=183325.33333333334, ans=0.0 2024-09-17 12:22:16,763 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.055e+02 2.491e+02 2.987e+02 3.568e+02 8.077e+02, threshold=5.973e+02, percent-clipped=2.0 2024-09-17 12:22:31,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.39 vs. limit=10.0 2024-09-17 12:22:32,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=183372.0, ans=0.125 2024-09-17 12:22:53,814 INFO [train.py:1198] (1/2) Epoch 11, batch 550, loss[loss=0.2736, simple_loss=0.3132, pruned_loss=0.09052, ctc_loss=0.1759, cr_loss=0.4461, over 33879.00 frames. ], tot_loss[loss=0.2605, simple_loss=0.2974, pruned_loss=0.08598, ctc_loss=0.1702, cr_loss=0.4378, over 6330296.70 frames. 
], batch size: 122, lr: 1.12e-02, grad_scale: 32.0 2024-09-17 12:22:54,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=183465.33333333334, ans=0.1 2024-09-17 12:22:54,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.76 vs. limit=22.5 2024-09-17 12:23:10,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=183512.0, ans=0.0 2024-09-17 12:23:14,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=183512.0, ans=0.2 2024-09-17 12:23:18,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=183512.0, ans=0.05 2024-09-17 12:23:23,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=183512.0, ans=0.125 2024-09-17 12:23:27,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=183558.66666666666, ans=0.0 2024-09-17 12:23:40,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=183558.66666666666, ans=0.125 2024-09-17 12:24:18,327 INFO [train.py:1198] (1/2) Epoch 11, batch 600, loss[loss=0.2858, simple_loss=0.3195, pruned_loss=0.09729, ctc_loss=0.193, cr_loss=0.4736, over 34242.00 frames. ], tot_loss[loss=0.2605, simple_loss=0.2974, pruned_loss=0.08601, ctc_loss=0.1702, cr_loss=0.4384, over 6430303.33 frames. ], batch size: 117, lr: 1.12e-02, grad_scale: 32.0 2024-09-17 12:24:21,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=183698.66666666666, ans=0.125 2024-09-17 12:24:25,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=183698.66666666666, ans=0.125 2024-09-17 12:24:30,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=183698.66666666666, ans=0.2 2024-09-17 12:24:30,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0 2024-09-17 12:24:33,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=183745.33333333334, ans=0.0 2024-09-17 12:24:58,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2024-09-17 12:24:59,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=183792.0, ans=0.0 2024-09-17 12:25:05,373 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.452e+02 2.940e+02 3.607e+02 8.577e+02, threshold=5.881e+02, percent-clipped=4.0 2024-09-17 12:25:36,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.74 vs. 
limit=15.0 2024-09-17 12:25:40,211 INFO [train.py:1198] (1/2) Epoch 11, batch 650, loss[loss=0.2617, simple_loss=0.2977, pruned_loss=0.08704, ctc_loss=0.1682, cr_loss=0.4482, over 34556.00 frames. ], tot_loss[loss=0.2595, simple_loss=0.2965, pruned_loss=0.08552, ctc_loss=0.1694, cr_loss=0.4372, over 6521537.11 frames. ], batch size: 94, lr: 1.12e-02, grad_scale: 32.0 2024-09-17 12:26:13,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=184025.33333333334, ans=0.1 2024-09-17 12:26:22,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=184025.33333333334, ans=0.125 2024-09-17 12:26:39,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=184072.0, ans=0.2 2024-09-17 12:26:40,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=184072.0, ans=0.0 2024-09-17 12:26:47,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=184118.66666666666, ans=0.0 2024-09-17 12:26:52,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=184118.66666666666, ans=0.125 2024-09-17 12:26:57,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=184118.66666666666, ans=0.025 2024-09-17 12:27:02,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=184118.66666666666, ans=0.125 2024-09-17 12:27:05,119 INFO [train.py:1198] (1/2) Epoch 11, batch 700, loss[loss=0.2485, simple_loss=0.2864, pruned_loss=0.08149, ctc_loss=0.1557, cr_loss=0.4133, over 34579.00 frames. ], tot_loss[loss=0.2598, simple_loss=0.297, pruned_loss=0.08563, ctc_loss=0.1696, cr_loss=0.4375, over 6578291.93 frames. ], batch size: 89, lr: 1.12e-02, grad_scale: 32.0 2024-09-17 12:27:13,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=184165.33333333334, ans=0.0 2024-09-17 12:27:35,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184212.0, ans=0.1 2024-09-17 12:27:46,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=184258.66666666666, ans=0.0 2024-09-17 12:27:46,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=184258.66666666666, ans=0.125 2024-09-17 12:27:55,108 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.721e+02 3.324e+02 4.605e+02 8.256e+02, threshold=6.648e+02, percent-clipped=8.0 2024-09-17 12:28:08,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=184305.33333333334, ans=0.2 2024-09-17 12:28:20,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=15.0 2024-09-17 12:28:29,555 INFO [train.py:1198] (1/2) Epoch 11, batch 750, loss[loss=0.2748, simple_loss=0.3098, pruned_loss=0.09296, ctc_loss=0.1816, cr_loss=0.4382, over 34441.00 frames. 
2024-09-17 12:28:44,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=184445.33333333334, ans=0.125
2024-09-17 12:29:09,118 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:29:09,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184492.0, ans=0.1
2024-09-17 12:29:26,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0
2024-09-17 12:29:27,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=184538.66666666666, ans=0.0
2024-09-17 12:29:40,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=184585.33333333334, ans=0.0
2024-09-17 12:29:41,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=184585.33333333334, ans=0.125
2024-09-17 12:29:51,409 INFO [train.py:1198] (1/2) Epoch 11, batch 800, loss[loss=0.2348, simple_loss=0.2734, pruned_loss=0.07492, ctc_loss=0.15, cr_loss=0.4069, over 34444.00 frames. ], tot_loss[loss=0.2592, simple_loss=0.2965, pruned_loss=0.08534, ctc_loss=0.1693, cr_loss=0.437, over 6657786.57 frames. ], batch size: 85, lr: 1.12e-02, grad_scale: 32.0
2024-09-17 12:29:51,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=184632.0, ans=0.05
2024-09-17 12:30:01,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=184632.0, ans=0.025
2024-09-17 12:30:13,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=184678.66666666666, ans=0.2
2024-09-17 12:30:19,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0
2024-09-17 12:30:35,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=184725.33333333334, ans=0.05
2024-09-17 12:30:41,044 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.966e+02 4.003e+02 5.027e+02 7.966e+02, threshold=8.007e+02, percent-clipped=4.0
2024-09-17 12:31:05,112 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:31:08,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=184818.66666666666, ans=0.125
2024-09-17 12:31:15,892 INFO [train.py:1198] (1/2) Epoch 11, batch 850, loss[loss=0.281, simple_loss=0.3174, pruned_loss=0.09421, ctc_loss=0.1836, cr_loss=0.4855, over 34354.00 frames. ], tot_loss[loss=0.2589, simple_loss=0.2961, pruned_loss=0.08518, ctc_loss=0.1689, cr_loss=0.4362, over 6691596.98 frames. ], batch size: 103, lr: 1.12e-02, grad_scale: 16.0
2024-09-17 12:31:26,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=184865.33333333334, ans=0.125
2024-09-17 12:31:37,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=184912.0, ans=0.125
2024-09-17 12:31:46,433 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:32:11,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=185005.33333333334, ans=0.025
2024-09-17 12:32:14,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=185005.33333333334, ans=0.0
2024-09-17 12:32:40,684 INFO [train.py:1198] (1/2) Epoch 11, batch 900, loss[loss=0.2259, simple_loss=0.2714, pruned_loss=0.06878, ctc_loss=0.1403, cr_loss=0.3704, over 34480.00 frames. ], tot_loss[loss=0.2593, simple_loss=0.2965, pruned_loss=0.08541, ctc_loss=0.1693, cr_loss=0.4362, over 6699135.19 frames. ], batch size: 85, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:32:55,866 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:32:57,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=185145.33333333334, ans=0.025
2024-09-17 12:33:02,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=185145.33333333334, ans=0.2
2024-09-17 12:33:15,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=185192.0, ans=0.2
2024-09-17 12:33:25,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=185192.0, ans=0.125
2024-09-17 12:33:30,128 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.543e+02 2.969e+02 3.385e+02 7.754e+02, threshold=5.937e+02, percent-clipped=0.0
2024-09-17 12:33:35,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=185238.66666666666, ans=0.125
2024-09-17 12:33:42,253 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:34:02,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=185285.33333333334, ans=0.125
2024-09-17 12:34:05,385 INFO [train.py:1198] (1/2) Epoch 11, batch 950, loss[loss=0.2526, simple_loss=0.2883, pruned_loss=0.08405, ctc_loss=0.164, cr_loss=0.3986, over 34703.00 frames. ], tot_loss[loss=0.2596, simple_loss=0.2968, pruned_loss=0.08554, ctc_loss=0.1694, cr_loss=0.4366, over 6703236.26 frames. ], batch size: 87, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:34:15,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=185332.0, ans=0.0
2024-09-17 12:34:20,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=185378.66666666666, ans=0.0
2024-09-17 12:34:39,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=185425.33333333334, ans=0.1
2024-09-17 12:34:55,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=185472.0, ans=0.125
2024-09-17 12:35:17,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=185518.66666666666, ans=0.125
2024-09-17 12:35:20,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=185518.66666666666, ans=0.1
2024-09-17 12:35:27,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=185518.66666666666, ans=0.09899494936611666
2024-09-17 12:35:29,913 INFO [train.py:1198] (1/2) Epoch 11, batch 1000, loss[loss=0.246, simple_loss=0.2874, pruned_loss=0.07794, ctc_loss=0.1574, cr_loss=0.4294, over 34461.00 frames. ], tot_loss[loss=0.2601, simple_loss=0.2974, pruned_loss=0.08576, ctc_loss=0.1697, cr_loss=0.437, over 6696164.72 frames. ], batch size: 90, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:36:19,491 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.715e+02 3.185e+02 3.951e+02 7.414e+02, threshold=6.370e+02, percent-clipped=3.0
2024-09-17 12:36:22,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.25 vs. limit=15.0
2024-09-17 12:36:36,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=185752.0, ans=0.1
2024-09-17 12:36:47,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=185752.0, ans=0.0
2024-09-17 12:36:52,531 INFO [train.py:1198] (1/2) Epoch 11, batch 1050, loss[loss=0.2637, simple_loss=0.3067, pruned_loss=0.08451, ctc_loss=0.1665, cr_loss=0.4619, over 34540.00 frames. ], tot_loss[loss=0.2594, simple_loss=0.2965, pruned_loss=0.08553, ctc_loss=0.1693, cr_loss=0.4366, over 6706158.78 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:37:19,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=185845.33333333334, ans=0.125
2024-09-17 12:37:40,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=185938.66666666666, ans=0.025
2024-09-17 12:37:49,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=185938.66666666666, ans=0.025
2024-09-17 12:37:57,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=185938.66666666666, ans=0.04949747468305833
2024-09-17 12:38:02,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=185985.33333333334, ans=0.0
2024-09-17 12:38:04,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=185985.33333333334, ans=0.125
2024-09-17 12:38:17,284 INFO [train.py:1198] (1/2) Epoch 11, batch 1100, loss[loss=0.2515, simple_loss=0.2897, pruned_loss=0.08151, ctc_loss=0.1607, cr_loss=0.4568, over 34348.00 frames. ], tot_loss[loss=0.2592, simple_loss=0.2962, pruned_loss=0.08548, ctc_loss=0.1692, cr_loss=0.4364, over 6718533.03 frames. ], batch size: 91, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:38:26,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.89 vs. limit=22.5
2024-09-17 12:38:34,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=186078.66666666666, ans=0.2
2024-09-17 12:38:49,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=186125.33333333334, ans=0.125
2024-09-17 12:39:08,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.102e+02 2.525e+02 2.930e+02 3.581e+02 6.903e+02, threshold=5.860e+02, percent-clipped=1.0
2024-09-17 12:39:41,800 INFO [train.py:1198] (1/2) Epoch 11, batch 1150, loss[loss=0.243, simple_loss=0.2823, pruned_loss=0.07808, ctc_loss=0.1545, cr_loss=0.4161, over 34736.00 frames. ], tot_loss[loss=0.2592, simple_loss=0.2961, pruned_loss=0.08551, ctc_loss=0.1694, cr_loss=0.4371, over 6717461.92 frames. ], batch size: 92, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:39:46,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0
2024-09-17 12:39:59,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.53 vs. limit=15.0
2024-09-17 12:40:21,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0
2024-09-17 12:40:23,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=186358.66666666666, ans=0.0
2024-09-17 12:40:27,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0
2024-09-17 12:40:48,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=186452.0, ans=0.025
2024-09-17 12:40:51,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=186452.0, ans=0.0
2024-09-17 12:41:04,556 INFO [train.py:1198] (1/2) Epoch 11, batch 1200, loss[loss=0.2692, simple_loss=0.3099, pruned_loss=0.0882, ctc_loss=0.1705, cr_loss=0.4497, over 34561.00 frames. ], tot_loss[loss=0.2601, simple_loss=0.2971, pruned_loss=0.08582, ctc_loss=0.17, cr_loss=0.4372, over 6708654.03 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 32.0
2024-09-17 12:41:53,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=186592.0, ans=0.1
2024-09-17 12:41:56,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.750e+02 3.266e+02 3.996e+02 9.898e+02, threshold=6.532e+02, percent-clipped=3.0
2024-09-17 12:42:25,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.17 vs. limit=10.0
2024-09-17 12:42:29,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=186685.33333333334, ans=0.2
2024-09-17 12:42:35,957 INFO [train.py:1198] (1/2) Epoch 11, batch 1250, loss[loss=0.2671, simple_loss=0.308, pruned_loss=0.0862, ctc_loss=0.1723, cr_loss=0.4855, over 34361.00 frames. ], tot_loss[loss=0.2606, simple_loss=0.2978, pruned_loss=0.0859, ctc_loss=0.1702, cr_loss=0.4385, over 6743208.44 frames. ], batch size: 107, lr: 1.11e-02, grad_scale: 32.0
2024-09-17 12:42:49,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=186732.0, ans=0.125
2024-09-17 12:42:54,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=186778.66666666666, ans=0.125
2024-09-17 12:43:03,352 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:43:06,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=186778.66666666666, ans=0.1
2024-09-17 12:43:09,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=186825.33333333334, ans=0.0
2024-09-17 12:43:38,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=186872.0, ans=0.0
2024-09-17 12:43:40,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0
2024-09-17 12:43:51,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5
2024-09-17 12:44:00,330 INFO [train.py:1198] (1/2) Epoch 11, batch 1300, loss[loss=0.2582, simple_loss=0.3058, pruned_loss=0.08022, ctc_loss=0.1634, cr_loss=0.4371, over 33128.00 frames. ], tot_loss[loss=0.2594, simple_loss=0.2969, pruned_loss=0.08531, ctc_loss=0.1692, cr_loss=0.4371, over 6746664.23 frames. ], batch size: 130, lr: 1.11e-02, grad_scale: 32.0
2024-09-17 12:44:15,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=187012.0, ans=0.0
2024-09-17 12:44:17,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_positive, batch_count=187012.0, ans=0.05
2024-09-17 12:44:44,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.67 vs. limit=15.0
2024-09-17 12:44:51,590 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.133e+02 2.710e+02 3.544e+02 4.702e+02 9.461e+02, threshold=7.087e+02, percent-clipped=4.0
2024-09-17 12:44:51,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=187105.33333333334, ans=0.125
2024-09-17 12:44:53,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=187105.33333333334, ans=0.0
2024-09-17 12:45:03,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=187105.33333333334, ans=0.0
2024-09-17 12:45:06,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=187152.0, ans=0.125
2024-09-17 12:45:17,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.61 vs. limit=15.0
2024-09-17 12:45:23,050 INFO [train.py:1198] (1/2) Epoch 11, batch 1350, loss[loss=0.2609, simple_loss=0.2974, pruned_loss=0.08667, ctc_loss=0.1707, cr_loss=0.4228, over 34519.00 frames. ], tot_loss[loss=0.259, simple_loss=0.2966, pruned_loss=0.08512, ctc_loss=0.1687, cr_loss=0.4364, over 6767550.66 frames. ], batch size: 94, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:45:32,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=187198.66666666666, ans=0.125
2024-09-17 12:45:35,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.90 vs. limit=12.0
2024-09-17 12:45:56,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=187292.0, ans=0.125
2024-09-17 12:46:01,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=187292.0, ans=0.125
2024-09-17 12:46:32,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=187385.33333333334, ans=0.0
2024-09-17 12:46:34,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=187385.33333333334, ans=0.125
2024-09-17 12:46:45,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=187385.33333333334, ans=0.2
2024-09-17 12:46:48,852 INFO [train.py:1198] (1/2) Epoch 11, batch 1400, loss[loss=0.2221, simple_loss=0.2588, pruned_loss=0.07071, ctc_loss=0.142, cr_loss=0.3872, over 34304.00 frames. ], tot_loss[loss=0.2588, simple_loss=0.2964, pruned_loss=0.08509, ctc_loss=0.1685, cr_loss=0.4359, over 6779818.49 frames. ], batch size: 80, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:46:50,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=187432.0, ans=0.125
2024-09-17 12:47:00,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=187432.0, ans=0.125
2024-09-17 12:47:00,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187432.0, ans=0.1
2024-09-17 12:47:02,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=187432.0, ans=0.125
2024-09-17 12:47:40,318 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.470e+02 2.818e+02 3.642e+02 6.541e+02, threshold=5.637e+02, percent-clipped=0.0
2024-09-17 12:47:44,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.03 vs. limit=15.0
2024-09-17 12:47:57,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=187618.66666666666, ans=0.125
2024-09-17 12:48:11,761 INFO [train.py:1198] (1/2) Epoch 11, batch 1450, loss[loss=0.293, simple_loss=0.3261, pruned_loss=0.1013, ctc_loss=0.1937, cr_loss=0.4649, over 34447.00 frames. ], tot_loss[loss=0.2598, simple_loss=0.2971, pruned_loss=0.08552, ctc_loss=0.1692, cr_loss=0.4369, over 6775373.41 frames. ], batch size: 110, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:48:33,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=187712.0, ans=0.0
2024-09-17 12:49:04,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=187805.33333333334, ans=0.0
2024-09-17 12:49:06,517 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:49:08,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0
2024-09-17 12:49:35,896 INFO [train.py:1198] (1/2) Epoch 11, batch 1500, loss[loss=0.2862, simple_loss=0.3174, pruned_loss=0.09766, ctc_loss=0.1937, cr_loss=0.5231, over 34465.00 frames. ], tot_loss[loss=0.2601, simple_loss=0.2974, pruned_loss=0.08568, ctc_loss=0.1696, cr_loss=0.4375, over 6775488.46 frames. ], batch size: 100, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:49:39,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=187898.66666666666, ans=0.125
2024-09-17 12:49:56,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=187945.33333333334, ans=0.125
2024-09-17 12:50:12,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187992.0, ans=0.1
2024-09-17 12:50:29,215 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.536e+02 2.924e+02 3.754e+02 6.301e+02, threshold=5.848e+02, percent-clipped=2.0
2024-09-17 12:50:54,378 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 12:50:59,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=188132.0, ans=0.2
2024-09-17 12:51:00,469 INFO [train.py:1198] (1/2) Epoch 11, batch 1550, loss[loss=0.2677, simple_loss=0.3029, pruned_loss=0.0902, ctc_loss=0.1722, cr_loss=0.4418, over 34454.00 frames. ], tot_loss[loss=0.2601, simple_loss=0.2971, pruned_loss=0.08578, ctc_loss=0.1697, cr_loss=0.4375, over 6746145.13 frames. ], batch size: 105, lr: 1.11e-02, grad_scale: 16.0
2024-09-17 12:51:04,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=188132.0, ans=0.0
2024-09-17 12:51:18,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=188178.66666666666, ans=0.125
2024-09-17 12:51:33,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=188225.33333333334, ans=0.0
2024-09-17 12:51:48,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=188272.0, ans=0.125
2024-09-17 12:51:53,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188272.0, ans=0.1
2024-09-17 12:51:58,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=188272.0, ans=0.125
2024-09-17 12:52:03,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=188272.0, ans=0.125
2024-09-17 12:52:23,066 INFO [train.py:1198] (1/2) Epoch 11, batch 1600, loss[loss=0.2575, simple_loss=0.2998, pruned_loss=0.08206, ctc_loss=0.1671, cr_loss=0.4418, over 34588.00 frames. ], tot_loss[loss=0.2602, simple_loss=0.2972, pruned_loss=0.0859, ctc_loss=0.1699, cr_loss=0.437, over 6724644.15 frames. ], batch size: 99, lr: 1.11e-02, grad_scale: 32.0
2024-09-17 12:52:26,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=188365.33333333334, ans=0.1
2024-09-17 12:52:27,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.26 vs. limit=15.0
2024-09-17 12:52:56,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188458.66666666666, ans=0.1
2024-09-17 12:53:00,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=188458.66666666666, ans=0.0
2024-09-17 12:53:01,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188458.66666666666, ans=0.1
2024-09-17 12:53:03,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=188458.66666666666, ans=0.0
2024-09-17 12:53:16,179 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 2.874e+02 3.627e+02 4.637e+02 7.993e+02, threshold=7.254e+02, percent-clipped=11.0
2024-09-17 12:53:47,810 INFO [train.py:1198] (1/2) Epoch 11, batch 1650, loss[loss=0.2689, simple_loss=0.3092, pruned_loss=0.08761, ctc_loss=0.1732, cr_loss=0.4672, over 34343.00 frames. ], tot_loss[loss=0.2598, simple_loss=0.2969, pruned_loss=0.08567, ctc_loss=0.1696, cr_loss=0.4365, over 6718767.95 frames. ], batch size: 103, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 12:53:50,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=188598.66666666666, ans=0.125
2024-09-17 12:55:10,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=188832.0, ans=0.125
2024-09-17 12:55:11,910 INFO [train.py:1198] (1/2) Epoch 11, batch 1700, loss[loss=0.2085, simple_loss=0.249, pruned_loss=0.06345, ctc_loss=0.1303, cr_loss=0.3759, over 34294.00 frames. ], tot_loss[loss=0.259, simple_loss=0.2965, pruned_loss=0.08519, ctc_loss=0.1687, cr_loss=0.4359, over 6743984.16 frames. ], batch size: 80, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 12:55:53,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=188925.33333333334, ans=0.125
2024-09-17 12:56:02,794 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.573e+02 3.206e+02 3.951e+02 6.984e+02, threshold=6.412e+02, percent-clipped=0.0
2024-09-17 12:56:05,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0
2024-09-17 12:56:34,083 INFO [train.py:1198] (1/2) Epoch 11, batch 1750, loss[loss=0.224, simple_loss=0.26, pruned_loss=0.07163, ctc_loss=0.1434, cr_loss=0.4006, over 34173.00 frames. ], tot_loss[loss=0.2585, simple_loss=0.2959, pruned_loss=0.08494, ctc_loss=0.1683, cr_loss=0.436, over 6752534.68 frames. ], batch size: 78, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 12:56:56,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=189112.0, ans=0.0
2024-09-17 12:57:02,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=189112.0, ans=0.125
2024-09-17 12:57:20,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=189158.66666666666, ans=0.2
2024-09-17 12:57:30,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=189205.33333333334, ans=0.125
2024-09-17 12:57:37,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=189205.33333333334, ans=0.0
2024-09-17 12:57:59,270 INFO [train.py:1198] (1/2) Epoch 11, batch 1800, loss[loss=0.258, simple_loss=0.301, pruned_loss=0.08235, ctc_loss=0.1641, cr_loss=0.4366, over 34692.00 frames. ], tot_loss[loss=0.2587, simple_loss=0.2963, pruned_loss=0.08494, ctc_loss=0.1684, cr_loss=0.4365, over 6755572.26 frames. ], batch size: 97, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 12:58:29,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=189345.33333333334, ans=0.0
2024-09-17 12:58:37,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=189392.0, ans=0.0
2024-09-17 12:58:42,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=189392.0, ans=0.125
2024-09-17 12:58:49,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=189438.66666666666, ans=0.1
2024-09-17 12:58:50,567 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.603e+02 3.394e+02 4.875e+02 8.027e+02, threshold=6.787e+02, percent-clipped=7.0
2024-09-17 12:59:03,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189438.66666666666, ans=0.1
2024-09-17 12:59:03,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=189438.66666666666, ans=0.0
2024-09-17 12:59:03,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=189438.66666666666, ans=0.125
2024-09-17 12:59:16,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=189485.33333333334, ans=0.125
2024-09-17 12:59:22,486 INFO [train.py:1198] (1/2) Epoch 11, batch 1850, loss[loss=0.2663, simple_loss=0.3043, pruned_loss=0.08897, ctc_loss=0.1704, cr_loss=0.4071, over 34463.00 frames. ], tot_loss[loss=0.2582, simple_loss=0.2959, pruned_loss=0.08476, ctc_loss=0.168, cr_loss=0.4353, over 6763858.95 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 12:59:40,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=189578.66666666666, ans=0.0
2024-09-17 12:59:47,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=189578.66666666666, ans=0.0
2024-09-17 12:59:51,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=189578.66666666666, ans=0.125
2024-09-17 13:00:07,716 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:00:18,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=189672.0, ans=0.0
2024-09-17 13:00:45,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189765.33333333334, ans=0.1
2024-09-17 13:00:46,933 INFO [train.py:1198] (1/2) Epoch 11, batch 1900, loss[loss=0.2754, simple_loss=0.3184, pruned_loss=0.08887, ctc_loss=0.1797, cr_loss=0.4684, over 34361.00 frames. ], tot_loss[loss=0.2588, simple_loss=0.2966, pruned_loss=0.08495, ctc_loss=0.1685, cr_loss=0.4363, over 6772949.41 frames. ], batch size: 103, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:00:47,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=189765.33333333334, ans=0.0
2024-09-17 13:00:52,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=189765.33333333334, ans=0.0
2024-09-17 13:00:53,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=189765.33333333334, ans=0.125
2024-09-17 13:01:19,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.68 vs. limit=22.5
2024-09-17 13:01:39,832 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.595e+02 2.994e+02 4.126e+02 9.055e+02, threshold=5.988e+02, percent-clipped=2.0
2024-09-17 13:01:40,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=189905.33333333334, ans=0.125
2024-09-17 13:01:57,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.28 vs. limit=15.0
2024-09-17 13:02:01,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=189952.0, ans=0.125
2024-09-17 13:02:09,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=189998.66666666666, ans=0.1
2024-09-17 13:02:11,076 INFO [train.py:1198] (1/2) Epoch 11, batch 1950, loss[loss=0.2507, simple_loss=0.2884, pruned_loss=0.08161, ctc_loss=0.1629, cr_loss=0.4278, over 34749.00 frames. ], tot_loss[loss=0.2595, simple_loss=0.2976, pruned_loss=0.0851, ctc_loss=0.1688, cr_loss=0.4376, over 6790482.89 frames. ], batch size: 92, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:02:19,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=189998.66666666666, ans=0.0
2024-09-17 13:02:50,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=190092.0, ans=22.5
2024-09-17 13:03:08,819 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:03:25,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=190185.33333333334, ans=0.125
2024-09-17 13:03:33,456 INFO [train.py:1198] (1/2) Epoch 11, batch 2000, loss[loss=0.2267, simple_loss=0.2644, pruned_loss=0.07237, ctc_loss=0.1462, cr_loss=0.3735, over 34206.00 frames. ], tot_loss[loss=0.2604, simple_loss=0.2982, pruned_loss=0.08558, ctc_loss=0.1697, cr_loss=0.4384, over 6765748.68 frames. ], batch size: 78, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:03:34,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=15.0
2024-09-17 13:03:38,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=190232.0, ans=0.125
2024-09-17 13:03:53,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=190278.66666666666, ans=0.0
2024-09-17 13:04:00,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=190278.66666666666, ans=0.125
2024-09-17 13:04:02,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=190278.66666666666, ans=0.0
2024-09-17 13:04:15,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0
2024-09-17 13:04:22,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=190325.33333333334, ans=0.025
2024-09-17 13:04:27,002 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.467e+02 3.045e+02 3.953e+02 7.088e+02, threshold=6.089e+02, percent-clipped=5.0
2024-09-17 13:04:42,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=190418.66666666666, ans=0.125
2024-09-17 13:04:44,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=190418.66666666666, ans=0.09899494936611666
2024-09-17 13:04:58,749 INFO [train.py:1198] (1/2) Epoch 11, batch 2050, loss[loss=0.2237, simple_loss=0.2655, pruned_loss=0.069, ctc_loss=0.1433, cr_loss=0.3793, over 34495.00 frames. ], tot_loss[loss=0.2598, simple_loss=0.2972, pruned_loss=0.08545, ctc_loss=0.1694, cr_loss=0.4376, over 6757325.08 frames. ], batch size: 82, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:05:22,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=190512.0, ans=0.0
2024-09-17 13:05:24,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.60 vs. limit=15.0
2024-09-17 13:05:28,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0
2024-09-17 13:05:32,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=190558.66666666666, ans=0.2
2024-09-17 13:05:34,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=190558.66666666666, ans=0.0
2024-09-17 13:05:52,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=190605.33333333334, ans=0.0
2024-09-17 13:06:04,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.33 vs. limit=22.5
2024-09-17 13:06:07,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0
2024-09-17 13:06:10,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=190652.0, ans=0.125
2024-09-17 13:06:10,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=190652.0, ans=0.125
2024-09-17 13:06:17,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.54 vs. limit=15.0
2024-09-17 13:06:23,401 INFO [train.py:1198] (1/2) Epoch 11, batch 2100, loss[loss=0.2627, simple_loss=0.3037, pruned_loss=0.08518, ctc_loss=0.1692, cr_loss=0.4361, over 34537.00 frames. ], tot_loss[loss=0.2589, simple_loss=0.2966, pruned_loss=0.08504, ctc_loss=0.1686, cr_loss=0.436, over 6771053.91 frames. ], batch size: 94, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:06:41,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=190745.33333333334, ans=0.05
2024-09-17 13:07:02,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=15.0
2024-09-17 13:07:14,090 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 2.816e+02 3.442e+02 4.673e+02 8.106e+02, threshold=6.883e+02, percent-clipped=7.0
2024-09-17 13:07:35,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=190885.33333333334, ans=0.04949747468305833
2024-09-17 13:07:42,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190885.33333333334, ans=0.1
2024-09-17 13:07:45,625 INFO [train.py:1198] (1/2) Epoch 11, batch 2150, loss[loss=0.2397, simple_loss=0.2827, pruned_loss=0.075, ctc_loss=0.1519, cr_loss=0.4112, over 34347.00 frames. ], tot_loss[loss=0.2581, simple_loss=0.2959, pruned_loss=0.08466, ctc_loss=0.168, cr_loss=0.4349, over 6789355.32 frames. ], batch size: 91, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:08:49,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=191072.0, ans=0.5
2024-09-17 13:09:05,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=191118.66666666666, ans=0.2
2024-09-17 13:09:11,881 INFO [train.py:1198] (1/2) Epoch 11, batch 2200, loss[loss=0.2766, simple_loss=0.3166, pruned_loss=0.09098, ctc_loss=0.1798, cr_loss=0.4655, over 34432.00 frames. ], tot_loss[loss=0.2585, simple_loss=0.2963, pruned_loss=0.08482, ctc_loss=0.1682, cr_loss=0.4354, over 6784265.38 frames. ], batch size: 100, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:09:18,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=191165.33333333334, ans=0.125
2024-09-17 13:09:22,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=191165.33333333334, ans=0.0
2024-09-17 13:09:45,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=191258.66666666666, ans=0.125
2024-09-17 13:10:02,746 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.540e+02 3.178e+02 4.278e+02 7.856e+02, threshold=6.355e+02, percent-clipped=2.0
2024-09-17 13:10:31,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=191352.0, ans=0.0
2024-09-17 13:10:31,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=191352.0, ans=0.025
2024-09-17 13:10:34,661 INFO [train.py:1198] (1/2) Epoch 11, batch 2250, loss[loss=0.2743, simple_loss=0.3127, pruned_loss=0.09126, ctc_loss=0.179, cr_loss=0.4387, over 34398.00 frames. ], tot_loss[loss=0.2576, simple_loss=0.2957, pruned_loss=0.08437, ctc_loss=0.1675, cr_loss=0.4343, over 6781708.34 frames. ], batch size: 95, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:10:50,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0
2024-09-17 13:11:14,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=191492.0, ans=0.125
2024-09-17 13:11:14,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=191492.0, ans=0.0
2024-09-17 13:11:29,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=191538.66666666666, ans=0.0
2024-09-17 13:11:37,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=191538.66666666666, ans=0.2
2024-09-17 13:11:55,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=191632.0, ans=0.0
2024-09-17 13:11:57,016 INFO [train.py:1198] (1/2) Epoch 11, batch 2300, loss[loss=0.2323, simple_loss=0.2697, pruned_loss=0.07485, ctc_loss=0.1492, cr_loss=0.3842, over 34276.00 frames. ], tot_loss[loss=0.2564, simple_loss=0.2943, pruned_loss=0.08388, ctc_loss=0.1667, cr_loss=0.4321, over 6767810.05 frames. ], batch size: 83, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:12:02,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.73 vs. limit=15.0
2024-09-17 13:12:35,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=191725.33333333334, ans=0.0
2024-09-17 13:12:40,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191725.33333333334, ans=0.1
2024-09-17 13:12:48,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=191772.0, ans=0.1
2024-09-17 13:12:50,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=191772.0, ans=0.09899494936611666
2024-09-17 13:12:50,706 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:12:51,722 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.657e+02 3.742e+02 5.024e+02 7.188e+02, threshold=7.485e+02, percent-clipped=5.0
2024-09-17 13:13:02,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.83 vs. limit=22.5
2024-09-17 13:13:22,895 INFO [train.py:1198] (1/2) Epoch 11, batch 2350, loss[loss=0.2797, simple_loss=0.3138, pruned_loss=0.09557, ctc_loss=0.1837, cr_loss=0.4427, over 34682.00 frames. ], tot_loss[loss=0.2571, simple_loss=0.2949, pruned_loss=0.08425, ctc_loss=0.1674, cr_loss=0.4338, over 6773326.12 frames. ], batch size: 97, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:13:29,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=191865.33333333334, ans=15.0
2024-09-17 13:13:30,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.23 vs. limit=12.0
2024-09-17 13:13:32,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=191865.33333333334, ans=0.125
2024-09-17 13:13:51,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=191912.0, ans=0.2
2024-09-17 13:14:20,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=192005.33333333334, ans=0.0
2024-09-17 13:14:23,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192005.33333333334, ans=0.1
2024-09-17 13:14:45,274 INFO [train.py:1198] (1/2) Epoch 11, batch 2400, loss[loss=0.25, simple_loss=0.283, pruned_loss=0.08384, ctc_loss=0.1613, cr_loss=0.428, over 34569.00 frames. ], tot_loss[loss=0.2576, simple_loss=0.2954, pruned_loss=0.08441, ctc_loss=0.1675, cr_loss=0.4352, over 6778439.38 frames. ], batch size: 89, lr: 1.10e-02, grad_scale: 32.0
2024-09-17 13:14:52,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.59 vs. limit=10.0
2024-09-17 13:15:14,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=192145.33333333334, ans=0.125
2024-09-17 13:15:36,713 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.957e+02 2.516e+02 3.081e+02 3.942e+02 8.607e+02, threshold=6.162e+02, percent-clipped=1.0
2024-09-17 13:16:03,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=192285.33333333334, ans=0.0
2024-09-17 13:16:11,206 INFO [train.py:1198] (1/2) Epoch 11, batch 2450, loss[loss=0.2642, simple_loss=0.3054, pruned_loss=0.08591, ctc_loss=0.1717, cr_loss=0.4198, over 34415.00 frames. ], tot_loss[loss=0.2589, simple_loss=0.2967, pruned_loss=0.08496, ctc_loss=0.1687, cr_loss=0.4364, over 6752035.10 frames. ], batch size: 95, lr: 1.09e-02, grad_scale: 32.0
2024-09-17 13:16:13,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=192332.0, ans=0.125
2024-09-17 13:16:24,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=192332.0, ans=0.0
2024-09-17 13:16:26,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=192332.0, ans=0.05
2024-09-17 13:16:39,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=192378.66666666666, ans=0.0
2024-09-17 13:16:54,394 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:17:17,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=192518.66666666666, ans=0.125
2024-09-17 13:17:25,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=192518.66666666666, ans=0.125
2024-09-17 13:17:31,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=15.0
2024-09-17 13:17:35,258 INFO [train.py:1198] (1/2) Epoch 11, batch 2500, loss[loss=0.284, simple_loss=0.3206, pruned_loss=0.09588, ctc_loss=0.1887, cr_loss=0.4492, over 34450.00 frames. ], tot_loss[loss=0.2592, simple_loss=0.2968, pruned_loss=0.08512, ctc_loss=0.1689, cr_loss=0.4375, over 6762533.64 frames. ], batch size: 100, lr: 1.09e-02, grad_scale: 32.0
2024-09-17 13:17:57,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.14 vs. limit=22.5
2024-09-17 13:18:13,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=22.5
2024-09-17 13:18:26,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.420e+02 2.773e+02 3.276e+02 5.684e+02, threshold=5.546e+02, percent-clipped=0.0
2024-09-17 13:18:37,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.46 vs. limit=15.0
2024-09-17 13:18:38,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=192705.33333333334, ans=0.125
2024-09-17 13:18:39,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=192752.0, ans=0.025
2024-09-17 13:18:58,035 INFO [train.py:1198] (1/2) Epoch 11, batch 2550, loss[loss=0.2171, simple_loss=0.2629, pruned_loss=0.06493, ctc_loss=0.1338, cr_loss=0.368, over 34149.00 frames. ], tot_loss[loss=0.2587, simple_loss=0.2966, pruned_loss=0.08478, ctc_loss=0.1683, cr_loss=0.437, over 6765835.90 frames. ], batch size: 78, lr: 1.09e-02, grad_scale: 32.0
2024-09-17 13:19:02,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.44 vs. limit=22.5
2024-09-17 13:19:08,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=192798.66666666666, ans=0.125
2024-09-17 13:19:19,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192845.33333333334, ans=0.1
2024-09-17 13:19:28,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0
2024-09-17 13:19:35,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=192892.0, ans=0.125
2024-09-17 13:19:39,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=192892.0, ans=0.1
2024-09-17 13:19:41,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0
2024-09-17 13:19:51,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0
2024-09-17 13:20:19,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=192985.33333333334, ans=0.035
2024-09-17 13:20:24,150 INFO [train.py:1198] (1/2) Epoch 11, batch 2600, loss[loss=0.2559, simple_loss=0.294, pruned_loss=0.08447, ctc_loss=0.1641, cr_loss=0.4029, over 34374.00 frames. ], tot_loss[loss=0.2591, simple_loss=0.2969, pruned_loss=0.08502, ctc_loss=0.1687, cr_loss=0.4374, over 6761291.04 frames. ], batch size: 91, lr: 1.09e-02, grad_scale: 16.0
2024-09-17 13:20:24,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=193032.0, ans=0.2
2024-09-17 13:20:30,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193032.0, ans=0.1
2024-09-17 13:20:45,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=193078.66666666666, ans=0.0
2024-09-17 13:21:07,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=193125.33333333334, ans=0.1
2024-09-17 13:21:10,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=193125.33333333334, ans=0.125
2024-09-17 13:21:10,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=193125.33333333334, ans=0.125
2024-09-17 13:21:16,688 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.595e+02 3.034e+02 3.675e+02 7.280e+02, threshold=6.067e+02, percent-clipped=3.0
2024-09-17 13:21:43,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193218.66666666666, ans=0.1
2024-09-17 13:21:45,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=193265.33333333334, ans=0.0
2024-09-17 13:21:46,288 INFO [train.py:1198] (1/2) Epoch 11, batch 2650, loss[loss=0.2766, simple_loss=0.3137, pruned_loss=0.09217, ctc_loss=0.1824, cr_loss=0.4696, over 34246.00 frames. ], tot_loss[loss=0.2592, simple_loss=0.2972, pruned_loss=0.08496, ctc_loss=0.1687, cr_loss=0.4378, over 6768142.03 frames. ], batch size: 117, lr: 1.09e-02, grad_scale: 16.0
2024-09-17 13:22:44,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=193405.33333333334, ans=0.125
2024-09-17 13:23:08,866 INFO [train.py:1198] (1/2) Epoch 11, batch 2700, loss[loss=0.2687, simple_loss=0.3081, pruned_loss=0.08738, ctc_loss=0.1772, cr_loss=0.4805, over 34603.00 frames. ], tot_loss[loss=0.2594, simple_loss=0.2975, pruned_loss=0.08504, ctc_loss=0.1688, cr_loss=0.4384, over 6762236.26 frames. ], batch size: 102, lr: 1.09e-02, grad_scale: 16.0
2024-09-17 13:23:19,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=193498.66666666666, ans=0.025
2024-09-17 13:23:24,040 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 13:23:34,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=193545.33333333334, ans=0.0
2024-09-17 13:23:39,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=193545.33333333334, ans=0.125
2024-09-17 13:23:59,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=193592.0, ans=0.0
2024-09-17 13:24:06,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.542e+02 2.943e+02 3.562e+02 6.223e+02, threshold=5.885e+02, percent-clipped=1.0
2024-09-17 13:24:35,890 INFO [train.py:1198] (1/2) Epoch 11, batch 2750, loss[loss=0.2533, simple_loss=0.288, pruned_loss=0.08478, ctc_loss=0.1643, cr_loss=0.4048, over 34630.00 frames. ], tot_loss[loss=0.2581, simple_loss=0.2962, pruned_loss=0.0845, ctc_loss=0.1678, cr_loss=0.4363, over 6759869.79 frames. ], batch size: 88, lr: 1.09e-02, grad_scale: 16.0
2024-09-17 13:24:43,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.36 vs. limit=15.0
2024-09-17 13:24:47,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=193732.0, ans=0.125
2024-09-17 13:25:02,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=193778.66666666666, ans=0.125
2024-09-17 13:25:04,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=193778.66666666666, ans=0.125
2024-09-17 13:25:15,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=193825.33333333334, ans=0.0
2024-09-17 13:25:17,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=193825.33333333334, ans=0.05
2024-09-17 13:25:19,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=193825.33333333334, ans=0.5
2024-09-17 13:25:24,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=193872.0, ans=0.125
2024-09-17 13:25:25,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=193872.0, ans=0.125
2024-09-17 13:25:37,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=193872.0, ans=0.2
2024-09-17 13:25:58,914 INFO [train.py:1198] (1/2) Epoch 11, batch 2800, loss[loss=0.3206, simple_loss=0.3318, pruned_loss=0.1206, ctc_loss=0.2439, cr_loss=0.4854, over 23800.00 frames. ], tot_loss[loss=0.2587, simple_loss=0.2965, pruned_loss=0.08487, ctc_loss=0.1683, cr_loss=0.4365, over 6739220.71 frames. ], batch size: 244, lr: 1.09e-02, grad_scale: 32.0
], batch size: 244, lr: 1.09e-02, grad_scale: 32.0 2024-09-17 13:26:20,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=194012.0, ans=0.125 2024-09-17 13:26:50,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=194105.33333333334, ans=0.125 2024-09-17 13:26:51,538 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.596e+02 3.244e+02 3.941e+02 6.811e+02, threshold=6.487e+02, percent-clipped=3.0 2024-09-17 13:26:54,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.92 vs. limit=15.0 2024-09-17 13:26:55,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=194105.33333333334, ans=0.125 2024-09-17 13:26:57,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=194105.33333333334, ans=0.0 2024-09-17 13:26:57,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.58 vs. limit=22.5 2024-09-17 13:27:08,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194152.0, ans=0.1 2024-09-17 13:27:25,567 INFO [train.py:1198] (1/2) Epoch 11, batch 2850, loss[loss=0.2582, simple_loss=0.2956, pruned_loss=0.08479, ctc_loss=0.1636, cr_loss=0.4635, over 34478.00 frames. ], tot_loss[loss=0.2595, simple_loss=0.2972, pruned_loss=0.08528, ctc_loss=0.1691, cr_loss=0.4374, over 6723874.17 frames. ], batch size: 90, lr: 1.09e-02, grad_scale: 32.0 2024-09-17 13:27:39,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=194198.66666666666, ans=0.125 2024-09-17 13:27:45,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=194245.33333333334, ans=0.125 2024-09-17 13:28:01,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. limit=15.0 2024-09-17 13:28:15,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.63 vs. limit=22.5 2024-09-17 13:28:31,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=194385.33333333334, ans=0.125 2024-09-17 13:28:33,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=194385.33333333334, ans=0.0 2024-09-17 13:28:40,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=194385.33333333334, ans=0.5 2024-09-17 13:28:48,174 INFO [train.py:1198] (1/2) Epoch 11, batch 2900, loss[loss=0.2572, simple_loss=0.2957, pruned_loss=0.08389, ctc_loss=0.1694, cr_loss=0.4261, over 34540.00 frames. ], tot_loss[loss=0.2603, simple_loss=0.2981, pruned_loss=0.08549, ctc_loss=0.1695, cr_loss=0.4394, over 6754680.84 frames. 
], batch size: 94, lr: 1.09e-02, grad_scale: 16.0 2024-09-17 13:29:10,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=194478.66666666666, ans=0.025 2024-09-17 13:29:28,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=194525.33333333334, ans=0.0 2024-09-17 13:29:42,713 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.536e+02 2.970e+02 3.819e+02 6.843e+02, threshold=5.940e+02, percent-clipped=1.0 2024-09-17 13:29:48,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.59 vs. limit=12.0 2024-09-17 13:30:11,239 INFO [train.py:1198] (1/2) Epoch 11, batch 2950, loss[loss=0.2275, simple_loss=0.2664, pruned_loss=0.07174, ctc_loss=0.1443, cr_loss=0.4052, over 34641.00 frames. ], tot_loss[loss=0.2585, simple_loss=0.2964, pruned_loss=0.0847, ctc_loss=0.168, cr_loss=0.4371, over 6749739.62 frames. ], batch size: 88, lr: 1.09e-02, grad_scale: 16.0 2024-09-17 13:30:58,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.70 vs. limit=10.0 2024-09-17 13:31:02,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.80 vs. limit=15.0 2024-09-17 13:31:23,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=194852.0, ans=0.125 2024-09-17 13:31:30,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.95 vs. limit=22.5 2024-09-17 13:31:36,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=194898.66666666666, ans=0.0 2024-09-17 13:31:38,193 INFO [train.py:1198] (1/2) Epoch 11, batch 3000, loss[loss=0.2513, simple_loss=0.2945, pruned_loss=0.07969, ctc_loss=0.1593, cr_loss=0.421, over 34538.00 frames. ], tot_loss[loss=0.2579, simple_loss=0.2961, pruned_loss=0.08442, ctc_loss=0.1677, cr_loss=0.4361, over 6750302.33 frames. ], batch size: 94, lr: 1.09e-02, grad_scale: 16.0 2024-09-17 13:31:38,193 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 13:31:55,117 INFO [train.py:1230] (1/2) Epoch 11, validation: loss=0.1539, simple_loss=0.2526, pruned_loss=0.0229, ctc_loss=0.04709, cr_loss=1.536e-14, over 944034.00 frames. 2024-09-17 13:31:55,117 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 13:32:00,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=194898.66666666666, ans=0.0 2024-09-17 13:32:05,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=194898.66666666666, ans=0.125 2024-09-17 13:32:07,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=194898.66666666666, ans=0.2 2024-09-17 13:32:31,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.69 vs. 
limit=15.0 2024-09-17 13:32:32,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194992.0, ans=0.1 2024-09-17 13:32:44,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2024-09-17 13:32:48,836 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.112e+02 2.524e+02 2.908e+02 3.515e+02 6.797e+02, threshold=5.815e+02, percent-clipped=3.0 2024-09-17 13:33:00,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.78 vs. limit=10.0 2024-09-17 13:33:12,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=195085.33333333334, ans=0.025 2024-09-17 13:33:17,015 INFO [train.py:1198] (1/2) Epoch 11, batch 3050, loss[loss=0.2316, simple_loss=0.2749, pruned_loss=0.07239, ctc_loss=0.1431, cr_loss=0.3716, over 34594.00 frames. ], tot_loss[loss=0.2588, simple_loss=0.2968, pruned_loss=0.08482, ctc_loss=0.1685, cr_loss=0.4379, over 6741301.72 frames. ], batch size: 89, lr: 1.09e-02, grad_scale: 16.0 2024-09-17 13:34:09,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=195272.0, ans=0.0 2024-09-17 13:34:37,709 INFO [train.py:1198] (1/2) Epoch 11, batch 3100, loss[loss=0.2977, simple_loss=0.3326, pruned_loss=0.101, ctc_loss=0.2016, cr_loss=0.5093, over 34276.00 frames. ], tot_loss[loss=0.2586, simple_loss=0.2965, pruned_loss=0.08476, ctc_loss=0.1683, cr_loss=0.4375, over 6741926.42 frames. ], batch size: 117, lr: 1.09e-02, grad_scale: 16.0 2024-09-17 13:34:42,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=195365.33333333334, ans=0.125 2024-09-17 13:34:59,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=195412.0, ans=0.125 2024-09-17 13:35:13,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=22.5 2024-09-17 13:35:18,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. limit=6.0 2024-09-17 13:35:24,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=195505.33333333334, ans=0.0 2024-09-17 13:35:28,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=195505.33333333334, ans=0.0 2024-09-17 13:35:31,206 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 2.530e+02 3.135e+02 4.100e+02 7.996e+02, threshold=6.270e+02, percent-clipped=6.0 2024-09-17 13:35:34,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=195505.33333333334, ans=0.125 2024-09-17 13:35:36,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.84 vs. 
limit=15.0 2024-09-17 13:35:54,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=195552.0, ans=0.0 2024-09-17 13:35:58,955 INFO [train.py:1198] (1/2) Epoch 11, batch 3150, loss[loss=0.266, simple_loss=0.3033, pruned_loss=0.08802, ctc_loss=0.1727, cr_loss=0.4516, over 33845.00 frames. ], tot_loss[loss=0.2581, simple_loss=0.296, pruned_loss=0.08453, ctc_loss=0.1679, cr_loss=0.4374, over 6748640.69 frames. ], batch size: 122, lr: 1.09e-02, grad_scale: 16.0 2024-09-17 13:36:04,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=195598.66666666666, ans=0.125 2024-09-17 13:36:12,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=195598.66666666666, ans=10.0 2024-09-17 13:36:15,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=195645.33333333334, ans=0.025 2024-09-17 13:37:07,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=195785.33333333334, ans=0.025 2024-09-17 13:37:23,283 INFO [train.py:1198] (1/2) Epoch 11, batch 3200, loss[loss=0.2682, simple_loss=0.3039, pruned_loss=0.08905, ctc_loss=0.1766, cr_loss=0.4791, over 34528.00 frames. ], tot_loss[loss=0.2577, simple_loss=0.2957, pruned_loss=0.08437, ctc_loss=0.1677, cr_loss=0.4373, over 6762453.54 frames. ], batch size: 94, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:37:23,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=195832.0, ans=0.125 2024-09-17 13:37:43,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=195878.66666666666, ans=0.125 2024-09-17 13:38:01,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.83 vs. limit=15.0 2024-09-17 13:38:14,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=22.5 2024-09-17 13:38:16,887 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.492e+02 3.062e+02 3.878e+02 6.291e+02, threshold=6.124e+02, percent-clipped=1.0 2024-09-17 13:38:17,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=195972.0, ans=0.125 2024-09-17 13:38:18,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=195972.0, ans=0.2 2024-09-17 13:38:35,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=196018.66666666666, ans=0.0 2024-09-17 13:38:44,482 INFO [train.py:1198] (1/2) Epoch 11, batch 3250, loss[loss=0.2734, simple_loss=0.3129, pruned_loss=0.09064, ctc_loss=0.1727, cr_loss=0.4528, over 34661.00 frames. ], tot_loss[loss=0.2585, simple_loss=0.2965, pruned_loss=0.08464, ctc_loss=0.1681, cr_loss=0.4387, over 6771622.58 frames. ], batch size: 98, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:38:45,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.99 vs. 
limit=15.0 2024-09-17 13:39:40,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=196205.33333333334, ans=0.125 2024-09-17 13:39:47,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=196252.0, ans=0.125 2024-09-17 13:40:04,941 INFO [train.py:1198] (1/2) Epoch 11, batch 3300, loss[loss=0.2781, simple_loss=0.3183, pruned_loss=0.09163, ctc_loss=0.1815, cr_loss=0.4588, over 32959.00 frames. ], tot_loss[loss=0.2571, simple_loss=0.2952, pruned_loss=0.0841, ctc_loss=0.1671, cr_loss=0.4362, over 6768993.43 frames. ], batch size: 130, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:40:13,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=196298.66666666666, ans=0.125 2024-09-17 13:40:21,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=196345.33333333334, ans=10.0 2024-09-17 13:40:42,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.27 vs. limit=22.5 2024-09-17 13:40:43,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.93 vs. limit=22.5 2024-09-17 13:40:58,241 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.861e+02 3.513e+02 4.710e+02 6.249e+02, threshold=7.025e+02, percent-clipped=1.0 2024-09-17 13:41:24,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=196532.0, ans=10.0 2024-09-17 13:41:25,801 INFO [train.py:1198] (1/2) Epoch 11, batch 3350, loss[loss=0.2751, simple_loss=0.3172, pruned_loss=0.08895, ctc_loss=0.1824, cr_loss=0.4632, over 33821.00 frames. ], tot_loss[loss=0.2586, simple_loss=0.2962, pruned_loss=0.08488, ctc_loss=0.1685, cr_loss=0.4376, over 6742663.05 frames. ], batch size: 122, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:41:26,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=196532.0, ans=0.0 2024-09-17 13:41:57,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2024-09-17 13:42:04,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=196625.33333333334, ans=0.2 2024-09-17 13:42:07,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=196625.33333333334, ans=0.0 2024-09-17 13:42:43,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=196718.66666666666, ans=0.125 2024-09-17 13:42:45,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=196718.66666666666, ans=0.125 2024-09-17 13:42:49,646 INFO [train.py:1198] (1/2) Epoch 11, batch 3400, loss[loss=0.2305, simple_loss=0.2701, pruned_loss=0.07352, ctc_loss=0.1489, cr_loss=0.354, over 34179.00 frames. ], tot_loss[loss=0.2589, simple_loss=0.2964, pruned_loss=0.08503, ctc_loss=0.1688, cr_loss=0.4377, over 6732444.04 frames. 
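The Whitening entries (metric=... vs. limit=...) are diagnostics from modules that push activations toward a "white" covariance: decorrelated channels with roughly equal variance. The metric grows as the covariance eigenvalues become more unequal, and a penalty applies only when it exceeds the stated limit, which is why lines like metric=3.96 vs. limit=15.0 above are purely informational. One plausible dispersion metric, normalized so perfectly white features score 1.0; this is an illustrative formula, not necessarily the exact one in scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Eigenvalue dispersion of the channel covariance; 1.0 == white."""
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, c/g)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n          # per-group channel covariance
    dim = cov.shape[-1]
    # E[lambda^2] / E[lambda]^2 via traces: trace(cov@cov) / (trace(cov)^2 / dim)
    tr = cov.diagonal(dim1=-2, dim2=-1).sum(-1)
    tr_sq = (cov * cov.transpose(-2, -1)).sum(dim=(-2, -1))
    return (tr_sq / (tr * tr / dim)).mean().item()

x_white = torch.randn(10000, 512)                       # decorrelated channels
x_rank1 = torch.randn(10000, 1) * torch.randn(1, 512)   # fully correlated
print(whitening_metric(x_white))   # ~1.0, nothing to penalize
print(whitening_metric(x_rank1))   # ~512, far above a limit like 15.0

Every Whitening entry in this stretch of the log sits under its limit, so no whitening penalty is actually being applied at this point in training.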
], batch size: 78, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:43:30,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=196858.66666666666, ans=0.0 2024-09-17 13:43:42,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.633e+02 2.970e+02 3.989e+02 6.095e+02, threshold=5.941e+02, percent-clipped=0.0 2024-09-17 13:44:10,544 INFO [train.py:1198] (1/2) Epoch 11, batch 3450, loss[loss=0.2711, simple_loss=0.3122, pruned_loss=0.08828, ctc_loss=0.1768, cr_loss=0.4544, over 33055.00 frames. ], tot_loss[loss=0.2588, simple_loss=0.2964, pruned_loss=0.08496, ctc_loss=0.1685, cr_loss=0.4372, over 6744748.13 frames. ], batch size: 130, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:44:10,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=196998.66666666666, ans=0.125 2024-09-17 13:44:12,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=196998.66666666666, ans=0.0 2024-09-17 13:44:14,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=196998.66666666666, ans=0.125 2024-09-17 13:44:14,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.66 vs. limit=15.0 2024-09-17 13:44:28,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197045.33333333334, ans=0.1 2024-09-17 13:45:14,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.31 vs. limit=22.5 2024-09-17 13:45:30,824 INFO [train.py:1198] (1/2) Epoch 11, batch 3500, loss[loss=0.2326, simple_loss=0.2727, pruned_loss=0.07343, ctc_loss=0.1471, cr_loss=0.4047, over 34496.00 frames. ], tot_loss[loss=0.2576, simple_loss=0.2954, pruned_loss=0.08441, ctc_loss=0.1675, cr_loss=0.4353, over 6746444.29 frames. 
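Interleaved with training, each validation pass (batch 3000 above, and again at the start of epoch 12 below) re-logs the CUDA high-water mark ("Maximum memory allocated so far is 53607MB"). PyTorch exposes this counter directly; a small sketch of the check:

import torch

def log_peak_memory(device: torch.device) -> str:
    # max_memory_allocated is a running high-water mark for the device,
    # so it only moves when a new allocation peak is reached.
    peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    return f"Maximum memory allocated so far is {peak_mb}MB"

if torch.cuda.is_available():
    print(log_peak_memory(torch.device("cuda")))

Because the counter is a running maximum, it stays pinned at 53607MB across consecutive checks until some batch sets a new peak.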
], batch size: 85, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:45:37,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=197232.0, ans=0.0 2024-09-17 13:45:45,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=197278.66666666666, ans=0.125 2024-09-17 13:46:03,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=197325.33333333334, ans=0.1 2024-09-17 13:46:15,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=197325.33333333334, ans=0.0 2024-09-17 13:46:24,956 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.429e+02 2.745e+02 3.842e+02 5.919e+02, threshold=5.489e+02, percent-clipped=0.0 2024-09-17 13:46:32,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=197372.0, ans=0.2 2024-09-17 13:46:34,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=197372.0, ans=0.125 2024-09-17 13:46:34,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=197372.0, ans=0.025 2024-09-17 13:46:40,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=197418.66666666666, ans=0.125 2024-09-17 13:46:50,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=197418.66666666666, ans=0.0 2024-09-17 13:46:51,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=197465.33333333334, ans=0.07 2024-09-17 13:46:53,098 INFO [train.py:1198] (1/2) Epoch 11, batch 3550, loss[loss=0.2725, simple_loss=0.3109, pruned_loss=0.09009, ctc_loss=0.1754, cr_loss=0.4741, over 34407.00 frames. ], tot_loss[loss=0.2576, simple_loss=0.2955, pruned_loss=0.08439, ctc_loss=0.1675, cr_loss=0.4357, over 6756098.84 frames. ], batch size: 103, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:47:14,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=197512.0, ans=0.125 2024-09-17 13:47:49,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=197605.33333333334, ans=0.125 2024-09-17 13:47:58,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=197652.0, ans=0.125 2024-09-17 13:48:13,117 INFO [train.py:1198] (1/2) Epoch 11, batch 3600, loss[loss=0.2432, simple_loss=0.2808, pruned_loss=0.07874, ctc_loss=0.1583, cr_loss=0.4134, over 34501.00 frames. ], tot_loss[loss=0.2577, simple_loss=0.2955, pruned_loss=0.08447, ctc_loss=0.1675, cr_loss=0.4359, over 6765599.77 frames. 
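The lr field decays slowly within the epoch (1.09e-02 for most of these batches, 1.08e-02 from batch 3200 on) and then drops more sharply to 1.03e-02 when epoch 12 starts below. That pattern is consistent with a scheduler that discounts the base LR by both a batch-count factor and an epoch factor, as icefall's Eden scheduler does. A simplified sketch under that assumption, using the run's configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5; the exponents and the duration rescaling are illustrative and reproduce the logged values only to within a few percent:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5,
            duration_factor: float = 2.16) -> float:
    # Discount base_lr by a batch-count factor and an epoch factor;
    # duration_factor is an assumed rescaling for this run's large
    # effective batch duration. Exponents are illustrative.
    batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
    return base_lr * batch_factor * epoch_factor * duration_factor

print(eden_lr(0.045, 193032, 11))  # ~1.05e-02 vs. "lr: 1.09e-02" logged
print(eden_lr(0.045, 199000, 12))  # ~1.00e-02 vs. "lr: 1.03e-02" logged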
], batch size: 90, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:48:15,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=197698.66666666666, ans=0.125 2024-09-17 13:48:27,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=197745.33333333334, ans=0.0 2024-09-17 13:49:06,120 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.957e+02 2.506e+02 3.143e+02 4.184e+02 9.261e+02, threshold=6.287e+02, percent-clipped=8.0 2024-09-17 13:49:11,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=197838.66666666666, ans=0.1 2024-09-17 13:49:33,921 INFO [train.py:1198] (1/2) Epoch 11, batch 3650, loss[loss=0.2668, simple_loss=0.3081, pruned_loss=0.08637, ctc_loss=0.175, cr_loss=0.4475, over 34427.00 frames. ], tot_loss[loss=0.2565, simple_loss=0.2946, pruned_loss=0.08391, ctc_loss=0.1665, cr_loss=0.4337, over 6769163.22 frames. ], batch size: 110, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:50:05,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=198025.33333333334, ans=0.125 2024-09-17 13:50:06,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=15.0 2024-09-17 13:50:10,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=198025.33333333334, ans=0.0 2024-09-17 13:50:32,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=12.0 2024-09-17 13:50:36,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=198072.0, ans=0.2 2024-09-17 13:50:43,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=198118.66666666666, ans=10.0 2024-09-17 13:50:44,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=198118.66666666666, ans=0.125 2024-09-17 13:50:55,663 INFO [train.py:1198] (1/2) Epoch 11, batch 3700, loss[loss=0.2755, simple_loss=0.3134, pruned_loss=0.09201, ctc_loss=0.1788, cr_loss=0.4493, over 34641.00 frames. ], tot_loss[loss=0.2564, simple_loss=0.2947, pruned_loss=0.08374, ctc_loss=0.1664, cr_loss=0.4335, over 6784045.29 frames. 
], batch size: 102, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:51:15,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=198212.0, ans=0.07 2024-09-17 13:51:25,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198212.0, ans=0.1 2024-09-17 13:51:26,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=198258.66666666666, ans=0.125 2024-09-17 13:51:34,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=198258.66666666666, ans=0.1 2024-09-17 13:51:48,722 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.441e+02 2.792e+02 3.602e+02 8.086e+02, threshold=5.584e+02, percent-clipped=4.0 2024-09-17 13:51:49,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=198305.33333333334, ans=0.1 2024-09-17 13:51:55,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=198305.33333333334, ans=0.125 2024-09-17 13:51:58,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=198352.0, ans=0.125 2024-09-17 13:52:04,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.70 vs. limit=15.0 2024-09-17 13:52:06,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=198352.0, ans=0.125 2024-09-17 13:52:12,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.05 vs. limit=22.5 2024-09-17 13:52:16,236 INFO [train.py:1198] (1/2) Epoch 11, batch 3750, loss[loss=0.2746, simple_loss=0.3126, pruned_loss=0.09057, ctc_loss=0.1815, cr_loss=0.4788, over 34300.00 frames. ], tot_loss[loss=0.2601, simple_loss=0.2982, pruned_loss=0.08531, ctc_loss=0.1691, cr_loss=0.4382, over 6785448.11 frames. ], batch size: 113, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:52:18,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=15.0 2024-09-17 13:52:18,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.86 vs. 
limit=15.0 2024-09-17 13:52:27,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=198398.66666666666, ans=0.125 2024-09-17 13:52:44,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=198445.33333333334, ans=0.125 2024-09-17 13:52:57,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=198492.0, ans=0.125 2024-09-17 13:52:58,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=198492.0, ans=0.125 2024-09-17 13:53:37,998 INFO [train.py:1198] (1/2) Epoch 11, batch 3800, loss[loss=0.2904, simple_loss=0.3193, pruned_loss=0.1016, ctc_loss=0.1979, cr_loss=0.4705, over 30208.00 frames. ], tot_loss[loss=0.2647, simple_loss=0.3018, pruned_loss=0.0876, ctc_loss=0.1732, cr_loss=0.443, over 6674555.95 frames. ], batch size: 175, lr: 1.08e-02, grad_scale: 32.0 2024-09-17 13:53:51,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=198632.0, ans=0.0 2024-09-17 13:53:55,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=12.0 2024-09-17 13:54:01,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=198678.66666666666, ans=0.0 2024-09-17 13:54:08,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=198678.66666666666, ans=0.125 2024-09-17 13:54:10,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=198725.33333333334, ans=0.0 2024-09-17 13:54:10,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.54 vs. limit=15.0 2024-09-17 13:54:24,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=198725.33333333334, ans=0.025 2024-09-17 13:54:35,128 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.548e+02 2.820e+02 3.138e+02 8.273e+02, threshold=5.639e+02, percent-clipped=1.0 2024-09-17 13:55:02,518 INFO [train.py:1198] (1/2) Epoch 11, batch 3850, loss[loss=0.3019, simple_loss=0.3241, pruned_loss=0.1084, ctc_loss=0.2149, cr_loss=0.4935, over 24480.00 frames. ], tot_loss[loss=0.2711, simple_loss=0.3057, pruned_loss=0.09128, ctc_loss=0.1806, cr_loss=0.4466, over 6250196.10 frames. ], batch size: 245, lr: 1.08e-02, grad_scale: 16.0 2024-09-17 13:55:14,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.46 vs. limit=15.0 2024-09-17 13:55:27,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=198912.0, ans=0.125 2024-09-17 13:56:34,628 INFO [train.py:1198] (1/2) Epoch 12, batch 0, loss[loss=0.2509, simple_loss=0.2787, pruned_loss=0.08651, ctc_loss=0.1667, cr_loss=0.4183, over 34466.00 frames. ], tot_loss[loss=0.2509, simple_loss=0.2787, pruned_loss=0.08651, ctc_loss=0.1667, cr_loss=0.4183, over 34466.00 frames. 
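Each loss[...] entry factors the objective into the pruned-transducer terms (simple_loss, pruned_loss), the CTC term, and the consistency-regularization (CR-CTC) term. With this run's configured scales (simple_loss_scale=0.5, ctc_loss_scale=0.1, cr_loss_scale=0.02), the printed loss is the weighted sum 0.5*simple_loss + pruned_loss + 0.1*ctc_loss + 0.02*cr_loss, which reproduces the logged totals. A sketch of the combination; the warm-up-dependent split between the simple and pruned losses in train.py is simplified away here:

def total_loss(simple_loss, pruned_loss, ctc_loss, cr_loss,
               simple_loss_scale=0.5, ctc_loss_scale=0.1, cr_loss_scale=0.02):
    # Weighted sum of the components printed in each loss[...] entry.
    return (simple_loss_scale * simple_loss + pruned_loss
            + ctc_loss_scale * ctc_loss + cr_loss_scale * cr_loss)

# Epoch 12, batch 0 entry above: reproduces the logged loss=0.2509.
print(total_loss(simple_loss=0.2787, pruned_loss=0.08651,
                 ctc_loss=0.1667, cr_loss=0.4183))  # -> 0.2509

The near-zero cr_loss in the validation entries (about 1.5e-14 at batch 3000 above) is consistent with the consistency term vanishing at eval time, when no second augmented view is drawn.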
], batch size: 85, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 13:56:34,629 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 13:56:44,604 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2803, 3.9236, 3.2270, 3.9106], device='cuda:1') 2024-09-17 13:56:51,618 INFO [train.py:1230] (1/2) Epoch 12, validation: loss=0.1561, simple_loss=0.2559, pruned_loss=0.02341, ctc_loss=0.04784, cr_loss=1.6e-14, over 944034.00 frames. 2024-09-17 13:56:51,619 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 13:56:52,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=21.56 vs. limit=22.5 2024-09-17 13:57:15,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=199033.33333333334, ans=10.0 2024-09-17 13:57:16,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=199033.33333333334, ans=0.1 2024-09-17 13:57:18,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=199033.33333333334, ans=0.125 2024-09-17 13:57:31,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=199080.0, ans=0.125 2024-09-17 13:57:57,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=199173.33333333334, ans=0.0 2024-09-17 13:58:15,688 INFO [train.py:1198] (1/2) Epoch 12, batch 50, loss[loss=0.2294, simple_loss=0.2655, pruned_loss=0.07386, ctc_loss=0.148, cr_loss=0.4015, over 34482.00 frames. ], tot_loss[loss=0.2588, simple_loss=0.2966, pruned_loss=0.08492, ctc_loss=0.1688, cr_loss=0.4366, over 1479560.12 frames. ], batch size: 82, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 13:58:27,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2024-09-17 13:58:29,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.24 vs. limit=15.0 2024-09-17 13:58:30,117 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.187e+02 2.737e+02 3.120e+02 4.210e+02 8.547e+02, threshold=6.240e+02, percent-clipped=8.0 2024-09-17 13:58:32,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=199266.66666666666, ans=0.125 2024-09-17 13:58:40,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=199266.66666666666, ans=0.0 2024-09-17 13:58:53,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.12 vs. 
limit=15.0 2024-09-17 13:58:55,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=199313.33333333334, ans=0.125 2024-09-17 13:59:07,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=199360.0, ans=0.0 2024-09-17 13:59:07,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.65 vs. limit=15.0 2024-09-17 13:59:32,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.61 vs. limit=22.5 2024-09-17 13:59:40,140 INFO [train.py:1198] (1/2) Epoch 12, batch 100, loss[loss=0.2501, simple_loss=0.2897, pruned_loss=0.08075, ctc_loss=0.1593, cr_loss=0.4276, over 34568.00 frames. ], tot_loss[loss=0.2605, simple_loss=0.2985, pruned_loss=0.08551, ctc_loss=0.1699, cr_loss=0.4383, over 2626444.46 frames. ], batch size: 89, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 13:59:53,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=199453.33333333334, ans=10.0 2024-09-17 14:00:02,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2024-09-17 14:00:08,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=199500.0, ans=0.125 2024-09-17 14:00:20,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0 2024-09-17 14:00:29,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=199593.33333333334, ans=15.0 2024-09-17 14:00:36,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2024-09-17 14:00:37,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=199593.33333333334, ans=0.1 2024-09-17 14:00:43,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.62 vs. limit=15.0 2024-09-17 14:01:03,904 INFO [train.py:1198] (1/2) Epoch 12, batch 150, loss[loss=0.2214, simple_loss=0.2616, pruned_loss=0.06914, ctc_loss=0.1373, cr_loss=0.3863, over 34463.00 frames. ], tot_loss[loss=0.2574, simple_loss=0.2961, pruned_loss=0.08393, ctc_loss=0.167, cr_loss=0.4348, over 3554904.67 frames. 
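During the epoch 12 validation above, zipformer.py:1858 also prints attn_weights_entropy per attention head (tensor([3.2803, 3.9236, 3.2270, 3.9106])). Head entropy is a cheap health check on self-attention: near-zero entropy means a head has collapsed onto single positions, while values near log(key_len) mean it attends diffusely. A minimal sketch, with the weight-tensor layout assumed for illustration:

import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    # attn_weights: (num_heads, batch, query_len, key_len); each row is a
    # softmax distribution over keys. Returns mean entropy per head.
    p = attn_weights.clamp(min=1e-20)   # avoid log(0)
    return -(p * p.log()).sum(dim=-1).mean(dim=(1, 2))

weights = torch.softmax(torch.randn(4, 2, 50, 50), dim=-1)
print(attn_weights_entropy(weights))  # one value per head, at most log(50)~3.9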
], batch size: 82, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 14:01:05,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=199686.66666666666, ans=0.2 2024-09-17 14:01:17,137 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.393e+02 2.725e+02 3.254e+02 5.866e+02, threshold=5.450e+02, percent-clipped=0.0 2024-09-17 14:01:21,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=199733.33333333334, ans=0.025 2024-09-17 14:01:36,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.59 vs. limit=15.0 2024-09-17 14:01:52,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=199826.66666666666, ans=0.0 2024-09-17 14:02:00,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=199826.66666666666, ans=0.125 2024-09-17 14:02:09,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.55 vs. limit=15.0 2024-09-17 14:02:15,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=199873.33333333334, ans=0.125 2024-09-17 14:02:19,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=199873.33333333334, ans=0.0 2024-09-17 14:02:28,639 INFO [train.py:1198] (1/2) Epoch 12, batch 200, loss[loss=0.2793, simple_loss=0.3147, pruned_loss=0.09452, ctc_loss=0.1822, cr_loss=0.4591, over 31971.00 frames. ], tot_loss[loss=0.2564, simple_loss=0.2951, pruned_loss=0.08353, ctc_loss=0.1661, cr_loss=0.434, over 4270286.50 frames. ], batch size: 145, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 14:02:31,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=12.0 2024-09-17 14:03:51,112 INFO [train.py:1198] (1/2) Epoch 12, batch 250, loss[loss=0.2832, simple_loss=0.3194, pruned_loss=0.09572, ctc_loss=0.1834, cr_loss=0.4714, over 34262.00 frames. ], tot_loss[loss=0.2561, simple_loss=0.295, pruned_loss=0.08331, ctc_loss=0.1657, cr_loss=0.4347, over 4831597.64 frames. ], batch size: 117, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 14:03:56,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=200153.33333333334, ans=0.0 2024-09-17 14:04:04,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.557e+02 3.091e+02 3.997e+02 7.261e+02, threshold=6.182e+02, percent-clipped=11.0 2024-09-17 14:04:47,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. 
limit=22.5 2024-09-17 14:04:53,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=200293.33333333334, ans=0.1 2024-09-17 14:04:53,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=200293.33333333334, ans=0.0 2024-09-17 14:05:01,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=200340.0, ans=0.125 2024-09-17 14:05:10,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.96 vs. limit=15.0 2024-09-17 14:05:16,257 INFO [train.py:1198] (1/2) Epoch 12, batch 300, loss[loss=0.2831, simple_loss=0.3169, pruned_loss=0.09652, ctc_loss=0.1854, cr_loss=0.48, over 34305.00 frames. ], tot_loss[loss=0.256, simple_loss=0.2948, pruned_loss=0.08329, ctc_loss=0.1657, cr_loss=0.4348, over 5259947.87 frames. ], batch size: 107, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 14:06:11,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=200526.66666666666, ans=0.125 2024-09-17 14:06:27,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.73 vs. limit=15.0 2024-09-17 14:06:40,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=200620.0, ans=0.09899494936611666 2024-09-17 14:06:41,612 INFO [train.py:1198] (1/2) Epoch 12, batch 350, loss[loss=0.2236, simple_loss=0.2618, pruned_loss=0.0705, ctc_loss=0.1419, cr_loss=0.401, over 34278.00 frames. ], tot_loss[loss=0.2569, simple_loss=0.2955, pruned_loss=0.08379, ctc_loss=0.1665, cr_loss=0.4359, over 5595226.45 frames. ], batch size: 83, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 14:06:54,846 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.700e+02 3.327e+02 4.301e+02 7.148e+02, threshold=6.653e+02, percent-clipped=6.0 2024-09-17 14:07:00,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=200666.66666666666, ans=0.125 2024-09-17 14:07:21,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=200713.33333333334, ans=0.0 2024-09-17 14:07:22,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2024-09-17 14:07:36,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=200760.0, ans=0.125 2024-09-17 14:08:01,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=200806.66666666666, ans=0.2 2024-09-17 14:08:04,401 INFO [train.py:1198] (1/2) Epoch 12, batch 400, loss[loss=0.2676, simple_loss=0.3083, pruned_loss=0.08725, ctc_loss=0.1715, cr_loss=0.4505, over 34446.00 frames. ], tot_loss[loss=0.2561, simple_loss=0.2949, pruned_loss=0.08337, ctc_loss=0.1657, cr_loss=0.4349, over 5863515.43 frames. 
], batch size: 95, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 14:08:25,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=200900.0, ans=0.1 2024-09-17 14:08:35,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=15.0 2024-09-17 14:08:39,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.97 vs. limit=15.0 2024-09-17 14:08:44,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=200946.66666666666, ans=0.125 2024-09-17 14:09:01,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=200993.33333333334, ans=0.1 2024-09-17 14:09:06,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=200993.33333333334, ans=0.0 2024-09-17 14:09:08,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=200993.33333333334, ans=0.0 2024-09-17 14:09:12,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.75 vs. limit=12.0 2024-09-17 14:09:14,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=22.5 2024-09-17 14:09:28,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=201086.66666666666, ans=0.0 2024-09-17 14:09:29,308 INFO [train.py:1198] (1/2) Epoch 12, batch 450, loss[loss=0.2639, simple_loss=0.3044, pruned_loss=0.0855, ctc_loss=0.1726, cr_loss=0.4468, over 34693.00 frames. ], tot_loss[loss=0.256, simple_loss=0.2947, pruned_loss=0.08338, ctc_loss=0.1658, cr_loss=0.4352, over 6053555.52 frames. 
], batch size: 97, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 14:09:34,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=201086.66666666666, ans=0.125 2024-09-17 14:09:42,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.464e+02 2.926e+02 3.517e+02 6.893e+02, threshold=5.852e+02, percent-clipped=2.0 2024-09-17 14:09:49,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=201133.33333333334, ans=0.0 2024-09-17 14:09:51,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=201133.33333333334, ans=0.1 2024-09-17 14:10:11,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=201180.0, ans=0.1 2024-09-17 14:10:19,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=201226.66666666666, ans=0.125 2024-09-17 14:10:21,627 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:10:28,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=201226.66666666666, ans=0.125 2024-09-17 14:10:28,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=201226.66666666666, ans=0.0 2024-09-17 14:10:45,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=201273.33333333334, ans=0.125 2024-09-17 14:10:49,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=201273.33333333334, ans=0.125 2024-09-17 14:10:52,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=201320.0, ans=0.0 2024-09-17 14:10:53,826 INFO [train.py:1198] (1/2) Epoch 12, batch 500, loss[loss=0.2736, simple_loss=0.3165, pruned_loss=0.08877, ctc_loss=0.1761, cr_loss=0.447, over 34439.00 frames. ], tot_loss[loss=0.2545, simple_loss=0.2933, pruned_loss=0.08269, ctc_loss=0.1646, cr_loss=0.4337, over 6221020.83 frames. ], batch size: 110, lr: 1.03e-02, grad_scale: 32.0 2024-09-17 14:11:01,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten.whitening_limit, batch_count=201320.0, ans=15.0 2024-09-17 14:11:01,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.73 vs. limit=15.0 2024-09-17 14:11:21,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.40 vs. 
limit=15.0 2024-09-17 14:11:27,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=201413.33333333334, ans=0.0 2024-09-17 14:11:30,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=201413.33333333334, ans=0.0 2024-09-17 14:11:55,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=201460.0, ans=0.125 2024-09-17 14:11:57,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=201460.0, ans=0.125 2024-09-17 14:12:18,346 INFO [train.py:1198] (1/2) Epoch 12, batch 550, loss[loss=0.2814, simple_loss=0.3174, pruned_loss=0.09384, ctc_loss=0.1901, cr_loss=0.493, over 33758.00 frames. ], tot_loss[loss=0.2549, simple_loss=0.2935, pruned_loss=0.08292, ctc_loss=0.1651, cr_loss=0.4343, over 6331186.74 frames. ], batch size: 122, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:12:31,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.450e+02 2.959e+02 3.502e+02 6.866e+02, threshold=5.918e+02, percent-clipped=1.0 2024-09-17 14:12:40,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=201600.0, ans=0.125 2024-09-17 14:13:17,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=201693.33333333334, ans=0.125 2024-09-17 14:13:20,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=201693.33333333334, ans=0.025 2024-09-17 14:13:28,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=201740.0, ans=0.0 2024-09-17 14:13:32,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=201740.0, ans=0.2 2024-09-17 14:13:38,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.58 vs. limit=15.0 2024-09-17 14:13:41,622 INFO [train.py:1198] (1/2) Epoch 12, batch 600, loss[loss=0.2697, simple_loss=0.3153, pruned_loss=0.08643, ctc_loss=0.1711, cr_loss=0.4277, over 34243.00 frames. ], tot_loss[loss=0.2552, simple_loss=0.294, pruned_loss=0.08296, ctc_loss=0.1653, cr_loss=0.4348, over 6433959.75 frames. 
], batch size: 117, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:13:55,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=201786.66666666666, ans=0.05 2024-09-17 14:14:05,671 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:14:10,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=201833.33333333334, ans=0.0 2024-09-17 14:14:13,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=201833.33333333334, ans=0.0 2024-09-17 14:14:35,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=201926.66666666666, ans=0.0 2024-09-17 14:14:41,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=201926.66666666666, ans=0.125 2024-09-17 14:15:05,836 INFO [train.py:1198] (1/2) Epoch 12, batch 650, loss[loss=0.2531, simple_loss=0.2892, pruned_loss=0.08282, ctc_loss=0.1662, cr_loss=0.4528, over 34530.00 frames. ], tot_loss[loss=0.2541, simple_loss=0.2932, pruned_loss=0.08242, ctc_loss=0.1643, cr_loss=0.4333, over 6524600.20 frames. ], batch size: 94, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:15:14,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=202020.0, ans=0.125 2024-09-17 14:15:19,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.764e+02 3.426e+02 5.361e+02 9.945e+02, threshold=6.852e+02, percent-clipped=21.0 2024-09-17 14:15:41,020 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:15:54,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=202113.33333333334, ans=0.02 2024-09-17 14:15:57,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=202160.0, ans=0.0 2024-09-17 14:16:28,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=202253.33333333334, ans=0.2 2024-09-17 14:16:29,975 INFO [train.py:1198] (1/2) Epoch 12, batch 700, loss[loss=0.241, simple_loss=0.2804, pruned_loss=0.07689, ctc_loss=0.1578, cr_loss=0.4061, over 34587.00 frames. ], tot_loss[loss=0.2542, simple_loss=0.2934, pruned_loss=0.08242, ctc_loss=0.1644, cr_loss=0.4341, over 6580642.51 frames. ], batch size: 89, lr: 1.02e-02, grad_scale: 16.0 2024-09-17 14:16:45,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.28 vs. 
limit=15.0 2024-09-17 14:16:48,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=202300.0, ans=0.0 2024-09-17 14:16:51,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=202300.0, ans=0.0 2024-09-17 14:17:17,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=202393.33333333334, ans=0.0 2024-09-17 14:17:18,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=202393.33333333334, ans=0.025 2024-09-17 14:17:19,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=202393.33333333334, ans=0.125 2024-09-17 14:17:29,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=202393.33333333334, ans=0.2 2024-09-17 14:17:33,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=202393.33333333334, ans=0.125 2024-09-17 14:17:39,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=202440.0, ans=0.125 2024-09-17 14:17:53,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=202486.66666666666, ans=0.125 2024-09-17 14:17:54,652 INFO [train.py:1198] (1/2) Epoch 12, batch 750, loss[loss=0.2608, simple_loss=0.2982, pruned_loss=0.08621, ctc_loss=0.1684, cr_loss=0.4303, over 34422.00 frames. ], tot_loss[loss=0.2537, simple_loss=0.2928, pruned_loss=0.08223, ctc_loss=0.164, cr_loss=0.4332, over 6625204.56 frames. ], batch size: 95, lr: 1.02e-02, grad_scale: 16.0 2024-09-17 14:17:58,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=202486.66666666666, ans=0.025 2024-09-17 14:18:09,271 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.623e+02 3.026e+02 4.137e+02 7.627e+02, threshold=6.053e+02, percent-clipped=2.0 2024-09-17 14:18:16,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=202533.33333333334, ans=0.07 2024-09-17 14:18:26,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=202580.0, ans=0.125 2024-09-17 14:18:32,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=202580.0, ans=0.1 2024-09-17 14:18:39,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=202580.0, ans=0.1 2024-09-17 14:18:44,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202626.66666666666, ans=0.1 2024-09-17 14:19:17,872 INFO [train.py:1198] (1/2) Epoch 12, batch 800, loss[loss=0.2241, simple_loss=0.2671, pruned_loss=0.06911, ctc_loss=0.1361, cr_loss=0.3934, over 34442.00 frames. ], tot_loss[loss=0.2537, simple_loss=0.2927, pruned_loss=0.08226, ctc_loss=0.1639, cr_loss=0.4335, over 6660752.81 frames. 
], batch size: 85, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:19:24,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=202720.0, ans=0.0 2024-09-17 14:19:46,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=202766.66666666666, ans=0.1 2024-09-17 14:19:51,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=202813.33333333334, ans=0.125 2024-09-17 14:19:56,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=202813.33333333334, ans=0.1 2024-09-17 14:20:20,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=202860.0, ans=0.0 2024-09-17 14:20:42,117 INFO [train.py:1198] (1/2) Epoch 12, batch 850, loss[loss=0.2633, simple_loss=0.307, pruned_loss=0.08386, ctc_loss=0.1727, cr_loss=0.4308, over 34348.00 frames. ], tot_loss[loss=0.2532, simple_loss=0.2924, pruned_loss=0.08198, ctc_loss=0.1636, cr_loss=0.4329, over 6693192.30 frames. ], batch size: 103, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:20:53,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=202953.33333333334, ans=0.0 2024-09-17 14:20:56,628 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.625e+02 3.032e+02 4.001e+02 8.324e+02, threshold=6.065e+02, percent-clipped=4.0 2024-09-17 14:21:20,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203046.66666666666, ans=0.1 2024-09-17 14:21:24,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=203046.66666666666, ans=0.0 2024-09-17 14:21:33,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.17 vs. limit=22.5 2024-09-17 14:22:06,965 INFO [train.py:1198] (1/2) Epoch 12, batch 900, loss[loss=0.2399, simple_loss=0.2771, pruned_loss=0.07719, ctc_loss=0.1569, cr_loss=0.4253, over 34506.00 frames. ], tot_loss[loss=0.254, simple_loss=0.293, pruned_loss=0.08237, ctc_loss=0.1642, cr_loss=0.4342, over 6700199.91 frames. 
], batch size: 85, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:22:33,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=203233.33333333334, ans=0.05 2024-09-17 14:22:36,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203233.33333333334, ans=0.1 2024-09-17 14:22:41,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=203280.0, ans=0.0 2024-09-17 14:22:55,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=203326.66666666666, ans=0.025 2024-09-17 14:23:03,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=203326.66666666666, ans=0.0 2024-09-17 14:23:14,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=203373.33333333334, ans=0.125 2024-09-17 14:23:31,042 INFO [train.py:1198] (1/2) Epoch 12, batch 950, loss[loss=0.2307, simple_loss=0.2727, pruned_loss=0.07176, ctc_loss=0.1473, cr_loss=0.3931, over 34667.00 frames. ], tot_loss[loss=0.254, simple_loss=0.2932, pruned_loss=0.08234, ctc_loss=0.1642, cr_loss=0.4338, over 6700469.94 frames. ], batch size: 87, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:23:32,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.76 vs. limit=15.0 2024-09-17 14:23:45,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.905e+02 2.460e+02 2.823e+02 3.511e+02 7.158e+02, threshold=5.646e+02, percent-clipped=2.0 2024-09-17 14:24:17,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.32 vs. limit=22.5 2024-09-17 14:24:31,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=203560.0, ans=0.2 2024-09-17 14:24:37,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=203606.66666666666, ans=0.125 2024-09-17 14:24:53,869 INFO [train.py:1198] (1/2) Epoch 12, batch 1000, loss[loss=0.2536, simple_loss=0.2912, pruned_loss=0.08287, ctc_loss=0.1651, cr_loss=0.432, over 34534.00 frames. ], tot_loss[loss=0.2552, simple_loss=0.2942, pruned_loss=0.08289, ctc_loss=0.1651, cr_loss=0.4349, over 6693961.19 frames. ], batch size: 90, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:25:00,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=203653.33333333334, ans=0.2 2024-09-17 14:25:10,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=203700.0, ans=0.125 2024-09-17 14:25:14,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=203700.0, ans=0.125 2024-09-17 14:25:27,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.27 vs. 
limit=15.0 2024-09-17 14:25:34,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=203746.66666666666, ans=0.2 2024-09-17 14:25:35,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2024-09-17 14:25:50,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=15.0 2024-09-17 14:26:08,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=203840.0, ans=0.2 2024-09-17 14:26:13,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=203840.0, ans=0.125 2024-09-17 14:26:14,223 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.21 vs. limit=15.0 2024-09-17 14:26:14,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2024-09-17 14:26:18,059 INFO [train.py:1198] (1/2) Epoch 12, batch 1050, loss[loss=0.2685, simple_loss=0.3117, pruned_loss=0.08632, ctc_loss=0.1714, cr_loss=0.4623, over 34575.00 frames. ], tot_loss[loss=0.2548, simple_loss=0.2938, pruned_loss=0.08278, ctc_loss=0.1648, cr_loss=0.4337, over 6704361.09 frames. ], batch size: 99, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:26:29,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=203886.66666666666, ans=0.125 2024-09-17 14:26:32,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.359e+02 2.914e+02 3.893e+02 6.943e+02, threshold=5.828e+02, percent-clipped=4.0 2024-09-17 14:26:42,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=203933.33333333334, ans=0.2 2024-09-17 14:27:19,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=204026.66666666666, ans=0.2 2024-09-17 14:27:23,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=204026.66666666666, ans=0.125 2024-09-17 14:27:26,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.13 vs. limit=15.0 2024-09-17 14:27:42,518 INFO [train.py:1198] (1/2) Epoch 12, batch 1100, loss[loss=0.2516, simple_loss=0.2889, pruned_loss=0.08201, ctc_loss=0.1635, cr_loss=0.4374, over 34363.00 frames. ], tot_loss[loss=0.2546, simple_loss=0.2937, pruned_loss=0.08261, ctc_loss=0.1645, cr_loss=0.4337, over 6716943.23 frames. 
], batch size: 91, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:28:12,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=204166.66666666666, ans=0.125 2024-09-17 14:28:20,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204213.33333333334, ans=0.1 2024-09-17 14:28:20,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=204213.33333333334, ans=0.0 2024-09-17 14:28:47,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=204306.66666666666, ans=0.125 2024-09-17 14:28:50,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=204306.66666666666, ans=0.0 2024-09-17 14:28:54,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2024-09-17 14:29:05,069 INFO [train.py:1198] (1/2) Epoch 12, batch 1150, loss[loss=0.2405, simple_loss=0.2821, pruned_loss=0.07576, ctc_loss=0.1537, cr_loss=0.4181, over 34354.00 frames. ], tot_loss[loss=0.2547, simple_loss=0.2937, pruned_loss=0.08269, ctc_loss=0.1648, cr_loss=0.433, over 6714906.77 frames. ], batch size: 91, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:29:06,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=204353.33333333334, ans=0.125 2024-09-17 14:29:22,144 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.581e+02 3.103e+02 3.692e+02 6.881e+02, threshold=6.205e+02, percent-clipped=3.0 2024-09-17 14:29:31,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=204400.0, ans=0.125 2024-09-17 14:29:32,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=204400.0, ans=0.125 2024-09-17 14:29:37,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=204400.0, ans=0.125 2024-09-17 14:29:46,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=204446.66666666666, ans=0.2 2024-09-17 14:30:04,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=204493.33333333334, ans=0.125 2024-09-17 14:30:13,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=204540.0, ans=15.0 2024-09-17 14:30:16,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=204540.0, ans=0.0 2024-09-17 14:30:22,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=204540.0, ans=0.125 2024-09-17 14:30:30,684 INFO [train.py:1198] (1/2) Epoch 12, batch 1200, loss[loss=0.2676, simple_loss=0.3087, pruned_loss=0.08743, ctc_loss=0.1729, cr_loss=0.4264, over 34585.00 frames. ], tot_loss[loss=0.2552, simple_loss=0.2944, pruned_loss=0.08281, ctc_loss=0.165, cr_loss=0.4332, over 6707955.18 frames. 
], batch size: 99, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:30:35,327 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.83 vs. limit=12.0 2024-09-17 14:30:41,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=204586.66666666666, ans=0.2 2024-09-17 14:30:51,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=204633.33333333334, ans=0.04949747468305833 2024-09-17 14:30:59,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=204633.33333333334, ans=0.5 2024-09-17 14:30:59,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=204633.33333333334, ans=0.5 2024-09-17 14:31:12,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=204680.0, ans=0.1 2024-09-17 14:31:14,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=204680.0, ans=0.025 2024-09-17 14:31:23,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.74 vs. limit=10.0 2024-09-17 14:31:42,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.49 vs. limit=22.5 2024-09-17 14:31:53,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=204820.0, ans=0.0 2024-09-17 14:31:55,172 INFO [train.py:1198] (1/2) Epoch 12, batch 1250, loss[loss=0.2773, simple_loss=0.3124, pruned_loss=0.09327, ctc_loss=0.1819, cr_loss=0.482, over 34326.00 frames. ], tot_loss[loss=0.2553, simple_loss=0.2946, pruned_loss=0.08284, ctc_loss=0.165, cr_loss=0.434, over 6740783.24 frames. ], batch size: 107, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:32:10,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=204866.66666666666, ans=0.0 2024-09-17 14:32:11,915 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.435e+02 2.709e+02 3.181e+02 4.779e+02, threshold=5.417e+02, percent-clipped=0.0 2024-09-17 14:32:31,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=204913.33333333334, ans=0.125 2024-09-17 14:32:41,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=204913.33333333334, ans=0.0 2024-09-17 14:32:50,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=204960.0, ans=0.125 2024-09-17 14:33:13,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. 
limit=15.0 2024-09-17 14:33:15,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=205006.66666666666, ans=10.0 2024-09-17 14:33:18,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=205053.33333333334, ans=0.125 2024-09-17 14:33:20,030 INFO [train.py:1198] (1/2) Epoch 12, batch 1300, loss[loss=0.2512, simple_loss=0.301, pruned_loss=0.07687, ctc_loss=0.155, cr_loss=0.4152, over 33027.00 frames. ], tot_loss[loss=0.2538, simple_loss=0.2934, pruned_loss=0.08214, ctc_loss=0.1637, cr_loss=0.4325, over 6745086.13 frames. ], batch size: 130, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:33:22,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=205053.33333333334, ans=0.5 2024-09-17 14:33:35,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=205100.0, ans=0.09899494936611666 2024-09-17 14:33:39,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=12.0 2024-09-17 14:33:57,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0 2024-09-17 14:34:13,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=205193.33333333334, ans=0.125 2024-09-17 14:34:16,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=205193.33333333334, ans=0.125 2024-09-17 14:34:44,334 INFO [train.py:1198] (1/2) Epoch 12, batch 1350, loss[loss=0.2469, simple_loss=0.2858, pruned_loss=0.07944, ctc_loss=0.1562, cr_loss=0.4475, over 34548.00 frames. ], tot_loss[loss=0.2531, simple_loss=0.2927, pruned_loss=0.08176, ctc_loss=0.1631, cr_loss=0.4324, over 6765814.09 frames. ], batch size: 94, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:35:06,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.860e+02 2.529e+02 3.086e+02 4.248e+02 6.999e+02, threshold=6.171e+02, percent-clipped=8.0 2024-09-17 14:35:21,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=205380.0, ans=0.0 2024-09-17 14:35:49,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=205426.66666666666, ans=0.0 2024-09-17 14:35:53,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=205426.66666666666, ans=0.0 2024-09-17 14:36:12,720 INFO [train.py:1198] (1/2) Epoch 12, batch 1400, loss[loss=0.2327, simple_loss=0.2712, pruned_loss=0.07379, ctc_loss=0.1524, cr_loss=0.4039, over 34286.00 frames. ], tot_loss[loss=0.253, simple_loss=0.2926, pruned_loss=0.08175, ctc_loss=0.1631, cr_loss=0.4326, over 6777942.83 frames. ], batch size: 80, lr: 1.02e-02, grad_scale: 32.0 2024-09-17 14:36:33,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.42 vs. 
limit=22.5 2024-09-17 14:36:47,884 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:37:37,854 INFO [train.py:1198] (1/2) Epoch 12, batch 1450, loss[loss=0.2687, simple_loss=0.3103, pruned_loss=0.08779, ctc_loss=0.1724, cr_loss=0.4262, over 34431.00 frames. ], tot_loss[loss=0.2535, simple_loss=0.2932, pruned_loss=0.08186, ctc_loss=0.1634, cr_loss=0.4334, over 6775606.38 frames. ], batch size: 110, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:37:41,457 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:37:41,522 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:37:54,376 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.438e+02 2.778e+02 3.528e+02 7.206e+02, threshold=5.557e+02, percent-clipped=2.0 2024-09-17 14:38:09,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=205846.66666666666, ans=0.0 2024-09-17 14:38:11,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=205846.66666666666, ans=0.125 2024-09-17 14:38:27,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=205893.33333333334, ans=0.0 2024-09-17 14:38:50,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=205940.0, ans=0.125 2024-09-17 14:38:52,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=205940.0, ans=0.0 2024-09-17 14:39:01,897 INFO [train.py:1198] (1/2) Epoch 12, batch 1500, loss[loss=0.2578, simple_loss=0.3053, pruned_loss=0.08092, ctc_loss=0.1579, cr_loss=0.4236, over 34469.00 frames. ], tot_loss[loss=0.2541, simple_loss=0.2939, pruned_loss=0.08215, ctc_loss=0.1638, cr_loss=0.4339, over 6775242.72 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:39:14,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2024-09-17 14:39:37,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2024-09-17 14:39:55,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=206126.66666666666, ans=0.125 2024-09-17 14:39:57,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=206126.66666666666, ans=0.125 2024-09-17 14:40:06,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.38 vs. limit=15.0 2024-09-17 14:40:11,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.17 vs. limit=15.0 2024-09-17 14:40:14,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.82 vs. 
limit=22.5 2024-09-17 14:40:17,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=206173.33333333334, ans=0.2 2024-09-17 14:40:25,097 INFO [train.py:1198] (1/2) Epoch 12, batch 1550, loss[loss=0.2576, simple_loss=0.301, pruned_loss=0.08132, ctc_loss=0.165, cr_loss=0.4664, over 34443.00 frames. ], tot_loss[loss=0.2547, simple_loss=0.294, pruned_loss=0.08253, ctc_loss=0.1644, cr_loss=0.4346, over 6746911.74 frames. ], batch size: 105, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:40:40,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=206266.66666666666, ans=0.125 2024-09-17 14:40:44,306 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.473e+02 2.906e+02 3.735e+02 6.697e+02, threshold=5.811e+02, percent-clipped=3.0 2024-09-17 14:41:12,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=206313.33333333334, ans=0.09899494936611666 2024-09-17 14:41:35,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=206406.66666666666, ans=0.125 2024-09-17 14:41:49,621 INFO [train.py:1198] (1/2) Epoch 12, batch 1600, loss[loss=0.2736, simple_loss=0.3111, pruned_loss=0.09148, ctc_loss=0.1768, cr_loss=0.4431, over 34564.00 frames. ], tot_loss[loss=0.2544, simple_loss=0.2937, pruned_loss=0.08245, ctc_loss=0.1643, cr_loss=0.4335, over 6726641.43 frames. ], batch size: 99, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:42:11,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206500.0, ans=0.1 2024-09-17 14:42:14,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=206500.0, ans=0.0 2024-09-17 14:42:24,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-17 14:42:46,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=206593.33333333334, ans=0.125 2024-09-17 14:43:03,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=206640.0, ans=0.1 2024-09-17 14:43:13,831 INFO [train.py:1198] (1/2) Epoch 12, batch 1650, loss[loss=0.2575, simple_loss=0.3024, pruned_loss=0.0808, ctc_loss=0.1636, cr_loss=0.456, over 34347.00 frames. ], tot_loss[loss=0.2548, simple_loss=0.294, pruned_loss=0.08265, ctc_loss=0.1646, cr_loss=0.4335, over 6720761.69 frames. 
], batch size: 103, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:43:22,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=206686.66666666666, ans=0.07 2024-09-17 14:43:27,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=206686.66666666666, ans=0.025 2024-09-17 14:43:30,200 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.102e+02 2.672e+02 3.379e+02 4.688e+02 7.471e+02, threshold=6.759e+02, percent-clipped=11.0 2024-09-17 14:43:43,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=206733.33333333334, ans=0.5 2024-09-17 14:43:46,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=206780.0, ans=0.1 2024-09-17 14:43:58,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.19 vs. limit=15.0 2024-09-17 14:44:21,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.21 vs. limit=15.0 2024-09-17 14:44:30,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.75 vs. limit=22.5 2024-09-17 14:44:37,766 INFO [train.py:1198] (1/2) Epoch 12, batch 1700, loss[loss=0.2124, simple_loss=0.2536, pruned_loss=0.06501, ctc_loss=0.1314, cr_loss=0.3728, over 34260.00 frames. ], tot_loss[loss=0.2542, simple_loss=0.2935, pruned_loss=0.08238, ctc_loss=0.1643, cr_loss=0.4332, over 6744749.17 frames. ], batch size: 80, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:44:57,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=206966.66666666666, ans=0.125 2024-09-17 14:45:09,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=207013.33333333334, ans=0.125 2024-09-17 14:45:36,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.32 vs. limit=15.0 2024-09-17 14:45:44,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=207106.66666666666, ans=0.0 2024-09-17 14:46:02,247 INFO [train.py:1198] (1/2) Epoch 12, batch 1750, loss[loss=0.2323, simple_loss=0.2669, pruned_loss=0.07594, ctc_loss=0.1483, cr_loss=0.4052, over 34125.00 frames. ], tot_loss[loss=0.2538, simple_loss=0.293, pruned_loss=0.08225, ctc_loss=0.1639, cr_loss=0.4327, over 6753400.83 frames. 
], batch size: 78, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:46:18,965 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.054e+02 2.704e+02 3.334e+02 4.327e+02 6.787e+02, threshold=6.668e+02, percent-clipped=1.0 2024-09-17 14:46:34,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=207246.66666666666, ans=0.125 2024-09-17 14:46:42,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=207246.66666666666, ans=0.0 2024-09-17 14:47:07,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=207340.0, ans=0.125 2024-09-17 14:47:11,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=207340.0, ans=0.0 2024-09-17 14:47:24,563 INFO [train.py:1198] (1/2) Epoch 12, batch 1800, loss[loss=0.2599, simple_loss=0.3033, pruned_loss=0.08308, ctc_loss=0.1659, cr_loss=0.4294, over 34694.00 frames. ], tot_loss[loss=0.254, simple_loss=0.2933, pruned_loss=0.08231, ctc_loss=0.164, cr_loss=0.4329, over 6757488.95 frames. ], batch size: 97, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:47:29,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=207386.66666666666, ans=0.2 2024-09-17 14:47:33,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=207386.66666666666, ans=0.125 2024-09-17 14:47:38,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=207386.66666666666, ans=0.1 2024-09-17 14:47:44,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=207433.33333333334, ans=15.0 2024-09-17 14:47:54,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=207433.33333333334, ans=0.0 2024-09-17 14:47:58,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=207480.0, ans=0.125 2024-09-17 14:48:01,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=207480.0, ans=0.0 2024-09-17 14:48:28,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=207526.66666666666, ans=0.07 2024-09-17 14:48:43,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=22.5 2024-09-17 14:48:45,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.44 vs. limit=15.0 2024-09-17 14:48:49,319 INFO [train.py:1198] (1/2) Epoch 12, batch 1850, loss[loss=0.2642, simple_loss=0.3046, pruned_loss=0.08629, ctc_loss=0.1693, cr_loss=0.4329, over 34448.00 frames. ], tot_loss[loss=0.2537, simple_loss=0.293, pruned_loss=0.08217, ctc_loss=0.1637, cr_loss=0.4332, over 6764231.15 frames. 
], batch size: 100, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:49:05,525 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.589e+02 3.233e+02 4.670e+02 8.161e+02, threshold=6.467e+02, percent-clipped=2.0 2024-09-17 14:49:36,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.98 vs. limit=22.5 2024-09-17 14:49:44,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.27 vs. limit=10.0 2024-09-17 14:49:58,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=207806.66666666666, ans=0.125 2024-09-17 14:50:12,114 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:50:13,182 INFO [train.py:1198] (1/2) Epoch 12, batch 1900, loss[loss=0.2586, simple_loss=0.301, pruned_loss=0.08263, ctc_loss=0.1682, cr_loss=0.4336, over 34389.00 frames. ], tot_loss[loss=0.2544, simple_loss=0.2937, pruned_loss=0.08246, ctc_loss=0.1643, cr_loss=0.4341, over 6773199.95 frames. ], batch size: 103, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:50:16,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=207853.33333333334, ans=0.125 2024-09-17 14:50:28,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.85 vs. limit=10.0 2024-09-17 14:50:43,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=207900.0, ans=0.1 2024-09-17 14:51:00,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.85 vs. limit=22.5 2024-09-17 14:51:20,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=208040.0, ans=0.125 2024-09-17 14:51:28,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0 2024-09-17 14:51:35,743 INFO [train.py:1198] (1/2) Epoch 12, batch 1950, loss[loss=0.2365, simple_loss=0.2842, pruned_loss=0.07189, ctc_loss=0.1458, cr_loss=0.395, over 34354.00 frames. ], tot_loss[loss=0.2553, simple_loss=0.2948, pruned_loss=0.08271, ctc_loss=0.1647, cr_loss=0.4354, over 6790435.80 frames. 
], batch size: 91, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:51:36,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=208086.66666666666, ans=0.125 2024-09-17 14:51:37,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=208086.66666666666, ans=0.2 2024-09-17 14:51:47,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=208086.66666666666, ans=0.1 2024-09-17 14:51:47,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=208086.66666666666, ans=0.2 2024-09-17 14:51:52,775 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.388e+02 2.725e+02 3.511e+02 5.743e+02, threshold=5.450e+02, percent-clipped=0.0 2024-09-17 14:52:01,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=208133.33333333334, ans=0.125 2024-09-17 14:53:02,787 INFO [train.py:1198] (1/2) Epoch 12, batch 2000, loss[loss=0.2281, simple_loss=0.2678, pruned_loss=0.07161, ctc_loss=0.1458, cr_loss=0.3983, over 34151.00 frames. ], tot_loss[loss=0.2556, simple_loss=0.2951, pruned_loss=0.08281, ctc_loss=0.165, cr_loss=0.4357, over 6765684.23 frames. ], batch size: 78, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:53:05,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.30 vs. limit=15.0 2024-09-17 14:53:09,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=208320.0, ans=0.07 2024-09-17 14:53:21,864 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:53:44,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=208413.33333333334, ans=0.025 2024-09-17 14:53:44,720 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 14:53:46,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=208413.33333333334, ans=0.1 2024-09-17 14:53:48,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=208413.33333333334, ans=0.125 2024-09-17 14:54:04,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=208460.0, ans=0.0 2024-09-17 14:54:25,670 INFO [train.py:1198] (1/2) Epoch 12, batch 2050, loss[loss=0.2273, simple_loss=0.2663, pruned_loss=0.07177, ctc_loss=0.1465, cr_loss=0.3876, over 34457.00 frames. ], tot_loss[loss=0.2544, simple_loss=0.2939, pruned_loss=0.08232, ctc_loss=0.1642, cr_loss=0.4339, over 6757227.27 frames. 
], batch size: 82, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:54:26,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=208553.33333333334, ans=0.0 2024-09-17 14:54:42,232 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.619e+02 3.061e+02 3.997e+02 7.177e+02, threshold=6.122e+02, percent-clipped=9.0 2024-09-17 14:55:10,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=208646.66666666666, ans=0.95 2024-09-17 14:55:48,347 INFO [train.py:1198] (1/2) Epoch 12, batch 2100, loss[loss=0.2401, simple_loss=0.2889, pruned_loss=0.07306, ctc_loss=0.1464, cr_loss=0.3942, over 34552.00 frames. ], tot_loss[loss=0.2537, simple_loss=0.2933, pruned_loss=0.08204, ctc_loss=0.1636, cr_loss=0.4332, over 6770805.79 frames. ], batch size: 94, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:55:50,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=208786.66666666666, ans=0.0 2024-09-17 14:55:58,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=208786.66666666666, ans=0.0 2024-09-17 14:56:02,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.13 vs. limit=15.0 2024-09-17 14:56:12,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.08 vs. limit=15.0 2024-09-17 14:56:15,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=208833.33333333334, ans=0.125 2024-09-17 14:56:17,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=208833.33333333334, ans=0.2 2024-09-17 14:56:17,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.38 vs. limit=22.5 2024-09-17 14:56:23,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=208880.0, ans=0.125 2024-09-17 14:56:38,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=208926.66666666666, ans=0.025 2024-09-17 14:56:52,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=208926.66666666666, ans=0.0 2024-09-17 14:57:11,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=208973.33333333334, ans=0.125 2024-09-17 14:57:14,112 INFO [train.py:1198] (1/2) Epoch 12, batch 2150, loss[loss=0.2655, simple_loss=0.2984, pruned_loss=0.0888, ctc_loss=0.1807, cr_loss=0.4737, over 34370.00 frames. ], tot_loss[loss=0.253, simple_loss=0.2927, pruned_loss=0.08168, ctc_loss=0.1629, cr_loss=0.432, over 6788812.70 frames. 
], batch size: 91, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:57:16,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=209020.0, ans=0.125 2024-09-17 14:57:16,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=209020.0, ans=0.0 2024-09-17 14:57:24,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=209020.0, ans=0.125 2024-09-17 14:57:26,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=209020.0, ans=6.0 2024-09-17 14:57:30,944 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.489e+02 2.990e+02 4.527e+02 9.369e+02, threshold=5.980e+02, percent-clipped=12.0 2024-09-17 14:57:37,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=209066.66666666666, ans=0.1 2024-09-17 14:57:46,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=209113.33333333334, ans=0.0 2024-09-17 14:57:52,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=209113.33333333334, ans=0.2 2024-09-17 14:58:09,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=209160.0, ans=0.125 2024-09-17 14:58:27,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=15.0 2024-09-17 14:58:35,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=209253.33333333334, ans=0.125 2024-09-17 14:58:36,956 INFO [train.py:1198] (1/2) Epoch 12, batch 2200, loss[loss=0.2364, simple_loss=0.287, pruned_loss=0.07002, ctc_loss=0.1455, cr_loss=0.4176, over 34446.00 frames. ], tot_loss[loss=0.2529, simple_loss=0.2927, pruned_loss=0.08161, ctc_loss=0.1628, cr_loss=0.4318, over 6784049.46 frames. ], batch size: 100, lr: 1.01e-02, grad_scale: 32.0 2024-09-17 14:59:10,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=209346.66666666666, ans=0.0 2024-09-17 14:59:10,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209346.66666666666, ans=0.1 2024-09-17 14:59:31,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=209393.33333333334, ans=0.125 2024-09-17 14:59:33,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=209393.33333333334, ans=0.0 2024-09-17 14:59:33,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.70 vs. 
limit=15.0 2024-09-17 14:59:48,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=209440.0, ans=0.125 2024-09-17 14:59:59,028 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:00:01,921 INFO [train.py:1198] (1/2) Epoch 12, batch 2250, loss[loss=0.2672, simple_loss=0.3099, pruned_loss=0.08617, ctc_loss=0.1722, cr_loss=0.4424, over 34440.00 frames. ], tot_loss[loss=0.2533, simple_loss=0.2929, pruned_loss=0.08188, ctc_loss=0.1632, cr_loss=0.4325, over 6781752.47 frames. ], batch size: 95, lr: 1.01e-02, grad_scale: 16.0 2024-09-17 15:00:20,039 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.668e+02 3.273e+02 4.477e+02 8.517e+02, threshold=6.545e+02, percent-clipped=8.0 2024-09-17 15:00:23,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=209533.33333333334, ans=0.125 2024-09-17 15:00:48,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=209580.0, ans=0.125 2024-09-17 15:01:10,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.53 vs. limit=22.5 2024-09-17 15:01:19,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.67 vs. limit=15.0 2024-09-17 15:01:25,797 INFO [train.py:1198] (1/2) Epoch 12, batch 2300, loss[loss=0.2222, simple_loss=0.2664, pruned_loss=0.06691, ctc_loss=0.1403, cr_loss=0.4026, over 34251.00 frames. ], tot_loss[loss=0.2526, simple_loss=0.2921, pruned_loss=0.08161, ctc_loss=0.1627, cr_loss=0.4314, over 6766204.23 frames. ], batch size: 83, lr: 1.01e-02, grad_scale: 16.0 2024-09-17 15:01:26,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209720.0, ans=0.1 2024-09-17 15:01:40,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=209766.66666666666, ans=0.07 2024-09-17 15:01:42,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=12.0 2024-09-17 15:01:43,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=8.0 2024-09-17 15:02:00,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209813.33333333334, ans=0.1 2024-09-17 15:02:08,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=209813.33333333334, ans=0.0 2024-09-17 15:02:14,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.46 vs. limit=22.5 2024-09-17 15:02:43,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=209906.66666666666, ans=0.0 2024-09-17 15:02:48,042 INFO [train.py:1198] (1/2) Epoch 12, batch 2350, loss[loss=0.2708, simple_loss=0.308, pruned_loss=0.08942, ctc_loss=0.1783, cr_loss=0.4743, over 34710.00 frames. 
], tot_loss[loss=0.2526, simple_loss=0.2922, pruned_loss=0.0816, ctc_loss=0.1627, cr_loss=0.4321, over 6773228.92 frames. ], batch size: 97, lr: 1.00e-02, grad_scale: 16.0 2024-09-17 15:02:49,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=209953.33333333334, ans=0.1 2024-09-17 15:03:06,470 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.531e+02 2.907e+02 3.762e+02 6.510e+02, threshold=5.815e+02, percent-clipped=0.0 2024-09-17 15:03:11,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=210000.0, ans=0.0 2024-09-17 15:03:57,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=210140.0, ans=0.1 2024-09-17 15:04:10,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=210140.0, ans=0.125 2024-09-17 15:04:13,279 INFO [train.py:1198] (1/2) Epoch 12, batch 2400, loss[loss=0.2551, simple_loss=0.2887, pruned_loss=0.08454, ctc_loss=0.169, cr_loss=0.4671, over 34582.00 frames. ], tot_loss[loss=0.2531, simple_loss=0.2927, pruned_loss=0.0818, ctc_loss=0.163, cr_loss=0.4331, over 6777236.41 frames. ], batch size: 89, lr: 1.00e-02, grad_scale: 32.0 2024-09-17 15:04:25,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=210186.66666666666, ans=0.1 2024-09-17 15:04:30,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=210233.33333333334, ans=0.125 2024-09-17 15:04:31,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0 2024-09-17 15:04:43,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=210233.33333333334, ans=0.0 2024-09-17 15:04:51,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=210280.0, ans=0.125 2024-09-17 15:05:13,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=210326.66666666666, ans=0.125 2024-09-17 15:05:19,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=210373.33333333334, ans=0.035 2024-09-17 15:05:26,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=210373.33333333334, ans=12.0 2024-09-17 15:05:37,384 INFO [train.py:1198] (1/2) Epoch 12, batch 2450, loss[loss=0.2455, simple_loss=0.2919, pruned_loss=0.07564, ctc_loss=0.154, cr_loss=0.4239, over 34439.00 frames. ], tot_loss[loss=0.2542, simple_loss=0.2936, pruned_loss=0.08227, ctc_loss=0.1639, cr_loss=0.4341, over 6751770.55 frames. ], batch size: 95, lr: 1.00e-02, grad_scale: 32.0 2024-09-17 15:05:39,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.64 vs. 
limit=15.0 2024-09-17 15:05:53,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=210466.66666666666, ans=0.0 2024-09-17 15:05:55,184 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.774e+02 3.458e+02 4.294e+02 7.468e+02, threshold=6.917e+02, percent-clipped=6.0 2024-09-17 15:06:15,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-09-17 15:06:20,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=210513.33333333334, ans=0.0 2024-09-17 15:06:25,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=210560.0, ans=0.0 2024-09-17 15:06:30,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-09-17 15:06:53,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=210606.66666666666, ans=0.0 2024-09-17 15:06:53,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=210606.66666666666, ans=0.0 2024-09-17 15:06:58,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=210653.33333333334, ans=0.0 2024-09-17 15:06:59,412 INFO [train.py:1198] (1/2) Epoch 12, batch 2500, loss[loss=0.2638, simple_loss=0.3071, pruned_loss=0.08472, ctc_loss=0.1662, cr_loss=0.4437, over 34414.00 frames. ], tot_loss[loss=0.2546, simple_loss=0.2939, pruned_loss=0.08248, ctc_loss=0.1642, cr_loss=0.4342, over 6764460.95 frames. ], batch size: 100, lr: 1.00e-02, grad_scale: 32.0 2024-09-17 15:07:26,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=210700.0, ans=0.1 2024-09-17 15:07:41,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=210746.66666666666, ans=0.0 2024-09-17 15:08:18,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=210840.0, ans=0.125 2024-09-17 15:08:21,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=210840.0, ans=0.125 2024-09-17 15:08:26,194 INFO [train.py:1198] (1/2) Epoch 12, batch 2550, loss[loss=0.2311, simple_loss=0.2666, pruned_loss=0.07484, ctc_loss=0.1482, cr_loss=0.4047, over 34128.00 frames. ], tot_loss[loss=0.2544, simple_loss=0.2938, pruned_loss=0.0824, ctc_loss=0.164, cr_loss=0.4341, over 6768832.52 frames. ], batch size: 78, lr: 1.00e-02, grad_scale: 32.0 2024-09-17 15:08:44,684 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.387e+02 2.756e+02 3.545e+02 6.201e+02, threshold=5.512e+02, percent-clipped=0.0 2024-09-17 15:08:56,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. 
limit=15.0 2024-09-17 15:09:08,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=210980.0, ans=0.025 2024-09-17 15:09:14,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=211026.66666666666, ans=0.015 2024-09-17 15:09:14,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=211026.66666666666, ans=0.125 2024-09-17 15:09:29,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211026.66666666666, ans=0.1 2024-09-17 15:09:49,172 INFO [train.py:1198] (1/2) Epoch 12, batch 2600, loss[loss=0.2493, simple_loss=0.2812, pruned_loss=0.08352, ctc_loss=0.1634, cr_loss=0.4419, over 34342.00 frames. ], tot_loss[loss=0.2549, simple_loss=0.2942, pruned_loss=0.08266, ctc_loss=0.1646, cr_loss=0.4354, over 6764132.52 frames. ], batch size: 91, lr: 1.00e-02, grad_scale: 32.0 2024-09-17 15:09:56,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=211120.0, ans=0.125 2024-09-17 15:10:00,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=211120.0, ans=0.125 2024-09-17 15:10:08,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=211166.66666666666, ans=0.125 2024-09-17 15:10:18,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=211166.66666666666, ans=0.0 2024-09-17 15:10:40,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.14 vs. limit=10.0 2024-09-17 15:10:48,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=211260.0, ans=0.0 2024-09-17 15:11:05,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=211306.66666666666, ans=0.0 2024-09-17 15:11:11,129 INFO [train.py:1198] (1/2) Epoch 12, batch 2650, loss[loss=0.2608, simple_loss=0.3046, pruned_loss=0.08322, ctc_loss=0.1649, cr_loss=0.441, over 34216.00 frames. ], tot_loss[loss=0.2543, simple_loss=0.2939, pruned_loss=0.08228, ctc_loss=0.1639, cr_loss=0.4353, over 6770506.22 frames. ], batch size: 117, lr: 1.00e-02, grad_scale: 32.0 2024-09-17 15:11:26,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=211353.33333333334, ans=0.0 2024-09-17 15:11:31,439 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.151e+02 2.488e+02 2.915e+02 3.784e+02 7.687e+02, threshold=5.830e+02, percent-clipped=8.0 2024-09-17 15:11:31,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=211400.0, ans=0.09899494936611666 2024-09-17 15:11:50,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.36 vs. 
limit=15.0 2024-09-17 15:12:26,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=211540.0, ans=0.125 2024-09-17 15:12:37,455 INFO [train.py:1198] (1/2) Epoch 12, batch 2700, loss[loss=0.2575, simple_loss=0.3033, pruned_loss=0.08094, ctc_loss=0.1618, cr_loss=0.4356, over 34631.00 frames. ], tot_loss[loss=0.2544, simple_loss=0.2941, pruned_loss=0.08229, ctc_loss=0.1639, cr_loss=0.4353, over 6764995.93 frames. ], batch size: 102, lr: 1.00e-02, grad_scale: 16.0 2024-09-17 15:12:46,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2024-09-17 15:12:55,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=211633.33333333334, ans=0.125 2024-09-17 15:13:35,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211726.66666666666, ans=0.1 2024-09-17 15:14:00,130 INFO [train.py:1198] (1/2) Epoch 12, batch 2750, loss[loss=0.2576, simple_loss=0.2904, pruned_loss=0.08616, ctc_loss=0.1731, cr_loss=0.4481, over 34644.00 frames. ], tot_loss[loss=0.2534, simple_loss=0.2929, pruned_loss=0.08189, ctc_loss=0.1633, cr_loss=0.4338, over 6763201.40 frames. ], batch size: 88, lr: 1.00e-02, grad_scale: 16.0 2024-09-17 15:14:12,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=211820.0, ans=0.125 2024-09-17 15:14:16,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.67 vs. limit=15.0 2024-09-17 15:14:17,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=211866.66666666666, ans=0.125 2024-09-17 15:14:20,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.646e+02 3.191e+02 4.237e+02 1.004e+03, threshold=6.382e+02, percent-clipped=7.0 2024-09-17 15:14:53,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=211960.0, ans=0.0 2024-09-17 15:14:55,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=211960.0, ans=0.125 2024-09-17 15:15:05,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=212006.66666666666, ans=0.125 2024-09-17 15:15:11,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=212006.66666666666, ans=0.125 2024-09-17 15:15:27,777 INFO [train.py:1198] (1/2) Epoch 12, batch 2800, loss[loss=0.3151, simple_loss=0.336, pruned_loss=0.1156, ctc_loss=0.2229, cr_loss=0.4579, over 23950.00 frames. ], tot_loss[loss=0.2538, simple_loss=0.2932, pruned_loss=0.08212, ctc_loss=0.1636, cr_loss=0.434, over 6741729.21 frames. 
], batch size: 245, lr: 1.00e-02, grad_scale: 32.0 2024-09-17 15:15:31,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=212053.33333333334, ans=0.025 2024-09-17 15:16:50,067 INFO [train.py:1198] (1/2) Epoch 12, batch 2850, loss[loss=0.2559, simple_loss=0.2976, pruned_loss=0.08245, ctc_loss=0.1603, cr_loss=0.4292, over 34497.00 frames. ], tot_loss[loss=0.2545, simple_loss=0.2939, pruned_loss=0.08244, ctc_loss=0.1641, cr_loss=0.4339, over 6725690.87 frames. ], batch size: 90, lr: 9.99e-03, grad_scale: 32.0 2024-09-17 15:16:57,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=15.0 2024-09-17 15:17:06,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=22.5 2024-09-17 15:17:09,858 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.127e+02 2.594e+02 3.144e+02 3.929e+02 7.123e+02, threshold=6.287e+02, percent-clipped=4.0 2024-09-17 15:17:15,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=212333.33333333334, ans=0.125 2024-09-17 15:17:25,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=212380.0, ans=0.2 2024-09-17 15:18:01,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=212473.33333333334, ans=0.125 2024-09-17 15:18:02,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2024-09-17 15:18:06,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=212473.33333333334, ans=0.125 2024-09-17 15:18:12,787 INFO [train.py:1198] (1/2) Epoch 12, batch 2900, loss[loss=0.2629, simple_loss=0.2966, pruned_loss=0.08827, ctc_loss=0.1736, cr_loss=0.4494, over 34519.00 frames. ], tot_loss[loss=0.2553, simple_loss=0.2949, pruned_loss=0.08271, ctc_loss=0.1646, cr_loss=0.4356, over 6755989.67 frames. ], batch size: 94, lr: 9.99e-03, grad_scale: 32.0 2024-09-17 15:18:24,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=212520.0, ans=0.125 2024-09-17 15:18:26,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=212520.0, ans=0.2 2024-09-17 15:18:26,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.27 vs. 
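The ScheduledFloat entries throughout this log resolve named hyperparameters (conv_skip_rate, dropout_p, const_attention_rate, scale_min, and so on) to a value ans at the current batch_count. A piecewise-linear schedule over batch_count is one mechanism consistent with these readouts; the sketch below assumes that mechanism, and the breakpoints in the example are hypothetical rather than the recipe's actual schedule.

def scheduled_float(batch_count, schedule):
    # schedule: [(batch_count_0, value_0), ..., (batch_count_n, value_n)],
    # sorted by batch_count; linear interpolation inside, clamped outside.
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    if batch_count >= schedule[-1][0]:
        return schedule[-1][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# A skip rate decaying from a hypothetical 0.2 to 0.0 over the first 20k batches
# would read ans=0.0 at the batch counts logged here:
print(scheduled_float(212753.33, [(0.0, 0.2), (20000.0, 0.0)]))  # -> 0.0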
limit=15.0 2024-09-17 15:18:26,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=212520.0, ans=15.0 2024-09-17 15:18:44,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=212613.33333333334, ans=0.0 2024-09-17 15:19:05,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=212660.0, ans=0.125 2024-09-17 15:19:39,325 INFO [train.py:1198] (1/2) Epoch 12, batch 2950, loss[loss=0.2468, simple_loss=0.2831, pruned_loss=0.0807, ctc_loss=0.1634, cr_loss=0.4084, over 34664.00 frames. ], tot_loss[loss=0.2537, simple_loss=0.2933, pruned_loss=0.08202, ctc_loss=0.1635, cr_loss=0.4333, over 6750465.97 frames. ], batch size: 88, lr: 9.98e-03, grad_scale: 32.0 2024-09-17 15:19:42,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=212753.33333333334, ans=0.1 2024-09-17 15:19:52,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=212753.33333333334, ans=0.0 2024-09-17 15:19:56,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.84 vs. limit=15.0 2024-09-17 15:19:59,313 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.633e+02 3.150e+02 4.160e+02 8.561e+02, threshold=6.300e+02, percent-clipped=3.0 2024-09-17 15:20:15,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0 2024-09-17 15:20:21,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=212846.66666666666, ans=0.125 2024-09-17 15:21:02,315 INFO [train.py:1198] (1/2) Epoch 12, batch 3000, loss[loss=0.244, simple_loss=0.2879, pruned_loss=0.07641, ctc_loss=0.1543, cr_loss=0.4113, over 34536.00 frames. ], tot_loss[loss=0.2534, simple_loss=0.293, pruned_loss=0.08189, ctc_loss=0.1634, cr_loss=0.4331, over 6750057.42 frames. ], batch size: 94, lr: 9.98e-03, grad_scale: 32.0 2024-09-17 15:21:02,315 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 15:21:19,315 INFO [train.py:1230] (1/2) Epoch 12, validation: loss=0.1535, simple_loss=0.2515, pruned_loss=0.02314, ctc_loss=0.04642, cr_loss=1.693e-14, over 944034.00 frames. 2024-09-17 15:21:19,316 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 15:21:33,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=212986.66666666666, ans=0.0 2024-09-17 15:21:35,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=213033.33333333334, ans=0.0 2024-09-17 15:22:33,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.45 vs. limit=15.0 2024-09-17 15:22:40,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.28 vs. 
limit=6.0 2024-09-17 15:22:40,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=15.0 2024-09-17 15:22:40,906 INFO [train.py:1198] (1/2) Epoch 12, batch 3050, loss[loss=0.2434, simple_loss=0.2822, pruned_loss=0.07814, ctc_loss=0.1562, cr_loss=0.4256, over 34587.00 frames. ], tot_loss[loss=0.2547, simple_loss=0.2941, pruned_loss=0.08253, ctc_loss=0.1646, cr_loss=0.4343, over 6741508.74 frames. ], batch size: 89, lr: 9.97e-03, grad_scale: 32.0 2024-09-17 15:22:59,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=213266.66666666666, ans=0.125 2024-09-17 15:23:00,580 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.060e+02 2.534e+02 3.011e+02 3.544e+02 8.971e+02, threshold=6.022e+02, percent-clipped=1.0 2024-09-17 15:23:00,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=213266.66666666666, ans=0.0 2024-09-17 15:23:05,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=213266.66666666666, ans=0.125 2024-09-17 15:23:10,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=213266.66666666666, ans=0.125 2024-09-17 15:23:23,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=213313.33333333334, ans=0.125 2024-09-17 15:23:29,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=213313.33333333334, ans=10.0 2024-09-17 15:23:38,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.98 vs. limit=22.5 2024-09-17 15:24:05,334 INFO [train.py:1198] (1/2) Epoch 12, batch 3100, loss[loss=0.2554, simple_loss=0.2964, pruned_loss=0.08126, ctc_loss=0.1685, cr_loss=0.4569, over 34208.00 frames. ], tot_loss[loss=0.2543, simple_loss=0.2936, pruned_loss=0.08241, ctc_loss=0.1643, cr_loss=0.434, over 6741698.72 frames. ], batch size: 117, lr: 9.97e-03, grad_scale: 32.0 2024-09-17 15:24:33,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=213500.0, ans=0.125 2024-09-17 15:24:44,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2024-09-17 15:24:51,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.78 vs. limit=15.0 2024-09-17 15:25:07,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=213593.33333333334, ans=0.0 2024-09-17 15:25:21,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=213640.0, ans=0.2 2024-09-17 15:25:25,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.64 vs. 
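The Whitening entries above compare a per-module statistic against a limit (metric=10.64 vs. limit=15.0, metric=16.98 vs. limit=22.5, and so on). One statistic with the right behavior is mean(C*C) / mean(diag(C))^2 computed on the per-group covariance C of the activations: it equals 1.0 for perfectly whitened features and grows as channels become correlated or unevenly scaled. The sketch below is a plausible reconstruction, not a verbatim copy of scaling.py.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels) activations.  Split channels into groups,
    # form each group's covariance, and compare the mean squared entry against
    # the squared mean diagonal.  An identity covariance gives 1.0.
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups                              # channels per group
    xg = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)   # (G, T, cpg)
    covar = xg.transpose(1, 2) @ xg / num_frames                  # (G, cpg, cpg)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
    mean_sq = (covar ** 2).sum() / (num_groups * cpg)
    return (mean_sq / (mean_diag ** 2 + 1e-20)).item()

# White noise stays near the ideal value of 1.0, well under limits like 15.0:
print(whitening_metric(torch.randn(4000, 512), num_groups=1))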
limit=22.5 2024-09-17 15:25:26,141 INFO [train.py:1198] (1/2) Epoch 12, batch 3150, loss[loss=0.2759, simple_loss=0.3118, pruned_loss=0.0917, ctc_loss=0.1828, cr_loss=0.5014, over 33821.00 frames. ], tot_loss[loss=0.2541, simple_loss=0.2935, pruned_loss=0.08224, ctc_loss=0.164, cr_loss=0.4339, over 6748389.94 frames. ], batch size: 122, lr: 9.96e-03, grad_scale: 32.0 2024-09-17 15:25:42,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=213733.33333333334, ans=0.2 2024-09-17 15:25:43,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=213733.33333333334, ans=0.0 2024-09-17 15:25:45,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.486e+02 2.989e+02 3.954e+02 8.227e+02, threshold=5.978e+02, percent-clipped=4.0 2024-09-17 15:25:59,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=213780.0, ans=0.025 2024-09-17 15:26:25,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.55 vs. limit=12.0 2024-09-17 15:26:26,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=213826.66666666666, ans=0.5 2024-09-17 15:26:31,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=213873.33333333334, ans=0.025 2024-09-17 15:26:46,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2024-09-17 15:26:47,055 INFO [train.py:1198] (1/2) Epoch 12, batch 3200, loss[loss=0.2547, simple_loss=0.2966, pruned_loss=0.08184, ctc_loss=0.1604, cr_loss=0.4243, over 34527.00 frames. ], tot_loss[loss=0.2525, simple_loss=0.2924, pruned_loss=0.08146, ctc_loss=0.1626, cr_loss=0.4313, over 6761367.75 frames. ], batch size: 94, lr: 9.96e-03, grad_scale: 32.0 2024-09-17 15:26:49,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5 2024-09-17 15:27:10,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=213966.66666666666, ans=0.2 2024-09-17 15:27:11,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=213966.66666666666, ans=0.0 2024-09-17 15:27:17,219 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.98 vs. limit=15.0 2024-09-17 15:27:21,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=214013.33333333334, ans=0.0 2024-09-17 15:27:29,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=214013.33333333334, ans=0.09899494936611666 2024-09-17 15:27:42,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214060.0, ans=0.1 2024-09-17 15:28:08,408 INFO [train.py:1198] (1/2) Epoch 12, batch 3250, loss[loss=0.2633, simple_loss=0.3074, pruned_loss=0.08392, ctc_loss=0.1672, cr_loss=0.4483, over 34671.00 frames. 
], tot_loss[loss=0.253, simple_loss=0.2929, pruned_loss=0.08165, ctc_loss=0.1629, cr_loss=0.432, over 6770341.92 frames. ], batch size: 98, lr: 9.95e-03, grad_scale: 32.0 2024-09-17 15:28:15,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5 2024-09-17 15:28:23,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=214200.0, ans=0.125 2024-09-17 15:28:26,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=214200.0, ans=0.125 2024-09-17 15:28:27,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.055e+02 2.578e+02 3.120e+02 4.237e+02 7.337e+02, threshold=6.240e+02, percent-clipped=7.0 2024-09-17 15:28:39,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=214246.66666666666, ans=0.125 2024-09-17 15:28:44,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2024-09-17 15:28:46,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=214246.66666666666, ans=0.125 2024-09-17 15:29:02,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=214293.33333333334, ans=0.0 2024-09-17 15:29:09,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=214293.33333333334, ans=0.125 2024-09-17 15:29:32,344 INFO [train.py:1198] (1/2) Epoch 12, batch 3300, loss[loss=0.2601, simple_loss=0.3037, pruned_loss=0.08232, ctc_loss=0.1696, cr_loss=0.4468, over 33113.00 frames. ], tot_loss[loss=0.2516, simple_loss=0.2916, pruned_loss=0.08104, ctc_loss=0.1618, cr_loss=0.4302, over 6768637.94 frames. ], batch size: 130, lr: 9.95e-03, grad_scale: 32.0 2024-09-17 15:29:41,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.62 vs. limit=15.0 2024-09-17 15:30:09,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=214480.0, ans=0.0 2024-09-17 15:30:11,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=214480.0, ans=0.125 2024-09-17 15:30:27,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=214526.66666666666, ans=0.125 2024-09-17 15:30:30,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=214526.66666666666, ans=0.0 2024-09-17 15:30:37,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=214573.33333333334, ans=0.125 2024-09-17 15:30:53,116 INFO [train.py:1198] (1/2) Epoch 12, batch 3350, loss[loss=0.2654, simple_loss=0.3049, pruned_loss=0.08612, ctc_loss=0.1747, cr_loss=0.4646, over 33896.00 frames. ], tot_loss[loss=0.2529, simple_loss=0.2926, pruned_loss=0.08165, ctc_loss=0.163, cr_loss=0.4323, over 6743875.07 frames. 
], batch size: 122, lr: 9.94e-03, grad_scale: 32.0 2024-09-17 15:31:13,173 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.556e+02 3.005e+02 3.585e+02 7.653e+02, threshold=6.009e+02, percent-clipped=2.0 2024-09-17 15:31:31,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214713.33333333334, ans=0.1 2024-09-17 15:31:32,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=214713.33333333334, ans=0.2 2024-09-17 15:31:39,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=214713.33333333334, ans=0.125 2024-09-17 15:31:43,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=214760.0, ans=0.125 2024-09-17 15:31:45,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=214760.0, ans=0.0 2024-09-17 15:32:08,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.48 vs. limit=15.0 2024-09-17 15:32:14,194 INFO [train.py:1198] (1/2) Epoch 12, batch 3400, loss[loss=0.2083, simple_loss=0.2519, pruned_loss=0.06206, ctc_loss=0.1308, cr_loss=0.3597, over 34165.00 frames. ], tot_loss[loss=0.2533, simple_loss=0.2926, pruned_loss=0.08198, ctc_loss=0.1635, cr_loss=0.4329, over 6733876.80 frames. ], batch size: 78, lr: 9.94e-03, grad_scale: 32.0 2024-09-17 15:32:17,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=214853.33333333334, ans=0.125 2024-09-17 15:32:38,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=15.0 2024-09-17 15:32:38,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214900.0, ans=0.1 2024-09-17 15:32:40,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=214900.0, ans=0.0 2024-09-17 15:32:55,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=214946.66666666666, ans=0.125 2024-09-17 15:33:11,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=214993.33333333334, ans=0.125 2024-09-17 15:33:32,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215040.0, ans=0.1 2024-09-17 15:33:35,104 INFO [train.py:1198] (1/2) Epoch 12, batch 3450, loss[loss=0.2762, simple_loss=0.3128, pruned_loss=0.09192, ctc_loss=0.1847, cr_loss=0.4734, over 33058.00 frames. ], tot_loss[loss=0.2535, simple_loss=0.293, pruned_loss=0.08199, ctc_loss=0.1635, cr_loss=0.4336, over 6745544.48 frames. 
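The grad_scale values in the batch summaries (32.0 for most batches, dropping to 16.0 around batches 2700 and 3500) are the signature of dynamic loss scaling in mixed-precision training: the scale halves when a gradient overflow is detected and grows back during stable stretches. A minimal sketch using PyTorch's stock GradScaler, assuming a CUDA device and a stand-in model; the actual training loop has many more moving parts.

import torch

model = torch.nn.Linear(80, 500).cuda()                 # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters())
scaler = torch.cuda.amp.GradScaler(init_scale=32.0)     # init_scale chosen to echo the log

def training_step(features, targets):
    # features: (batch, 80) float tensor; targets: (batch,) long tensor
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = torch.nn.functional.cross_entropy(model(features), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)    # skipped automatically if gradients overflowed
    scaler.update()           # halves the scale on overflow, grows it when stable
    return loss.detach(), scaler.get_scale()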
], batch size: 130, lr: 9.93e-03, grad_scale: 32.0 2024-09-17 15:33:36,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=215086.66666666666, ans=0.125 2024-09-17 15:33:55,531 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.748e+02 3.214e+02 4.372e+02 6.395e+02, threshold=6.428e+02, percent-clipped=2.0 2024-09-17 15:34:07,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=215180.0, ans=0.025 2024-09-17 15:34:22,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=215180.0, ans=0.125 2024-09-17 15:34:34,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=15.0 2024-09-17 15:34:38,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=215226.66666666666, ans=0.0 2024-09-17 15:34:56,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=215320.0, ans=0.125 2024-09-17 15:34:57,745 INFO [train.py:1198] (1/2) Epoch 12, batch 3500, loss[loss=0.2226, simple_loss=0.2668, pruned_loss=0.06772, ctc_loss=0.1368, cr_loss=0.3919, over 34485.00 frames. ], tot_loss[loss=0.2525, simple_loss=0.2921, pruned_loss=0.08156, ctc_loss=0.1627, cr_loss=0.4321, over 6748452.65 frames. ], batch size: 85, lr: 9.93e-03, grad_scale: 16.0 2024-09-17 15:35:46,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=215460.0, ans=0.0 2024-09-17 15:36:10,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=215506.66666666666, ans=0.0 2024-09-17 15:36:18,316 INFO [train.py:1198] (1/2) Epoch 12, batch 3550, loss[loss=0.2649, simple_loss=0.3105, pruned_loss=0.08354, ctc_loss=0.1709, cr_loss=0.4493, over 34376.00 frames. ], tot_loss[loss=0.2524, simple_loss=0.2921, pruned_loss=0.08142, ctc_loss=0.1626, cr_loss=0.4327, over 6758273.80 frames. ], batch size: 103, lr: 9.92e-03, grad_scale: 16.0 2024-09-17 15:36:27,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0 2024-09-17 15:36:39,374 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.713e+02 3.391e+02 4.714e+02 7.091e+02, threshold=6.782e+02, percent-clipped=4.0 2024-09-17 15:36:39,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215600.0, ans=0.1 2024-09-17 15:37:38,664 INFO [train.py:1198] (1/2) Epoch 12, batch 3600, loss[loss=0.2463, simple_loss=0.2864, pruned_loss=0.07893, ctc_loss=0.157, cr_loss=0.4224, over 34476.00 frames. ], tot_loss[loss=0.2528, simple_loss=0.2925, pruned_loss=0.08159, ctc_loss=0.1629, cr_loss=0.4336, over 6767966.33 frames. ], batch size: 90, lr: 9.91e-03, grad_scale: 32.0 2024-09-17 15:37:47,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.18 vs. 
limit=10.0 2024-09-17 15:37:47,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.58 vs. limit=6.0 2024-09-17 15:38:02,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=215833.33333333334, ans=0.0 2024-09-17 15:38:02,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=215833.33333333334, ans=0.125 2024-09-17 15:38:07,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=10.01 vs. limit=15.0 2024-09-17 15:38:27,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215926.66666666666, ans=0.1 2024-09-17 15:38:35,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=215926.66666666666, ans=0.125 2024-09-17 15:38:37,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2024-09-17 15:39:01,030 INFO [train.py:1198] (1/2) Epoch 12, batch 3650, loss[loss=0.2687, simple_loss=0.3065, pruned_loss=0.08816, ctc_loss=0.1766, cr_loss=0.479, over 34422.00 frames. ], tot_loss[loss=0.252, simple_loss=0.2917, pruned_loss=0.08128, ctc_loss=0.1621, cr_loss=0.4314, over 6770674.49 frames. ], batch size: 110, lr: 9.91e-03, grad_scale: 32.0 2024-09-17 15:39:22,204 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.718e+02 3.675e+02 4.656e+02 8.560e+02, threshold=7.350e+02, percent-clipped=2.0 2024-09-17 15:39:22,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=216066.66666666666, ans=0.0 2024-09-17 15:39:25,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216066.66666666666, ans=0.1 2024-09-17 15:39:51,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=216160.0, ans=0.0 2024-09-17 15:40:02,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=216160.0, ans=0.025 2024-09-17 15:40:05,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=216206.66666666666, ans=0.0 2024-09-17 15:40:12,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=216206.66666666666, ans=0.2 2024-09-17 15:40:17,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=216206.66666666666, ans=0.0 2024-09-17 15:40:21,572 INFO [train.py:1198] (1/2) Epoch 12, batch 3700, loss[loss=0.2567, simple_loss=0.3016, pruned_loss=0.08112, ctc_loss=0.1625, cr_loss=0.4268, over 34587.00 frames. ], tot_loss[loss=0.2517, simple_loss=0.2919, pruned_loss=0.08098, ctc_loss=0.1618, cr_loss=0.4316, over 6784949.48 frames. ], batch size: 102, lr: 9.90e-03, grad_scale: 32.0 2024-09-17 15:40:23,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.71 vs. 
limit=15.0 2024-09-17 15:40:50,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=216300.0, ans=0.125 2024-09-17 15:40:51,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.24 vs. limit=10.0 2024-09-17 15:41:02,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=216346.66666666666, ans=0.0 2024-09-17 15:41:32,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=216440.0, ans=0.0 2024-09-17 15:41:35,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=216440.0, ans=10.0 2024-09-17 15:41:43,556 INFO [train.py:1198] (1/2) Epoch 12, batch 3750, loss[loss=0.274, simple_loss=0.31, pruned_loss=0.09141, ctc_loss=0.1844, cr_loss=0.4598, over 34399.00 frames. ], tot_loss[loss=0.2557, simple_loss=0.2955, pruned_loss=0.08268, ctc_loss=0.1648, cr_loss=0.4371, over 6785574.23 frames. ], batch size: 113, lr: 9.90e-03, grad_scale: 16.0 2024-09-17 15:42:06,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.384e+02 2.643e+02 3.114e+02 6.212e+02, threshold=5.287e+02, percent-clipped=0.0 2024-09-17 15:42:06,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=216533.33333333334, ans=0.0 2024-09-17 15:42:13,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=216533.33333333334, ans=0.125 2024-09-17 15:43:04,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.58 vs. limit=22.5 2024-09-17 15:43:04,838 INFO [train.py:1198] (1/2) Epoch 12, batch 3800, loss[loss=0.2938, simple_loss=0.3183, pruned_loss=0.1048, ctc_loss=0.198, cr_loss=0.4993, over 30313.00 frames. ], tot_loss[loss=0.2594, simple_loss=0.2983, pruned_loss=0.08459, ctc_loss=0.1682, cr_loss=0.4409, over 6674814.06 frames. ], batch size: 176, lr: 9.89e-03, grad_scale: 16.0 2024-09-17 15:43:22,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=216766.66666666666, ans=0.09899494936611666 2024-09-17 15:43:54,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.51 vs. limit=12.0 2024-09-17 15:43:55,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=216860.0, ans=0.2 2024-09-17 15:44:19,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=216906.66666666666, ans=0.0 2024-09-17 15:44:29,506 INFO [train.py:1198] (1/2) Epoch 12, batch 3850, loss[loss=0.2791, simple_loss=0.3077, pruned_loss=0.09695, ctc_loss=0.1939, cr_loss=0.4473, over 23345.00 frames. ], tot_loss[loss=0.2657, simple_loss=0.3021, pruned_loss=0.08823, ctc_loss=0.1754, cr_loss=0.4454, over 6247015.68 frames. 
], batch size: 244, lr: 9.89e-03, grad_scale: 16.0 2024-09-17 15:44:44,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=217000.0, ans=0.035 2024-09-17 15:44:52,447 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.240e+02 2.605e+02 2.843e+02 3.194e+02 4.468e+02, threshold=5.685e+02, percent-clipped=0.0 2024-09-17 15:44:53,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=22.5 2024-09-17 15:46:05,774 INFO [train.py:1198] (1/2) Epoch 13, batch 0, loss[loss=0.2291, simple_loss=0.2731, pruned_loss=0.07043, ctc_loss=0.1425, cr_loss=0.3947, over 34482.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2731, pruned_loss=0.07043, ctc_loss=0.1425, cr_loss=0.3947, over 34482.00 frames. ], batch size: 85, lr: 9.50e-03, grad_scale: 32.0 2024-09-17 15:46:05,774 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 15:46:22,740 INFO [train.py:1230] (1/2) Epoch 13, validation: loss=0.1549, simple_loss=0.2538, pruned_loss=0.02332, ctc_loss=0.04642, cr_loss=1.645e-14, over 944034.00 frames. 2024-09-17 15:46:22,741 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 15:46:48,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=217121.33333333334, ans=0.0 2024-09-17 15:46:55,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=217168.0, ans=0.125 2024-09-17 15:47:15,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=217214.66666666666, ans=0.2 2024-09-17 15:47:40,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-09-17 15:47:46,325 INFO [train.py:1198] (1/2) Epoch 13, batch 50, loss[loss=0.2512, simple_loss=0.2874, pruned_loss=0.08269, ctc_loss=0.1651, cr_loss=0.418, over 34475.00 frames. ], tot_loss[loss=0.2567, simple_loss=0.2959, pruned_loss=0.08347, ctc_loss=0.1657, cr_loss=0.4376, over 1479655.14 frames. ], batch size: 82, lr: 9.50e-03, grad_scale: 32.0 2024-09-17 15:48:02,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=217354.66666666666, ans=0.125 2024-09-17 15:48:04,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=217354.66666666666, ans=0.1 2024-09-17 15:48:06,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.19 vs. limit=15.0 2024-09-17 15:48:13,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=217354.66666666666, ans=0.0 2024-09-17 15:48:18,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=217401.33333333334, ans=0.125 2024-09-17 15:48:18,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.04 vs. 
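Each batch line pairs a per-batch loss[...] with a tot_loss[...] taken over millions of frames, and the validation entries report a single frames-weighted figure over 944034 frames. Per-batch numbers are noisy (compare the 23345-frame and 30313-frame outlier batches to their neighbors), so progress is judged on frames-weighted aggregates. A minimal sketch of that bookkeeping follows; the decay factor is a guess, motivated by the fractional frame totals in the log, and may not match the actual implementation.

class FrameWeightedLoss:
    # Accumulate sum(loss * frames) and sum(frames) per component, so that
    # reported averages are over frames rather than over batches.
    def __init__(self, decay=1.0):
        self.decay = decay    # decay < 1.0 slowly forgets old batches (assumption)
        self.num = {}         # component name -> running sum of loss * frames
        self.den = 0.0        # running sum of frames

    def update(self, losses, frames):
        for k in self.num:
            self.num[k] *= self.decay
        self.den *= self.decay
        for k, v in losses.items():
            self.num[k] = self.num.get(k, 0.0) + v * frames
        self.den += frames

    def averages(self):
        return {k: v / self.den for k, v in self.num.items()}

tracker = FrameWeightedLoss(decay=0.999)  # hypothetical decay
tracker.update({"ctc_loss": 0.1425, "cr_loss": 0.3947}, frames=34482.0)
print(tracker.averages())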
limit=10.0 2024-09-17 15:48:22,880 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 15:48:28,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=217401.33333333334, ans=0.2 2024-09-17 15:48:39,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=217448.0, ans=0.125 2024-09-17 15:48:44,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217448.0, ans=0.1 2024-09-17 15:48:51,084 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.432e+02 2.711e+02 3.216e+02 5.570e+02, threshold=5.423e+02, percent-clipped=0.0 2024-09-17 15:48:54,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=217494.66666666666, ans=0.125 2024-09-17 15:49:03,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=217494.66666666666, ans=0.07 2024-09-17 15:49:11,176 INFO [train.py:1198] (1/2) Epoch 13, batch 100, loss[loss=0.2412, simple_loss=0.2816, pruned_loss=0.07688, ctc_loss=0.1534, cr_loss=0.4102, over 34607.00 frames. ], tot_loss[loss=0.2572, simple_loss=0.2967, pruned_loss=0.08345, ctc_loss=0.1663, cr_loss=0.4391, over 2630142.11 frames. ], batch size: 89, lr: 9.49e-03, grad_scale: 32.0 2024-09-17 15:49:29,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=217588.0, ans=0.125 2024-09-17 15:49:31,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=217588.0, ans=0.125 2024-09-17 15:49:36,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217588.0, ans=0.1 2024-09-17 15:49:37,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=217588.0, ans=0.0 2024-09-17 15:50:15,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=217728.0, ans=0.2 2024-09-17 15:50:18,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=217728.0, ans=0.2 2024-09-17 15:50:20,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=217728.0, ans=0.0 2024-09-17 15:50:21,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5 2024-09-17 15:50:35,081 INFO [train.py:1198] (1/2) Epoch 13, batch 150, loss[loss=0.2278, simple_loss=0.2661, pruned_loss=0.07224, ctc_loss=0.145, cr_loss=0.4037, over 34475.00 frames. ], tot_loss[loss=0.2534, simple_loss=0.2935, pruned_loss=0.08164, ctc_loss=0.1632, cr_loss=0.435, over 3557430.88 frames. 
], batch size: 82, lr: 9.49e-03, grad_scale: 32.0 2024-09-17 15:51:05,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=217821.33333333334, ans=0.125 2024-09-17 15:51:08,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=217868.0, ans=0.125 2024-09-17 15:51:23,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=217914.66666666666, ans=0.2 2024-09-17 15:51:26,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=217914.66666666666, ans=0.05 2024-09-17 15:51:26,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=217914.66666666666, ans=0.1 2024-09-17 15:51:29,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=217914.66666666666, ans=0.125 2024-09-17 15:51:32,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.27 vs. limit=15.0 2024-09-17 15:51:33,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=217914.66666666666, ans=0.125 2024-09-17 15:51:37,729 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.772e+02 3.513e+02 4.477e+02 7.448e+02, threshold=7.025e+02, percent-clipped=14.0 2024-09-17 15:51:51,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=217961.33333333334, ans=0.125 2024-09-17 15:51:57,375 INFO [train.py:1198] (1/2) Epoch 13, batch 200, loss[loss=0.2587, simple_loss=0.3006, pruned_loss=0.08241, ctc_loss=0.1697, cr_loss=0.4477, over 31692.00 frames. ], tot_loss[loss=0.2519, simple_loss=0.292, pruned_loss=0.08104, ctc_loss=0.1621, cr_loss=0.4328, over 4271668.80 frames. ], batch size: 145, lr: 9.48e-03, grad_scale: 32.0 2024-09-17 15:52:27,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-09-17 15:52:37,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=218101.33333333334, ans=0.1 2024-09-17 15:53:04,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.37 vs. limit=12.0 2024-09-17 15:53:22,145 INFO [train.py:1198] (1/2) Epoch 13, batch 250, loss[loss=0.2598, simple_loss=0.3017, pruned_loss=0.08333, ctc_loss=0.1674, cr_loss=0.4435, over 34203.00 frames. ], tot_loss[loss=0.2509, simple_loss=0.2913, pruned_loss=0.08052, ctc_loss=0.1612, cr_loss=0.4314, over 4834239.00 frames. ], batch size: 117, lr: 9.48e-03, grad_scale: 32.0 2024-09-17 15:53:25,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=218241.33333333334, ans=0.2 2024-09-17 15:54:18,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.27 vs. 
limit=22.5 2024-09-17 15:54:23,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=218381.33333333334, ans=0.125 2024-09-17 15:54:27,552 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.492e+02 3.060e+02 4.107e+02 6.340e+02, threshold=6.121e+02, percent-clipped=0.0 2024-09-17 15:54:32,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=218428.0, ans=0.125 2024-09-17 15:54:38,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.53 vs. limit=10.0 2024-09-17 15:54:47,400 INFO [train.py:1198] (1/2) Epoch 13, batch 300, loss[loss=0.2778, simple_loss=0.3109, pruned_loss=0.09387, ctc_loss=0.186, cr_loss=0.4925, over 34354.00 frames. ], tot_loss[loss=0.2499, simple_loss=0.2903, pruned_loss=0.08007, ctc_loss=0.1603, cr_loss=0.4299, over 5261101.88 frames. ], batch size: 107, lr: 9.47e-03, grad_scale: 32.0 2024-09-17 15:54:56,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=218474.66666666666, ans=0.125 2024-09-17 15:55:19,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=218568.0, ans=0.2 2024-09-17 15:55:29,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.74 vs. limit=5.0 2024-09-17 15:55:32,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.63 vs. limit=12.0 2024-09-17 15:56:00,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=218661.33333333334, ans=0.0 2024-09-17 15:56:00,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=218661.33333333334, ans=0.125 2024-09-17 15:56:06,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=218661.33333333334, ans=0.1 2024-09-17 15:56:11,537 INFO [train.py:1198] (1/2) Epoch 13, batch 350, loss[loss=0.244, simple_loss=0.2817, pruned_loss=0.07888, ctc_loss=0.1574, cr_loss=0.4255, over 34291.00 frames. ], tot_loss[loss=0.2505, simple_loss=0.2909, pruned_loss=0.08034, ctc_loss=0.1608, cr_loss=0.4311, over 5596916.96 frames. ], batch size: 83, lr: 9.47e-03, grad_scale: 16.0 2024-09-17 15:56:18,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=218708.0, ans=0.0 2024-09-17 15:56:22,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.87 vs. limit=15.0 2024-09-17 15:56:44,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.52 vs. limit=15.0 2024-09-17 15:56:50,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=218801.33333333334, ans=0.125 2024-09-17 15:56:53,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.19 vs. 
limit=15.0 2024-09-17 15:57:01,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.91 vs. limit=22.5 2024-09-17 15:57:15,628 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.602e+02 3.163e+02 4.272e+02 7.295e+02, threshold=6.325e+02, percent-clipped=6.0 2024-09-17 15:57:33,646 INFO [train.py:1198] (1/2) Epoch 13, batch 400, loss[loss=0.2613, simple_loss=0.3031, pruned_loss=0.08395, ctc_loss=0.1652, cr_loss=0.4637, over 34411.00 frames. ], tot_loss[loss=0.2498, simple_loss=0.2903, pruned_loss=0.07998, ctc_loss=0.1603, cr_loss=0.4311, over 5864114.01 frames. ], batch size: 95, lr: 9.46e-03, grad_scale: 32.0 2024-09-17 15:57:34,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=218941.33333333334, ans=0.2 2024-09-17 15:57:42,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=218941.33333333334, ans=0.2 2024-09-17 15:57:57,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=218988.0, ans=0.0 2024-09-17 15:58:05,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=218988.0, ans=0.1 2024-09-17 15:58:16,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=219034.66666666666, ans=0.025 2024-09-17 15:58:44,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=219128.0, ans=0.05 2024-09-17 15:58:58,800 INFO [train.py:1198] (1/2) Epoch 13, batch 450, loss[loss=0.2656, simple_loss=0.3042, pruned_loss=0.08763, ctc_loss=0.1715, cr_loss=0.4386, over 34699.00 frames. ], tot_loss[loss=0.2504, simple_loss=0.2908, pruned_loss=0.08029, ctc_loss=0.1608, cr_loss=0.4318, over 6054142.31 frames. ], batch size: 97, lr: 9.46e-03, grad_scale: 16.0 2024-09-17 15:59:04,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=219174.66666666666, ans=0.0 2024-09-17 15:59:05,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=219174.66666666666, ans=0.0 2024-09-17 15:59:07,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=219174.66666666666, ans=0.125 2024-09-17 15:59:10,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=219174.66666666666, ans=0.125 2024-09-17 15:59:30,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=219268.0, ans=0.1 2024-09-17 15:59:35,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=219268.0, ans=0.125 2024-09-17 15:59:49,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.10 vs. 
limit=15.0 2024-09-17 16:00:07,630 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.462e+02 2.831e+02 3.473e+02 6.860e+02, threshold=5.661e+02, percent-clipped=2.0 2024-09-17 16:00:24,125 INFO [train.py:1198] (1/2) Epoch 13, batch 500, loss[loss=0.2669, simple_loss=0.3056, pruned_loss=0.08747, ctc_loss=0.1739, cr_loss=0.4598, over 34497.00 frames. ], tot_loss[loss=0.2496, simple_loss=0.2902, pruned_loss=0.07993, ctc_loss=0.1601, cr_loss=0.4303, over 6222079.32 frames. ], batch size: 110, lr: 9.45e-03, grad_scale: 16.0 2024-09-17 16:00:24,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=219408.0, ans=0.125 2024-09-17 16:00:46,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.11 vs. limit=15.0 2024-09-17 16:00:47,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=219454.66666666666, ans=0.0 2024-09-17 16:00:49,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=219454.66666666666, ans=0.0 2024-09-17 16:00:50,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=219454.66666666666, ans=0.1 2024-09-17 16:00:53,974 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:01:32,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=219594.66666666666, ans=0.1 2024-09-17 16:01:46,667 INFO [train.py:1198] (1/2) Epoch 13, batch 550, loss[loss=0.2672, simple_loss=0.3109, pruned_loss=0.0866, ctc_loss=0.1705, cr_loss=0.4044, over 33891.00 frames. ], tot_loss[loss=0.2501, simple_loss=0.2905, pruned_loss=0.08016, ctc_loss=0.1604, cr_loss=0.4302, over 6332838.00 frames. ], batch size: 122, lr: 9.45e-03, grad_scale: 16.0 2024-09-17 16:01:48,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=219641.33333333334, ans=0.0 2024-09-17 16:02:06,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=219688.0, ans=0.125 2024-09-17 16:02:07,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=219688.0, ans=0.0 2024-09-17 16:02:12,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=219688.0, ans=0.125 2024-09-17 16:02:30,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=219734.66666666666, ans=10.0 2024-09-17 16:02:46,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.34 vs. 
limit=15.0 2024-09-17 16:02:49,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=219781.33333333334, ans=0.0 2024-09-17 16:02:49,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=219781.33333333334, ans=0.125 2024-09-17 16:02:55,621 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.516e+02 3.053e+02 3.829e+02 6.402e+02, threshold=6.106e+02, percent-clipped=3.0 2024-09-17 16:03:02,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=219828.0, ans=0.0 2024-09-17 16:03:12,228 INFO [train.py:1198] (1/2) Epoch 13, batch 600, loss[loss=0.2722, simple_loss=0.3157, pruned_loss=0.0871, ctc_loss=0.1783, cr_loss=0.4714, over 34218.00 frames. ], tot_loss[loss=0.2502, simple_loss=0.2908, pruned_loss=0.08019, ctc_loss=0.1604, cr_loss=0.4308, over 6433916.72 frames. ], batch size: 117, lr: 9.44e-03, grad_scale: 16.0 2024-09-17 16:03:40,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=219921.33333333334, ans=0.125 2024-09-17 16:03:55,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=219968.0, ans=0.125 2024-09-17 16:04:10,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=220014.66666666666, ans=0.0 2024-09-17 16:04:18,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=220061.33333333334, ans=0.0 2024-09-17 16:04:36,387 INFO [train.py:1198] (1/2) Epoch 13, batch 650, loss[loss=0.2496, simple_loss=0.2929, pruned_loss=0.07894, ctc_loss=0.1583, cr_loss=0.4207, over 34530.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.2899, pruned_loss=0.07946, ctc_loss=0.159, cr_loss=0.4291, over 6525771.98 frames. 
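Every summary above decomposes the objective into simple_loss and pruned_loss (the two stages of pruned-transducer training), ctc_loss, and cr_loss, a consistency term between two differently-augmented views of the same utterances. A hedged sketch of how such components could be combined is below; the symmetric-KL form of the consistency term and the scale dictionary are assumptions, not the recipe's verbatim configuration.

import torch.nn.functional as F

def consistency_loss(logp_a, logp_b):
    # Symmetric KL between frame-level CTC log-probs of two augmented views,
    # each side detached when used as the target; shapes (N, T, V).
    kl_ab = F.kl_div(logp_a, logp_b.detach(), log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(logp_b, logp_a.detach(), log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

def total_loss(simple_loss, pruned_loss, ctc_loss, cr_loss, scales):
    # scales: e.g. {"simple": ..., "ctc": ..., "cr": ...} -- placeholder weights
    return (scales["simple"] * simple_loss + pruned_loss
            + scales["ctc"] * ctc_loss + scales["cr"] * cr_loss)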
], batch size: 94, lr: 9.44e-03, grad_scale: 16.0 2024-09-17 16:04:38,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=220108.0, ans=0.125 2024-09-17 16:04:41,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=220108.0, ans=0.125 2024-09-17 16:04:53,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220154.66666666666, ans=0.1 2024-09-17 16:04:58,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=220154.66666666666, ans=0.0 2024-09-17 16:05:28,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=220248.0, ans=0.125 2024-09-17 16:05:35,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=220248.0, ans=0.07 2024-09-17 16:05:42,803 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.628e+02 3.394e+02 4.518e+02 9.217e+02, threshold=6.788e+02, percent-clipped=12.0 2024-09-17 16:06:00,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=220341.33333333334, ans=0.2 2024-09-17 16:06:02,006 INFO [train.py:1198] (1/2) Epoch 13, batch 700, loss[loss=0.2328, simple_loss=0.274, pruned_loss=0.07295, ctc_loss=0.148, cr_loss=0.403, over 34604.00 frames. ], tot_loss[loss=0.2497, simple_loss=0.2905, pruned_loss=0.07984, ctc_loss=0.1596, cr_loss=0.4306, over 6582301.25 frames. ], batch size: 89, lr: 9.43e-03, grad_scale: 16.0 2024-09-17 16:06:02,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=12.0 2024-09-17 16:06:12,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=22.5 2024-09-17 16:06:25,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=220388.0, ans=0.1 2024-09-17 16:07:19,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.22 vs. limit=15.0 2024-09-17 16:07:21,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=220528.0, ans=10.0 2024-09-17 16:07:26,521 INFO [train.py:1198] (1/2) Epoch 13, batch 750, loss[loss=0.2536, simple_loss=0.2991, pruned_loss=0.07911, ctc_loss=0.1619, cr_loss=0.4372, over 34412.00 frames. ], tot_loss[loss=0.249, simple_loss=0.2899, pruned_loss=0.07953, ctc_loss=0.1593, cr_loss=0.4297, over 6625988.22 frames. 
], batch size: 95, lr: 9.43e-03, grad_scale: 16.0 2024-09-17 16:07:39,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220574.66666666666, ans=0.1 2024-09-17 16:07:46,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220621.33333333334, ans=0.1 2024-09-17 16:08:01,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=220668.0, ans=0.125 2024-09-17 16:08:12,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=220668.0, ans=0.125 2024-09-17 16:08:14,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=220714.66666666666, ans=0.025 2024-09-17 16:08:23,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.47 vs. limit=15.0 2024-09-17 16:08:32,130 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.530e+02 2.838e+02 3.791e+02 1.158e+03, threshold=5.676e+02, percent-clipped=1.0 2024-09-17 16:08:48,897 INFO [train.py:1198] (1/2) Epoch 13, batch 800, loss[loss=0.2271, simple_loss=0.2688, pruned_loss=0.07021, ctc_loss=0.1439, cr_loss=0.4056, over 34469.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.2898, pruned_loss=0.07954, ctc_loss=0.1593, cr_loss=0.43, over 6661892.09 frames. ], batch size: 85, lr: 9.42e-03, grad_scale: 32.0 2024-09-17 16:08:59,839 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0 2024-09-17 16:09:11,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. limit=10.0 2024-09-17 16:09:17,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=220854.66666666666, ans=0.0 2024-09-17 16:09:59,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=220994.66666666666, ans=0.09899494936611666 2024-09-17 16:10:05,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=220994.66666666666, ans=0.0 2024-09-17 16:10:13,454 INFO [train.py:1198] (1/2) Epoch 13, batch 850, loss[loss=0.2574, simple_loss=0.3023, pruned_loss=0.08171, ctc_loss=0.1598, cr_loss=0.4266, over 34382.00 frames. ], tot_loss[loss=0.2484, simple_loss=0.2894, pruned_loss=0.07922, ctc_loss=0.1587, cr_loss=0.4291, over 6693510.31 frames. ], batch size: 103, lr: 9.42e-03, grad_scale: 32.0 2024-09-17 16:10:26,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.37 vs. 
limit=12.0 2024-09-17 16:10:42,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=221088.0, ans=0.125 2024-09-17 16:11:08,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=221181.33333333334, ans=0.0 2024-09-17 16:11:21,564 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.844e+02 2.581e+02 2.985e+02 4.016e+02 1.138e+03, threshold=5.970e+02, percent-clipped=7.0 2024-09-17 16:11:28,643 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:11:38,096 INFO [train.py:1198] (1/2) Epoch 13, batch 900, loss[loss=0.2326, simple_loss=0.273, pruned_loss=0.07338, ctc_loss=0.1459, cr_loss=0.4102, over 34488.00 frames. ], tot_loss[loss=0.2492, simple_loss=0.29, pruned_loss=0.07966, ctc_loss=0.1594, cr_loss=0.4299, over 6698082.60 frames. ], batch size: 85, lr: 9.41e-03, grad_scale: 32.0 2024-09-17 16:11:45,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=15.0 2024-09-17 16:12:23,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=221368.0, ans=0.125 2024-09-17 16:12:32,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=221414.66666666666, ans=0.0 2024-09-17 16:12:39,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=221414.66666666666, ans=0.2 2024-09-17 16:12:41,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=221414.66666666666, ans=0.125 2024-09-17 16:13:00,505 INFO [train.py:1198] (1/2) Epoch 13, batch 950, loss[loss=0.2451, simple_loss=0.2829, pruned_loss=0.07953, ctc_loss=0.1582, cr_loss=0.4166, over 34708.00 frames. ], tot_loss[loss=0.2494, simple_loss=0.2902, pruned_loss=0.07976, ctc_loss=0.1597, cr_loss=0.4305, over 6701147.96 frames. ], batch size: 87, lr: 9.41e-03, grad_scale: 32.0 2024-09-17 16:13:07,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=221508.0, ans=0.0 2024-09-17 16:13:19,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.60 vs. limit=10.0 2024-09-17 16:13:33,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=221554.66666666666, ans=0.025 2024-09-17 16:14:08,784 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.622e+02 3.049e+02 3.526e+02 5.778e+02, threshold=6.098e+02, percent-clipped=0.0 2024-09-17 16:14:17,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.71 vs. limit=15.0 2024-09-17 16:14:24,937 INFO [train.py:1198] (1/2) Epoch 13, batch 1000, loss[loss=0.2219, simple_loss=0.269, pruned_loss=0.06632, ctc_loss=0.135, cr_loss=0.3799, over 34461.00 frames. ], tot_loss[loss=0.2506, simple_loss=0.2913, pruned_loss=0.08028, ctc_loss=0.1605, cr_loss=0.432, over 6696410.44 frames. 
], batch size: 90, lr: 9.40e-03, grad_scale: 32.0 2024-09-17 16:14:39,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.04 vs. limit=10.0 2024-09-17 16:14:43,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=221788.0, ans=0.125 2024-09-17 16:14:55,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=221788.0, ans=0.05 2024-09-17 16:15:03,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=221834.66666666666, ans=0.2 2024-09-17 16:15:17,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.04 vs. limit=15.0 2024-09-17 16:15:18,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=221881.33333333334, ans=0.2 2024-09-17 16:15:41,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.49 vs. limit=15.0 2024-09-17 16:15:48,800 INFO [train.py:1198] (1/2) Epoch 13, batch 1050, loss[loss=0.2498, simple_loss=0.294, pruned_loss=0.07852, ctc_loss=0.1574, cr_loss=0.4251, over 34589.00 frames. ], tot_loss[loss=0.2501, simple_loss=0.2906, pruned_loss=0.08016, ctc_loss=0.1603, cr_loss=0.4315, over 6705726.00 frames. ], batch size: 99, lr: 9.40e-03, grad_scale: 32.0 2024-09-17 16:15:50,974 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:16:07,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=222021.33333333334, ans=0.0 2024-09-17 16:16:45,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=222114.66666666666, ans=0.125 2024-09-17 16:16:47,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=222114.66666666666, ans=0.5 2024-09-17 16:16:55,316 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.518e+02 2.890e+02 3.644e+02 5.507e+02, threshold=5.780e+02, percent-clipped=0.0 2024-09-17 16:16:56,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.48 vs. limit=12.0 2024-09-17 16:16:59,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=222161.33333333334, ans=0.025 2024-09-17 16:17:00,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=222161.33333333334, ans=0.125 2024-09-17 16:17:12,120 INFO [train.py:1198] (1/2) Epoch 13, batch 1100, loss[loss=0.2366, simple_loss=0.2788, pruned_loss=0.07418, ctc_loss=0.1471, cr_loss=0.4177, over 34741.00 frames. ], tot_loss[loss=0.2495, simple_loss=0.29, pruned_loss=0.07988, ctc_loss=0.1599, cr_loss=0.4304, over 6719017.40 frames. 
], batch size: 92, lr: 9.39e-03, grad_scale: 32.0 2024-09-17 16:17:22,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=222208.0, ans=0.0 2024-09-17 16:17:43,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=222254.66666666666, ans=0.125 2024-09-17 16:17:48,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-09-17 16:17:51,243 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:18:04,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222348.0, ans=0.125 2024-09-17 16:18:11,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=222348.0, ans=0.0 2024-09-17 16:18:21,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=222394.66666666666, ans=0.95 2024-09-17 16:18:38,927 INFO [train.py:1198] (1/2) Epoch 13, batch 1150, loss[loss=0.2366, simple_loss=0.2785, pruned_loss=0.07384, ctc_loss=0.1504, cr_loss=0.4223, over 34729.00 frames. ], tot_loss[loss=0.2496, simple_loss=0.2901, pruned_loss=0.07993, ctc_loss=0.1599, cr_loss=0.4302, over 6718499.23 frames. ], batch size: 92, lr: 9.39e-03, grad_scale: 32.0 2024-09-17 16:18:57,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=222488.0, ans=0.125 2024-09-17 16:18:57,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=222488.0, ans=0.125 2024-09-17 16:18:57,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5 2024-09-17 16:19:25,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222534.66666666666, ans=0.1 2024-09-17 16:19:32,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222581.33333333334, ans=0.125 2024-09-17 16:19:34,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-09-17 16:19:45,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.568e+02 3.111e+02 4.431e+02 7.892e+02, threshold=6.223e+02, percent-clipped=11.0 2024-09-17 16:19:58,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=222628.0, ans=0.0 2024-09-17 16:20:01,768 INFO [train.py:1198] (1/2) Epoch 13, batch 1200, loss[loss=0.2589, simple_loss=0.3027, pruned_loss=0.08242, ctc_loss=0.165, cr_loss=0.4333, over 34555.00 frames. ], tot_loss[loss=0.2507, simple_loss=0.2912, pruned_loss=0.08035, ctc_loss=0.1607, cr_loss=0.4315, over 6710063.98 frames. 
], batch size: 99, lr: 9.38e-03, grad_scale: 32.0 2024-09-17 16:20:13,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=222674.66666666666, ans=0.125 2024-09-17 16:20:15,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=222674.66666666666, ans=0.09899494936611666 2024-09-17 16:20:25,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=222721.33333333334, ans=0.125 2024-09-17 16:20:38,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=222768.0, ans=0.1 2024-09-17 16:20:55,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=222814.66666666666, ans=0.2 2024-09-17 16:21:18,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=222861.33333333334, ans=0.0 2024-09-17 16:21:20,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2024-09-17 16:21:26,281 INFO [train.py:1198] (1/2) Epoch 13, batch 1250, loss[loss=0.2609, simple_loss=0.3029, pruned_loss=0.08344, ctc_loss=0.1691, cr_loss=0.4564, over 34363.00 frames. ], tot_loss[loss=0.2508, simple_loss=0.2915, pruned_loss=0.08033, ctc_loss=0.1606, cr_loss=0.4321, over 6742894.68 frames. ], batch size: 107, lr: 9.38e-03, grad_scale: 32.0 2024-09-17 16:21:54,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=222954.66666666666, ans=0.0 2024-09-17 16:21:56,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=222954.66666666666, ans=0.125 2024-09-17 16:21:58,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.63 vs. limit=22.5 2024-09-17 16:22:21,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=223048.0, ans=0.125 2024-09-17 16:22:25,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.64 vs. limit=22.5 2024-09-17 16:22:36,130 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.541e+02 3.197e+02 4.090e+02 7.771e+02, threshold=6.395e+02, percent-clipped=4.0 2024-09-17 16:22:38,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=223094.66666666666, ans=0.0 2024-09-17 16:22:39,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=223094.66666666666, ans=0.1 2024-09-17 16:22:50,836 INFO [train.py:1198] (1/2) Epoch 13, batch 1300, loss[loss=0.2668, simple_loss=0.3121, pruned_loss=0.0847, ctc_loss=0.169, cr_loss=0.4586, over 33068.00 frames. ], tot_loss[loss=0.2498, simple_loss=0.2906, pruned_loss=0.07987, ctc_loss=0.1598, cr_loss=0.4305, over 6746191.68 frames. 
], batch size: 130, lr: 9.37e-03, grad_scale: 16.0 2024-09-17 16:22:52,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=223141.33333333334, ans=0.2 2024-09-17 16:22:57,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=223141.33333333334, ans=0.0 2024-09-17 16:23:17,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=223188.0, ans=0.125 2024-09-17 16:23:44,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=223281.33333333334, ans=0.125 2024-09-17 16:23:49,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=223281.33333333334, ans=0.0 2024-09-17 16:24:13,555 INFO [train.py:1198] (1/2) Epoch 13, batch 1350, loss[loss=0.2456, simple_loss=0.2888, pruned_loss=0.0773, ctc_loss=0.1545, cr_loss=0.4202, over 34518.00 frames. ], tot_loss[loss=0.2491, simple_loss=0.2901, pruned_loss=0.07957, ctc_loss=0.1591, cr_loss=0.4297, over 6765311.50 frames. ], batch size: 94, lr: 9.37e-03, grad_scale: 16.0 2024-09-17 16:24:28,481 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:24:30,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=223421.33333333334, ans=0.125 2024-09-17 16:25:23,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.072e+02 2.477e+02 2.790e+02 3.651e+02 6.820e+02, threshold=5.581e+02, percent-clipped=2.0 2024-09-17 16:25:37,926 INFO [train.py:1198] (1/2) Epoch 13, batch 1400, loss[loss=0.2247, simple_loss=0.261, pruned_loss=0.07188, ctc_loss=0.1458, cr_loss=0.3851, over 34310.00 frames. ], tot_loss[loss=0.249, simple_loss=0.2899, pruned_loss=0.07955, ctc_loss=0.1591, cr_loss=0.43, over 6776779.86 frames. ], batch size: 80, lr: 9.36e-03, grad_scale: 16.0 2024-09-17 16:25:38,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=223608.0, ans=0.125 2024-09-17 16:26:06,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=223654.66666666666, ans=0.125 2024-09-17 16:26:24,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223701.33333333334, ans=0.1 2024-09-17 16:26:39,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=223748.0, ans=0.0 2024-09-17 16:27:02,200 INFO [train.py:1198] (1/2) Epoch 13, batch 1450, loss[loss=0.2739, simple_loss=0.3139, pruned_loss=0.09004, ctc_loss=0.1789, cr_loss=0.4517, over 34453.00 frames. ], tot_loss[loss=0.2497, simple_loss=0.2905, pruned_loss=0.07983, ctc_loss=0.1598, cr_loss=0.4311, over 6774134.72 frames. 
], batch size: 110, lr: 9.36e-03, grad_scale: 16.0 2024-09-17 16:27:12,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=223841.33333333334, ans=0.05 2024-09-17 16:27:22,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=223888.0, ans=0.025 2024-09-17 16:27:30,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=223888.0, ans=0.0 2024-09-17 16:27:53,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=223981.33333333334, ans=0.125 2024-09-17 16:28:15,949 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.424e+02 2.645e+02 3.160e+02 6.436e+02, threshold=5.290e+02, percent-clipped=1.0 2024-09-17 16:28:29,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=224074.66666666666, ans=0.0 2024-09-17 16:28:31,059 INFO [train.py:1198] (1/2) Epoch 13, batch 1500, loss[loss=0.2454, simple_loss=0.296, pruned_loss=0.07388, ctc_loss=0.1506, cr_loss=0.4256, over 34445.00 frames. ], tot_loss[loss=0.2497, simple_loss=0.2907, pruned_loss=0.07973, ctc_loss=0.1597, cr_loss=0.4306, over 6774070.43 frames. ], batch size: 100, lr: 9.35e-03, grad_scale: 16.0 2024-09-17 16:28:34,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=224074.66666666666, ans=0.125 2024-09-17 16:28:38,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=224074.66666666666, ans=0.0 2024-09-17 16:28:58,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=224121.33333333334, ans=0.125 2024-09-17 16:29:09,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.54 vs. limit=15.0 2024-09-17 16:29:20,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=224168.0, ans=0.125 2024-09-17 16:29:38,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=224261.33333333334, ans=0.2 2024-09-17 16:29:58,203 INFO [train.py:1198] (1/2) Epoch 13, batch 1550, loss[loss=0.2515, simple_loss=0.2943, pruned_loss=0.07976, ctc_loss=0.1587, cr_loss=0.4392, over 34402.00 frames. ], tot_loss[loss=0.2503, simple_loss=0.291, pruned_loss=0.08014, ctc_loss=0.1604, cr_loss=0.4318, over 6746433.61 frames. 
], batch size: 105, lr: 9.35e-03, grad_scale: 16.0 2024-09-17 16:30:32,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=224401.33333333334, ans=0.0 2024-09-17 16:30:37,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=224401.33333333334, ans=0.0 2024-09-17 16:30:42,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=224401.33333333334, ans=0.125 2024-09-17 16:30:50,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=224448.0, ans=0.125 2024-09-17 16:30:55,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.51 vs. limit=10.0 2024-09-17 16:31:05,389 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.120e+02 2.468e+02 2.905e+02 3.793e+02 7.651e+02, threshold=5.810e+02, percent-clipped=6.0 2024-09-17 16:31:09,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=224494.66666666666, ans=0.125 2024-09-17 16:31:17,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=224494.66666666666, ans=0.125 2024-09-17 16:31:19,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.49 vs. limit=15.0 2024-09-17 16:31:20,123 INFO [train.py:1198] (1/2) Epoch 13, batch 1600, loss[loss=0.257, simple_loss=0.3, pruned_loss=0.08149, ctc_loss=0.1652, cr_loss=0.4465, over 34568.00 frames. ], tot_loss[loss=0.25, simple_loss=0.2907, pruned_loss=0.08004, ctc_loss=0.1602, cr_loss=0.431, over 6725241.28 frames. ], batch size: 99, lr: 9.34e-03, grad_scale: 32.0 2024-09-17 16:31:33,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=224541.33333333334, ans=0.025 2024-09-17 16:31:37,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=224588.0, ans=0.125 2024-09-17 16:31:42,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=224588.0, ans=0.5 2024-09-17 16:32:06,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=224634.66666666666, ans=0.2 2024-09-17 16:32:16,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=224681.33333333334, ans=0.125 2024-09-17 16:32:21,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=224681.33333333334, ans=0.1 2024-09-17 16:32:28,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=224728.0, ans=0.125 2024-09-17 16:32:33,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=224728.0, ans=0.2 2024-09-17 16:32:44,402 INFO [train.py:1198] (1/2) Epoch 13, batch 1650, loss[loss=0.2582, simple_loss=0.3044, pruned_loss=0.08011, ctc_loss=0.1656, cr_loss=0.4656, over 34388.00 frames. 
], tot_loss[loss=0.25, simple_loss=0.2906, pruned_loss=0.08005, ctc_loss=0.1603, cr_loss=0.4312, over 6719769.55 frames. ], batch size: 103, lr: 9.34e-03, grad_scale: 32.0 2024-09-17 16:32:46,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=224774.66666666666, ans=0.025 2024-09-17 16:32:50,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=15.0 2024-09-17 16:33:11,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=224821.33333333334, ans=0.125 2024-09-17 16:33:38,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.40 vs. limit=15.0 2024-09-17 16:33:39,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=224914.66666666666, ans=0.0 2024-09-17 16:33:43,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=224914.66666666666, ans=0.125 2024-09-17 16:33:54,334 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.542e+02 3.083e+02 4.375e+02 8.895e+02, threshold=6.165e+02, percent-clipped=10.0 2024-09-17 16:34:07,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=225008.0, ans=0.0 2024-09-17 16:34:09,098 INFO [train.py:1198] (1/2) Epoch 13, batch 1700, loss[loss=0.2234, simple_loss=0.2668, pruned_loss=0.06802, ctc_loss=0.1401, cr_loss=0.3992, over 34316.00 frames. ], tot_loss[loss=0.2498, simple_loss=0.2906, pruned_loss=0.07982, ctc_loss=0.1599, cr_loss=0.4312, over 6744991.87 frames. ], batch size: 80, lr: 9.34e-03, grad_scale: 32.0 2024-09-17 16:34:19,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=225008.0, ans=0.05 2024-09-17 16:34:39,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=15.0 2024-09-17 16:34:48,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=225101.33333333334, ans=0.125 2024-09-17 16:35:03,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=225148.0, ans=0.125 2024-09-17 16:35:03,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=225148.0, ans=0.1 2024-09-17 16:35:11,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.62 vs. 
limit=22.5 2024-09-17 16:35:11,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=225148.0, ans=0.0 2024-09-17 16:35:13,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=225194.66666666666, ans=0.125 2024-09-17 16:35:25,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2024-09-17 16:35:31,320 INFO [train.py:1198] (1/2) Epoch 13, batch 1750, loss[loss=0.2322, simple_loss=0.2648, pruned_loss=0.07645, ctc_loss=0.1532, cr_loss=0.3994, over 34184.00 frames. ], tot_loss[loss=0.249, simple_loss=0.2899, pruned_loss=0.07951, ctc_loss=0.1594, cr_loss=0.43, over 6752112.78 frames. ], batch size: 78, lr: 9.33e-03, grad_scale: 32.0 2024-09-17 16:35:40,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.28 vs. limit=22.5 2024-09-17 16:35:43,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.70 vs. limit=12.0 2024-09-17 16:35:49,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=225288.0, ans=0.025 2024-09-17 16:35:51,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=225288.0, ans=0.0 2024-09-17 16:35:54,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=225288.0, ans=0.125 2024-09-17 16:35:59,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225288.0, ans=0.1 2024-09-17 16:36:06,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=225334.66666666666, ans=0.09899494936611666 2024-09-17 16:36:06,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=225334.66666666666, ans=0.07 2024-09-17 16:36:07,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=225334.66666666666, ans=0.125 2024-09-17 16:36:12,876 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:36:40,666 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.729e+02 3.291e+02 4.256e+02 7.683e+02, threshold=6.582e+02, percent-clipped=5.0 2024-09-17 16:36:44,270 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:36:55,248 INFO [train.py:1198] (1/2) Epoch 13, batch 1800, loss[loss=0.2479, simple_loss=0.2938, pruned_loss=0.07633, ctc_loss=0.1546, cr_loss=0.4619, over 34694.00 frames. ], tot_loss[loss=0.2493, simple_loss=0.2902, pruned_loss=0.07963, ctc_loss=0.1595, cr_loss=0.4303, over 6754649.23 frames. ], batch size: 97, lr: 9.33e-03, grad_scale: 32.0 2024-09-17 16:37:04,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.75 vs. 
limit=22.5 2024-09-17 16:37:25,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=225521.33333333334, ans=0.125 2024-09-17 16:38:19,875 INFO [train.py:1198] (1/2) Epoch 13, batch 1850, loss[loss=0.2525, simple_loss=0.3003, pruned_loss=0.07866, ctc_loss=0.1537, cr_loss=0.4158, over 34463.00 frames. ], tot_loss[loss=0.2494, simple_loss=0.2903, pruned_loss=0.07965, ctc_loss=0.1595, cr_loss=0.4303, over 6762168.54 frames. ], batch size: 100, lr: 9.32e-03, grad_scale: 32.0 2024-09-17 16:38:26,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=225708.0, ans=0.125 2024-09-17 16:38:38,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=225754.66666666666, ans=0.2 2024-09-17 16:38:38,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=225754.66666666666, ans=0.0 2024-09-17 16:38:39,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=225754.66666666666, ans=0.125 2024-09-17 16:38:48,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=225754.66666666666, ans=0.0 2024-09-17 16:38:52,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=225801.33333333334, ans=0.125 2024-09-17 16:38:55,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=225801.33333333334, ans=0.125 2024-09-17 16:39:00,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=225801.33333333334, ans=0.07 2024-09-17 16:39:00,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225801.33333333334, ans=0.1 2024-09-17 16:39:24,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.50 vs. limit=15.0 2024-09-17 16:39:28,029 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.664e+02 3.187e+02 4.428e+02 8.411e+02, threshold=6.373e+02, percent-clipped=9.0 2024-09-17 16:39:33,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=225894.66666666666, ans=0.025 2024-09-17 16:39:42,804 INFO [train.py:1198] (1/2) Epoch 13, batch 1900, loss[loss=0.2602, simple_loss=0.304, pruned_loss=0.08303, ctc_loss=0.1643, cr_loss=0.4353, over 34352.00 frames. ], tot_loss[loss=0.2505, simple_loss=0.2914, pruned_loss=0.08007, ctc_loss=0.1604, cr_loss=0.4328, over 6772072.80 frames. 
], batch size: 103, lr: 9.32e-03, grad_scale: 32.0 2024-09-17 16:39:48,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=225941.33333333334, ans=0.125 2024-09-17 16:40:46,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=226081.33333333334, ans=0.025 2024-09-17 16:40:52,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=226128.0, ans=0.1 2024-09-17 16:40:57,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=226128.0, ans=10.0 2024-09-17 16:41:08,579 INFO [train.py:1198] (1/2) Epoch 13, batch 1950, loss[loss=0.2609, simple_loss=0.2976, pruned_loss=0.08653, ctc_loss=0.1704, cr_loss=0.4286, over 34379.00 frames. ], tot_loss[loss=0.2512, simple_loss=0.2923, pruned_loss=0.0803, ctc_loss=0.1607, cr_loss=0.4331, over 6788445.83 frames. ], batch size: 91, lr: 9.31e-03, grad_scale: 32.0 2024-09-17 16:41:12,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=226174.66666666666, ans=0.125 2024-09-17 16:41:14,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=226174.66666666666, ans=0.125 2024-09-17 16:41:17,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.70 vs. limit=22.5 2024-09-17 16:41:26,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.87 vs. limit=15.0 2024-09-17 16:41:40,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=226268.0, ans=0.95 2024-09-17 16:41:43,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=226268.0, ans=0.0 2024-09-17 16:41:52,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=226268.0, ans=0.05 2024-09-17 16:41:55,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=226268.0, ans=0.125 2024-09-17 16:42:16,348 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.872e+02 2.368e+02 2.686e+02 3.555e+02 8.665e+02, threshold=5.372e+02, percent-clipped=1.0 2024-09-17 16:42:24,044 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=13.21 vs. limit=15.0 2024-09-17 16:42:31,370 INFO [train.py:1198] (1/2) Epoch 13, batch 2000, loss[loss=0.2189, simple_loss=0.2581, pruned_loss=0.06805, ctc_loss=0.1385, cr_loss=0.3985, over 34131.00 frames. ], tot_loss[loss=0.2517, simple_loss=0.2927, pruned_loss=0.08051, ctc_loss=0.1612, cr_loss=0.4337, over 6764277.37 frames. 
], batch size: 78, lr: 9.31e-03, grad_scale: 32.0 2024-09-17 16:42:53,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=226454.66666666666, ans=0.2 2024-09-17 16:43:03,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=226501.33333333334, ans=0.125 2024-09-17 16:43:04,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=226501.33333333334, ans=0.125 2024-09-17 16:43:20,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=226548.0, ans=0.125 2024-09-17 16:43:39,390 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=15.0 2024-09-17 16:43:53,498 INFO [train.py:1198] (1/2) Epoch 13, batch 2050, loss[loss=0.2249, simple_loss=0.2655, pruned_loss=0.07031, ctc_loss=0.1426, cr_loss=0.3784, over 34529.00 frames. ], tot_loss[loss=0.2503, simple_loss=0.2913, pruned_loss=0.07999, ctc_loss=0.1602, cr_loss=0.4315, over 6756090.97 frames. ], batch size: 82, lr: 9.30e-03, grad_scale: 32.0 2024-09-17 16:44:12,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=226688.0, ans=0.125 2024-09-17 16:44:45,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=226781.33333333334, ans=0.125 2024-09-17 16:44:50,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=226781.33333333334, ans=0.1 2024-09-17 16:45:05,165 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.538e+02 2.933e+02 3.710e+02 6.108e+02, threshold=5.865e+02, percent-clipped=3.0 2024-09-17 16:45:05,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=226828.0, ans=0.2 2024-09-17 16:45:07,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=226828.0, ans=0.1 2024-09-17 16:45:19,942 INFO [train.py:1198] (1/2) Epoch 13, batch 2100, loss[loss=0.2568, simple_loss=0.2975, pruned_loss=0.08324, ctc_loss=0.1612, cr_loss=0.4366, over 34509.00 frames. ], tot_loss[loss=0.2493, simple_loss=0.2904, pruned_loss=0.07952, ctc_loss=0.1594, cr_loss=0.4302, over 6770956.97 frames. 
], batch size: 94, lr: 9.30e-03, grad_scale: 32.0 2024-09-17 16:45:41,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=226921.33333333334, ans=0.125 2024-09-17 16:45:41,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=226921.33333333334, ans=0.125 2024-09-17 16:46:17,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=227014.66666666666, ans=0.0 2024-09-17 16:46:23,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227061.33333333334, ans=0.1 2024-09-17 16:46:27,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=227061.33333333334, ans=10.0 2024-09-17 16:46:32,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=227061.33333333334, ans=0.1 2024-09-17 16:46:41,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=15.0 2024-09-17 16:46:41,669 INFO [train.py:1198] (1/2) Epoch 13, batch 2150, loss[loss=0.2554, simple_loss=0.2907, pruned_loss=0.08427, ctc_loss=0.1691, cr_loss=0.4426, over 34349.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.29, pruned_loss=0.07938, ctc_loss=0.159, cr_loss=0.43, over 6789418.41 frames. ], batch size: 91, lr: 9.29e-03, grad_scale: 32.0 2024-09-17 16:47:20,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=227201.33333333334, ans=0.09899494936611666 2024-09-17 16:47:26,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227201.33333333334, ans=0.1 2024-09-17 16:47:49,602 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.603e+02 3.206e+02 4.409e+02 8.569e+02, threshold=6.412e+02, percent-clipped=12.0 2024-09-17 16:48:03,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=227294.66666666666, ans=0.125 2024-09-17 16:48:06,503 INFO [train.py:1198] (1/2) Epoch 13, batch 2200, loss[loss=0.2604, simple_loss=0.3037, pruned_loss=0.08294, ctc_loss=0.1646, cr_loss=0.4563, over 34453.00 frames. ], tot_loss[loss=0.2493, simple_loss=0.2904, pruned_loss=0.07959, ctc_loss=0.1593, cr_loss=0.4309, over 6785662.91 frames. 
], batch size: 100, lr: 9.29e-03, grad_scale: 32.0 2024-09-17 16:48:31,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227388.0, ans=0.1 2024-09-17 16:48:31,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=227388.0, ans=0.0 2024-09-17 16:49:04,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=227481.33333333334, ans=0.0 2024-09-17 16:49:13,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=227528.0, ans=0.1 2024-09-17 16:49:18,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=227528.0, ans=0.125 2024-09-17 16:49:27,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.90 vs. limit=15.0 2024-09-17 16:49:31,167 INFO [train.py:1198] (1/2) Epoch 13, batch 2250, loss[loss=0.2413, simple_loss=0.2868, pruned_loss=0.07394, ctc_loss=0.1547, cr_loss=0.4257, over 34441.00 frames. ], tot_loss[loss=0.249, simple_loss=0.2901, pruned_loss=0.0794, ctc_loss=0.159, cr_loss=0.4304, over 6782483.31 frames. ], batch size: 95, lr: 9.28e-03, grad_scale: 32.0 2024-09-17 16:50:00,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=227621.33333333334, ans=0.125 2024-09-17 16:50:27,377 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:50:31,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.05 vs. limit=15.0 2024-09-17 16:50:38,601 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.558e+02 3.508e+02 4.696e+02 8.454e+02, threshold=7.016e+02, percent-clipped=2.0 2024-09-17 16:50:45,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=227761.33333333334, ans=0.125 2024-09-17 16:50:53,487 INFO [train.py:1198] (1/2) Epoch 13, batch 2300, loss[loss=0.2218, simple_loss=0.2627, pruned_loss=0.06798, ctc_loss=0.1419, cr_loss=0.4122, over 34279.00 frames. ], tot_loss[loss=0.2478, simple_loss=0.289, pruned_loss=0.07887, ctc_loss=0.1583, cr_loss=0.4288, over 6767837.41 frames. ], batch size: 83, lr: 9.28e-03, grad_scale: 32.0 2024-09-17 16:50:58,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=227808.0, ans=0.025 2024-09-17 16:51:11,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227854.66666666666, ans=0.1 2024-09-17 16:51:44,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.25 vs. 
limit=15.0 2024-09-17 16:51:56,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=227948.0, ans=0.125 2024-09-17 16:52:19,458 INFO [train.py:1198] (1/2) Epoch 13, batch 2350, loss[loss=0.2469, simple_loss=0.2932, pruned_loss=0.07614, ctc_loss=0.1566, cr_loss=0.4224, over 34713.00 frames. ], tot_loss[loss=0.2483, simple_loss=0.2894, pruned_loss=0.07912, ctc_loss=0.1586, cr_loss=0.4293, over 6774193.15 frames. ], batch size: 97, lr: 9.27e-03, grad_scale: 32.0 2024-09-17 16:52:23,135 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:52:30,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.21 vs. limit=10.0 2024-09-17 16:52:42,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=228088.0, ans=0.5 2024-09-17 16:52:59,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.91 vs. limit=6.0 2024-09-17 16:53:00,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=22.5 2024-09-17 16:53:07,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=228181.33333333334, ans=0.125 2024-09-17 16:53:10,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=228181.33333333334, ans=0.125 2024-09-17 16:53:19,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=228181.33333333334, ans=0.0 2024-09-17 16:53:24,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=228228.0, ans=0.0 2024-09-17 16:53:27,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.559e+02 3.235e+02 4.170e+02 7.867e+02, threshold=6.470e+02, percent-clipped=1.0 2024-09-17 16:53:42,012 INFO [train.py:1198] (1/2) Epoch 13, batch 2400, loss[loss=0.2351, simple_loss=0.2774, pruned_loss=0.07358, ctc_loss=0.1468, cr_loss=0.4076, over 34567.00 frames. ], tot_loss[loss=0.2485, simple_loss=0.2896, pruned_loss=0.0792, ctc_loss=0.1588, cr_loss=0.4301, over 6779285.44 frames. ], batch size: 89, lr: 9.27e-03, grad_scale: 32.0 2024-09-17 16:53:45,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=228274.66666666666, ans=0.04949747468305833 2024-09-17 16:53:51,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=228274.66666666666, ans=0.125 2024-09-17 16:54:21,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=228368.0, ans=0.125 2024-09-17 16:54:22,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2024-09-17 16:54:23,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.01 vs. 
limit=10.0 2024-09-17 16:54:37,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=228414.66666666666, ans=0.0 2024-09-17 16:55:04,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2024-09-17 16:55:04,631 INFO [train.py:1198] (1/2) Epoch 13, batch 2450, loss[loss=0.252, simple_loss=0.2929, pruned_loss=0.0808, ctc_loss=0.1619, cr_loss=0.4284, over 34428.00 frames. ], tot_loss[loss=0.2496, simple_loss=0.2907, pruned_loss=0.07968, ctc_loss=0.1597, cr_loss=0.4314, over 6753657.88 frames. ], batch size: 95, lr: 9.27e-03, grad_scale: 32.0 2024-09-17 16:55:14,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=228508.0, ans=0.125 2024-09-17 16:55:32,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=15.0 2024-09-17 16:55:41,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-17 16:55:47,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=228601.33333333334, ans=0.025 2024-09-17 16:56:10,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=228648.0, ans=0.125 2024-09-17 16:56:16,635 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.532e+02 2.985e+02 3.758e+02 6.766e+02, threshold=5.969e+02, percent-clipped=1.0 2024-09-17 16:56:22,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=228694.66666666666, ans=0.2 2024-09-17 16:56:22,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=228694.66666666666, ans=0.1 2024-09-17 16:56:22,834 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-09-17 16:56:25,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=228694.66666666666, ans=0.04949747468305833 2024-09-17 16:56:27,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=228694.66666666666, ans=0.2 2024-09-17 16:56:30,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.58 vs. limit=15.0 2024-09-17 16:56:31,575 INFO [train.py:1198] (1/2) Epoch 13, batch 2500, loss[loss=0.257, simple_loss=0.3039, pruned_loss=0.08002, ctc_loss=0.1622, cr_loss=0.438, over 34437.00 frames. ], tot_loss[loss=0.2494, simple_loss=0.2903, pruned_loss=0.07964, ctc_loss=0.1594, cr_loss=0.4308, over 6764814.99 frames. 
], batch size: 100, lr: 9.26e-03, grad_scale: 32.0 2024-09-17 16:56:36,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=228741.33333333334, ans=0.125 2024-09-17 16:56:38,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=228741.33333333334, ans=0.0 2024-09-17 16:56:38,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=228741.33333333334, ans=0.0 2024-09-17 16:56:41,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=228741.33333333334, ans=0.04949747468305833 2024-09-17 16:56:43,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=228741.33333333334, ans=0.0 2024-09-17 16:57:07,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.52 vs. limit=15.0 2024-09-17 16:57:21,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=228881.33333333334, ans=0.0 2024-09-17 16:57:29,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=228881.33333333334, ans=0.07 2024-09-17 16:57:53,892 INFO [train.py:1198] (1/2) Epoch 13, batch 2550, loss[loss=0.2224, simple_loss=0.263, pruned_loss=0.0686, ctc_loss=0.1427, cr_loss=0.4006, over 34154.00 frames. ], tot_loss[loss=0.2489, simple_loss=0.29, pruned_loss=0.07941, ctc_loss=0.1592, cr_loss=0.4302, over 6770258.79 frames. ], batch size: 78, lr: 9.26e-03, grad_scale: 32.0 2024-09-17 16:57:54,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=228974.66666666666, ans=0.0 2024-09-17 16:57:57,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=228974.66666666666, ans=0.95 2024-09-17 16:58:17,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=229021.33333333334, ans=0.125 2024-09-17 16:58:40,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=229068.0, ans=0.95 2024-09-17 16:58:46,957 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 16:59:01,429 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.124e+02 2.492e+02 2.930e+02 3.813e+02 6.743e+02, threshold=5.859e+02, percent-clipped=2.0 2024-09-17 16:59:02,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.77 vs. limit=10.0 2024-09-17 16:59:15,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=229208.0, ans=0.0 2024-09-17 16:59:18,423 INFO [train.py:1198] (1/2) Epoch 13, batch 2600, loss[loss=0.2233, simple_loss=0.268, pruned_loss=0.06756, ctc_loss=0.1374, cr_loss=0.4001, over 34360.00 frames. ], tot_loss[loss=0.2494, simple_loss=0.2905, pruned_loss=0.07959, ctc_loss=0.1595, cr_loss=0.4305, over 6765381.74 frames. 
], batch size: 91, lr: 9.25e-03, grad_scale: 32.0 2024-09-17 16:59:18,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=229208.0, ans=0.1 2024-09-17 16:59:31,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=229208.0, ans=0.125 2024-09-17 16:59:33,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=229208.0, ans=0.125 2024-09-17 16:59:34,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.20 vs. limit=15.0 2024-09-17 16:59:36,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=229254.66666666666, ans=0.2 2024-09-17 16:59:51,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.57 vs. limit=15.0 2024-09-17 17:00:02,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=229301.33333333334, ans=0.125 2024-09-17 17:00:07,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=229348.0, ans=0.2 2024-09-17 17:00:11,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.47 vs. limit=22.5 2024-09-17 17:00:21,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=229348.0, ans=0.0 2024-09-17 17:00:30,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.79 vs. limit=15.0 2024-09-17 17:00:41,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=229441.33333333334, ans=0.125 2024-09-17 17:00:42,273 INFO [train.py:1198] (1/2) Epoch 13, batch 2650, loss[loss=0.2604, simple_loss=0.3045, pruned_loss=0.08228, ctc_loss=0.1657, cr_loss=0.4662, over 34229.00 frames. ], tot_loss[loss=0.2496, simple_loss=0.2907, pruned_loss=0.0796, ctc_loss=0.1596, cr_loss=0.4315, over 6773094.90 frames. ], batch size: 117, lr: 9.25e-03, grad_scale: 32.0 2024-09-17 17:00:48,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.41 vs. limit=12.0 2024-09-17 17:00:50,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=229441.33333333334, ans=0.125 2024-09-17 17:01:34,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=229581.33333333334, ans=0.125 2024-09-17 17:01:47,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.39 vs. 
limit=15.0 2024-09-17 17:01:49,626 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.617e+02 3.276e+02 4.178e+02 6.810e+02, threshold=6.552e+02, percent-clipped=3.0 2024-09-17 17:01:51,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=229628.0, ans=0.125 2024-09-17 17:02:04,589 INFO [train.py:1198] (1/2) Epoch 13, batch 2700, loss[loss=0.2518, simple_loss=0.297, pruned_loss=0.07906, ctc_loss=0.1594, cr_loss=0.4117, over 34608.00 frames. ], tot_loss[loss=0.2503, simple_loss=0.2914, pruned_loss=0.07995, ctc_loss=0.1603, cr_loss=0.4327, over 6768275.56 frames. ], batch size: 102, lr: 9.24e-03, grad_scale: 32.0 2024-09-17 17:02:38,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.73 vs. limit=22.5 2024-09-17 17:02:51,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.88 vs. limit=22.5 2024-09-17 17:02:59,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=229814.66666666666, ans=0.2 2024-09-17 17:03:01,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=229814.66666666666, ans=0.0 2024-09-17 17:03:28,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=229861.33333333334, ans=0.05 2024-09-17 17:03:29,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=229908.0, ans=0.125 2024-09-17 17:03:31,180 INFO [train.py:1198] (1/2) Epoch 13, batch 2750, loss[loss=0.2371, simple_loss=0.2795, pruned_loss=0.07407, ctc_loss=0.15, cr_loss=0.4139, over 34619.00 frames. ], tot_loss[loss=0.249, simple_loss=0.29, pruned_loss=0.07944, ctc_loss=0.1593, cr_loss=0.431, over 6763684.06 frames. ], batch size: 88, lr: 9.24e-03, grad_scale: 32.0 2024-09-17 17:03:51,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=229954.66666666666, ans=0.125 2024-09-17 17:03:58,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=229954.66666666666, ans=0.025 2024-09-17 17:03:59,657 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:04:07,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=230001.33333333334, ans=0.1 2024-09-17 17:04:16,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=230001.33333333334, ans=0.125 2024-09-17 17:04:18,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.52 vs. 
limit=15.0 2024-09-17 17:04:25,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=230048.0, ans=0.125 2024-09-17 17:04:29,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=230048.0, ans=0.0 2024-09-17 17:04:39,102 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.157e+02 2.567e+02 2.968e+02 3.696e+02 6.541e+02, threshold=5.936e+02, percent-clipped=0.0 2024-09-17 17:04:54,224 INFO [train.py:1198] (1/2) Epoch 13, batch 2800, loss[loss=0.292, simple_loss=0.3181, pruned_loss=0.1032, ctc_loss=0.2068, cr_loss=0.4527, over 23257.00 frames. ], tot_loss[loss=0.2494, simple_loss=0.2902, pruned_loss=0.07966, ctc_loss=0.1597, cr_loss=0.4316, over 6740661.59 frames. ], batch size: 244, lr: 9.23e-03, grad_scale: 32.0 2024-09-17 17:05:01,961 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.50 vs. limit=12.0 2024-09-17 17:05:04,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=230141.33333333334, ans=0.2 2024-09-17 17:05:04,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.62 vs. limit=15.0 2024-09-17 17:05:54,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=230281.33333333334, ans=0.125 2024-09-17 17:05:54,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.21 vs. limit=15.0 2024-09-17 17:06:15,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=230374.66666666666, ans=0.125 2024-09-17 17:06:16,622 INFO [train.py:1198] (1/2) Epoch 13, batch 2850, loss[loss=0.2285, simple_loss=0.2741, pruned_loss=0.06926, ctc_loss=0.1436, cr_loss=0.394, over 34502.00 frames. ], tot_loss[loss=0.2501, simple_loss=0.2909, pruned_loss=0.07999, ctc_loss=0.1605, cr_loss=0.4322, over 6724052.96 frames. ], batch size: 90, lr: 9.23e-03, grad_scale: 32.0 2024-09-17 17:06:17,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=230374.66666666666, ans=0.025 2024-09-17 17:06:20,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=230374.66666666666, ans=0.05 2024-09-17 17:07:26,264 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.441e+02 2.937e+02 3.943e+02 1.025e+03, threshold=5.873e+02, percent-clipped=4.0 2024-09-17 17:07:41,234 INFO [train.py:1198] (1/2) Epoch 13, batch 2900, loss[loss=0.2566, simple_loss=0.299, pruned_loss=0.08244, ctc_loss=0.1605, cr_loss=0.4311, over 34563.00 frames. ], tot_loss[loss=0.2508, simple_loss=0.2917, pruned_loss=0.08022, ctc_loss=0.1608, cr_loss=0.4338, over 6754575.05 frames. 
], batch size: 94, lr: 9.22e-03, grad_scale: 32.0 2024-09-17 17:07:41,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=230608.0, ans=0.125 2024-09-17 17:07:41,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=230608.0, ans=0.2 2024-09-17 17:07:44,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=230608.0, ans=0.2 2024-09-17 17:07:56,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=230654.66666666666, ans=0.025 2024-09-17 17:07:57,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.80 vs. limit=12.0 2024-09-17 17:08:42,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-09-17 17:08:44,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2024-09-17 17:08:52,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=230794.66666666666, ans=0.125 2024-09-17 17:09:03,275 INFO [train.py:1198] (1/2) Epoch 13, batch 2950, loss[loss=0.2332, simple_loss=0.2717, pruned_loss=0.07485, ctc_loss=0.1487, cr_loss=0.3822, over 34635.00 frames. ], tot_loss[loss=0.2497, simple_loss=0.2906, pruned_loss=0.07978, ctc_loss=0.1599, cr_loss=0.4322, over 6749082.77 frames. ], batch size: 88, lr: 9.22e-03, grad_scale: 32.0 2024-09-17 17:09:26,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=230888.0, ans=0.125 2024-09-17 17:09:27,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=12.0 2024-09-17 17:09:38,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=230934.66666666666, ans=0.1 2024-09-17 17:10:11,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.355e+02 2.791e+02 3.427e+02 6.387e+02, threshold=5.582e+02, percent-clipped=2.0 2024-09-17 17:10:11,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231028.0, ans=0.125 2024-09-17 17:10:26,046 INFO [train.py:1198] (1/2) Epoch 13, batch 3000, loss[loss=0.2496, simple_loss=0.2949, pruned_loss=0.07717, ctc_loss=0.1597, cr_loss=0.4505, over 34528.00 frames. ], tot_loss[loss=0.2498, simple_loss=0.2906, pruned_loss=0.07982, ctc_loss=0.1599, cr_loss=0.4321, over 6748311.69 frames. ], batch size: 94, lr: 9.21e-03, grad_scale: 32.0 2024-09-17 17:10:26,047 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 17:10:43,965 INFO [train.py:1230] (1/2) Epoch 13, validation: loss=0.1518, simple_loss=0.2502, pruned_loss=0.02223, ctc_loss=0.04454, cr_loss=1.641e-14, over 944034.00 frames. 
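
A note on reading the surrounding records, added for orientation: each "train.py:1198" entry gives the current batch's losses in loss[...], the running averages in tot_loss[...], and then the batch size, learning rate, and the grad_scale used by the mixed-precision loss scaler. In the "optim.py:487" warnings, the five grad-norm values read naturally as the min/25%/median/75%/max of recent gradient norms, and in each of these warnings the reported threshold equals Clipping_scale times the median (for example 2.0 x 2.985e+02 = 5.969e+02 to within rounding), with percent-clipped presumably the share of recent updates whose norm exceeded that threshold. Below is a minimal sketch for pulling loss/LR curves out of a log in this format; the regex, the helper name parse_train_log, and the train.log path are illustrative assumptions, not part of the training scripts:

    import re

    # Hypothetical helper (not from the training code): extract
    # (epoch, batch, tot_loss, lr) from "train.py:1198" records.
    # Records may wrap across physical lines, so the whole file is
    # scanned as one string with re.S.
    RECORD = re.compile(
        r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), .*?"
        r"tot_loss\[loss=(?P<loss>[0-9.]+).*?\], "
        r"batch size: \d+, lr: (?P<lr>[0-9.e+-]+)",
        re.S,
    )

    def parse_train_log(path="train.log"):  # assumed log filename
        with open(path) as f:
            text = f.read()
        for m in RECORD.finditer(text):
            yield int(m["epoch"]), int(m["batch"]), float(m["loss"]), float(m["lr"])

    for epoch, batch, tot_loss, lr in parse_train_log():
        print(f"epoch {epoch:2d} batch {batch:5d}  tot_loss={tot_loss:.4f}  lr={lr:.2e}")

Plotted this way, the per-record learning rates (9.27e-03 drifting down to 9.14e-03 across epoch 13, then 8.80e-03 at the start of epoch 14) make the scheduler's epoch-level decay directly visible alongside the loss curve.
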
2024-09-17 17:10:43,965 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 17:10:47,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=231074.66666666666, ans=0.0 2024-09-17 17:10:55,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=11.47 vs. limit=12.0 2024-09-17 17:10:56,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=231074.66666666666, ans=0.2 2024-09-17 17:11:05,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=231121.33333333334, ans=0.125 2024-09-17 17:11:19,239 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.83 vs. limit=15.0 2024-09-17 17:11:24,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.26 vs. limit=22.5 2024-09-17 17:11:26,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=231168.0, ans=0.1 2024-09-17 17:11:51,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=231261.33333333334, ans=0.2 2024-09-17 17:11:54,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.41 vs. limit=15.0 2024-09-17 17:11:57,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=231261.33333333334, ans=0.125 2024-09-17 17:12:05,590 INFO [train.py:1198] (1/2) Epoch 13, batch 3050, loss[loss=0.2414, simple_loss=0.2827, pruned_loss=0.07649, ctc_loss=0.1504, cr_loss=0.4243, over 34587.00 frames. ], tot_loss[loss=0.2504, simple_loss=0.2913, pruned_loss=0.08008, ctc_loss=0.1605, cr_loss=0.4331, over 6741588.38 frames. ], batch size: 89, lr: 9.21e-03, grad_scale: 32.0 2024-09-17 17:12:13,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.93 vs. limit=10.0 2024-09-17 17:12:28,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=231354.66666666666, ans=0.07 2024-09-17 17:13:03,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=231448.0, ans=0.125 2024-09-17 17:13:04,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=231448.0, ans=0.0 2024-09-17 17:13:12,345 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.500e+02 2.999e+02 3.499e+02 6.406e+02, threshold=5.997e+02, percent-clipped=2.0 2024-09-17 17:13:19,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231494.66666666666, ans=0.1 2024-09-17 17:13:26,901 INFO [train.py:1198] (1/2) Epoch 13, batch 3100, loss[loss=0.2841, simple_loss=0.3221, pruned_loss=0.09475, ctc_loss=0.1855, cr_loss=0.4864, over 34262.00 frames. 
], tot_loss[loss=0.2498, simple_loss=0.2907, pruned_loss=0.07982, ctc_loss=0.16, cr_loss=0.4317, over 6742445.33 frames. ], batch size: 117, lr: 9.21e-03, grad_scale: 32.0 2024-09-17 17:13:35,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=231541.33333333334, ans=0.0 2024-09-17 17:13:43,577 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:13:46,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=231588.0, ans=0.0 2024-09-17 17:13:56,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=231588.0, ans=0.0 2024-09-17 17:14:13,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=231681.33333333334, ans=0.0 2024-09-17 17:14:44,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=231728.0, ans=0.0 2024-09-17 17:14:47,687 INFO [train.py:1198] (1/2) Epoch 13, batch 3150, loss[loss=0.2711, simple_loss=0.3165, pruned_loss=0.08572, ctc_loss=0.1762, cr_loss=0.4745, over 33892.00 frames. ], tot_loss[loss=0.2495, simple_loss=0.2905, pruned_loss=0.07966, ctc_loss=0.1597, cr_loss=0.4311, over 6748642.72 frames. ], batch size: 122, lr: 9.20e-03, grad_scale: 16.0 2024-09-17 17:14:50,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=231774.66666666666, ans=15.0 2024-09-17 17:14:53,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.28 vs. limit=15.0 2024-09-17 17:15:02,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=231821.33333333334, ans=0.07 2024-09-17 17:15:11,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=231821.33333333334, ans=0.0 2024-09-17 17:15:33,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=231868.0, ans=0.025 2024-09-17 17:15:41,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231914.66666666666, ans=0.1 2024-09-17 17:15:50,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2024-09-17 17:15:55,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 3.039e+02 3.724e+02 5.496e+02 9.649e+02, threshold=7.448e+02, percent-clipped=17.0 2024-09-17 17:16:10,280 INFO [train.py:1198] (1/2) Epoch 13, batch 3200, loss[loss=0.254, simple_loss=0.2975, pruned_loss=0.08042, ctc_loss=0.1632, cr_loss=0.4232, over 34519.00 frames. ], tot_loss[loss=0.2494, simple_loss=0.2903, pruned_loss=0.07964, ctc_loss=0.1597, cr_loss=0.4314, over 6762524.82 frames. 
], batch size: 94, lr: 9.20e-03, grad_scale: 32.0 2024-09-17 17:16:18,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=232008.0, ans=0.125 2024-09-17 17:17:01,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=232148.0, ans=0.125 2024-09-17 17:17:04,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=232148.0, ans=0.125 2024-09-17 17:17:22,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2024-09-17 17:17:24,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.00 vs. limit=10.0 2024-09-17 17:17:33,002 INFO [train.py:1198] (1/2) Epoch 13, batch 3250, loss[loss=0.2534, simple_loss=0.2973, pruned_loss=0.08109, ctc_loss=0.1566, cr_loss=0.3999, over 34654.00 frames. ], tot_loss[loss=0.2498, simple_loss=0.2908, pruned_loss=0.0798, ctc_loss=0.1599, cr_loss=0.4322, over 6771917.61 frames. ], batch size: 98, lr: 9.19e-03, grad_scale: 32.0 2024-09-17 17:18:24,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=232381.33333333334, ans=0.07 2024-09-17 17:18:26,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=232381.33333333334, ans=0.125 2024-09-17 17:18:42,449 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.804e+02 3.439e+02 4.435e+02 7.790e+02, threshold=6.878e+02, percent-clipped=1.0 2024-09-17 17:18:51,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2024-09-17 17:18:53,737 INFO [train.py:1198] (1/2) Epoch 13, batch 3300, loss[loss=0.2777, simple_loss=0.3161, pruned_loss=0.09126, ctc_loss=0.1857, cr_loss=0.4922, over 33237.00 frames. ], tot_loss[loss=0.2481, simple_loss=0.2893, pruned_loss=0.07902, ctc_loss=0.1586, cr_loss=0.4298, over 6768885.24 frames. ], batch size: 130, lr: 9.19e-03, grad_scale: 16.0 2024-09-17 17:19:00,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=232474.66666666666, ans=0.025 2024-09-17 17:19:04,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.84 vs. 
limit=12.0 2024-09-17 17:19:15,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=232521.33333333334, ans=0.0 2024-09-17 17:19:28,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=232568.0, ans=0.2 2024-09-17 17:19:34,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=232568.0, ans=0.125 2024-09-17 17:19:49,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=232614.66666666666, ans=0.125 2024-09-17 17:20:08,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=232661.33333333334, ans=0.025 2024-09-17 17:20:14,703 INFO [train.py:1198] (1/2) Epoch 13, batch 3350, loss[loss=0.259, simple_loss=0.3004, pruned_loss=0.08428, ctc_loss=0.1635, cr_loss=0.4082, over 33768.00 frames. ], tot_loss[loss=0.2492, simple_loss=0.2903, pruned_loss=0.07951, ctc_loss=0.1595, cr_loss=0.4306, over 6743019.11 frames. ], batch size: 122, lr: 9.18e-03, grad_scale: 16.0 2024-09-17 17:20:19,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=232708.0, ans=0.0 2024-09-17 17:20:59,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=232801.33333333334, ans=0.125 2024-09-17 17:21:16,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.42 vs. limit=22.5 2024-09-17 17:21:25,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.440e+02 2.749e+02 3.431e+02 5.309e+02, threshold=5.498e+02, percent-clipped=0.0 2024-09-17 17:21:27,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=232894.66666666666, ans=0.125 2024-09-17 17:21:36,821 INFO [train.py:1198] (1/2) Epoch 13, batch 3400, loss[loss=0.2142, simple_loss=0.2579, pruned_loss=0.06425, ctc_loss=0.1346, cr_loss=0.3792, over 34185.00 frames. ], tot_loss[loss=0.2493, simple_loss=0.2901, pruned_loss=0.07964, ctc_loss=0.1597, cr_loss=0.4309, over 6733286.34 frames. ], batch size: 78, lr: 9.18e-03, grad_scale: 16.0 2024-09-17 17:21:41,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=232941.33333333334, ans=0.125 2024-09-17 17:21:46,673 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:21:53,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=232988.0, ans=0.125 2024-09-17 17:22:53,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=233128.0, ans=0.0 2024-09-17 17:22:58,550 INFO [train.py:1198] (1/2) Epoch 13, batch 3450, loss[loss=0.261, simple_loss=0.3038, pruned_loss=0.08343, ctc_loss=0.1664, cr_loss=0.4523, over 32950.00 frames. ], tot_loss[loss=0.249, simple_loss=0.2901, pruned_loss=0.07945, ctc_loss=0.1593, cr_loss=0.4303, over 6744922.25 frames. 
], batch size: 130, lr: 9.17e-03, grad_scale: 16.0 2024-09-17 17:23:23,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=233221.33333333334, ans=0.04949747468305833 2024-09-17 17:23:35,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.83 vs. limit=22.5 2024-09-17 17:23:37,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=233268.0, ans=0.0 2024-09-17 17:23:48,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=233314.66666666666, ans=0.125 2024-09-17 17:24:08,105 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.109e+02 2.557e+02 3.043e+02 3.796e+02 5.891e+02, threshold=6.086e+02, percent-clipped=1.0 2024-09-17 17:24:14,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=233361.33333333334, ans=0.125 2024-09-17 17:24:17,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=233408.0, ans=0.125 2024-09-17 17:24:19,250 INFO [train.py:1198] (1/2) Epoch 13, batch 3500, loss[loss=0.2226, simple_loss=0.2648, pruned_loss=0.06767, ctc_loss=0.1391, cr_loss=0.43, over 34495.00 frames. ], tot_loss[loss=0.2486, simple_loss=0.2896, pruned_loss=0.07932, ctc_loss=0.1591, cr_loss=0.4301, over 6747473.41 frames. ], batch size: 85, lr: 9.17e-03, grad_scale: 16.0 2024-09-17 17:24:25,325 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.84 vs. limit=10.0 2024-09-17 17:24:29,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=233408.0, ans=0.125 2024-09-17 17:24:30,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=233408.0, ans=0.0 2024-09-17 17:24:35,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=233454.66666666666, ans=0.125 2024-09-17 17:24:56,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=233501.33333333334, ans=0.125 2024-09-17 17:25:15,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=233548.0, ans=0.0 2024-09-17 17:25:19,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=233548.0, ans=0.125 2024-09-17 17:25:38,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=233641.33333333334, ans=0.1 2024-09-17 17:25:40,255 INFO [train.py:1198] (1/2) Epoch 13, batch 3550, loss[loss=0.2616, simple_loss=0.3052, pruned_loss=0.08334, ctc_loss=0.1674, cr_loss=0.4463, over 34392.00 frames. ], tot_loss[loss=0.2484, simple_loss=0.2896, pruned_loss=0.07914, ctc_loss=0.1589, cr_loss=0.43, over 6756992.80 frames. ], batch size: 103, lr: 9.17e-03, grad_scale: 16.0 2024-09-17 17:25:44,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.31 vs. 
limit=15.0 2024-09-17 17:25:45,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=233641.33333333334, ans=0.07 2024-09-17 17:26:39,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=233781.33333333334, ans=0.0 2024-09-17 17:26:50,829 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.118e+02 2.587e+02 3.367e+02 4.133e+02 6.965e+02, threshold=6.734e+02, percent-clipped=4.0 2024-09-17 17:27:00,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=233874.66666666666, ans=0.125 2024-09-17 17:27:02,214 INFO [train.py:1198] (1/2) Epoch 13, batch 3600, loss[loss=0.2331, simple_loss=0.2756, pruned_loss=0.07221, ctc_loss=0.1461, cr_loss=0.424, over 34460.00 frames. ], tot_loss[loss=0.2488, simple_loss=0.29, pruned_loss=0.07929, ctc_loss=0.159, cr_loss=0.4305, over 6766277.06 frames. ], batch size: 90, lr: 9.16e-03, grad_scale: 32.0 2024-09-17 17:27:05,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233874.66666666666, ans=0.1 2024-09-17 17:27:16,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=233921.33333333334, ans=0.2 2024-09-17 17:27:18,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=233921.33333333334, ans=0.2 2024-09-17 17:27:39,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.05 vs. limit=12.0 2024-09-17 17:27:44,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=233968.0, ans=0.07 2024-09-17 17:28:06,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=234061.33333333334, ans=0.125 2024-09-17 17:28:20,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.57 vs. limit=15.0 2024-09-17 17:28:22,653 INFO [train.py:1198] (1/2) Epoch 13, batch 3650, loss[loss=0.2818, simple_loss=0.3154, pruned_loss=0.09586, ctc_loss=0.1875, cr_loss=0.4741, over 34444.00 frames. ], tot_loss[loss=0.2481, simple_loss=0.2893, pruned_loss=0.07898, ctc_loss=0.1585, cr_loss=0.4294, over 6769515.64 frames. ], batch size: 110, lr: 9.16e-03, grad_scale: 32.0 2024-09-17 17:28:31,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2024-09-17 17:29:10,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=234248.0, ans=10.0 2024-09-17 17:29:32,550 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.649e+02 3.622e+02 4.525e+02 7.942e+02, threshold=7.245e+02, percent-clipped=3.0 2024-09-17 17:29:39,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=234294.66666666666, ans=0.125 2024-09-17 17:29:43,811 INFO [train.py:1198] (1/2) Epoch 13, batch 3700, loss[loss=0.2603, simple_loss=0.3075, pruned_loss=0.08134, ctc_loss=0.1652, cr_loss=0.4319, over 34618.00 frames. 
], tot_loss[loss=0.248, simple_loss=0.2896, pruned_loss=0.07875, ctc_loss=0.1582, cr_loss=0.429, over 6783912.03 frames. ], batch size: 102, lr: 9.15e-03, grad_scale: 32.0 2024-09-17 17:29:52,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234341.33333333334, ans=0.1 2024-09-17 17:30:04,568 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:30:05,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=234388.0, ans=0.0 2024-09-17 17:30:17,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=234434.66666666666, ans=0.125 2024-09-17 17:30:22,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=234434.66666666666, ans=0.0 2024-09-17 17:30:25,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=234434.66666666666, ans=0.0 2024-09-17 17:30:34,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=234481.33333333334, ans=0.125 2024-09-17 17:30:48,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2024-09-17 17:31:01,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.82 vs. limit=10.0 2024-09-17 17:31:05,544 INFO [train.py:1198] (1/2) Epoch 13, batch 3750, loss[loss=0.2571, simple_loss=0.2991, pruned_loss=0.08171, ctc_loss=0.165, cr_loss=0.4643, over 34291.00 frames. ], tot_loss[loss=0.2511, simple_loss=0.2926, pruned_loss=0.08007, ctc_loss=0.1606, cr_loss=0.4343, over 6785589.47 frames. ], batch size: 113, lr: 9.15e-03, grad_scale: 16.0 2024-09-17 17:31:10,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=234574.66666666666, ans=0.0 2024-09-17 17:31:11,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.55 vs. 
limit=15.0 2024-09-17 17:31:36,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=234668.0, ans=0.125 2024-09-17 17:31:48,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=234668.0, ans=0.025 2024-09-17 17:31:58,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=234714.66666666666, ans=0.09899494936611666 2024-09-17 17:32:01,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=234714.66666666666, ans=0.2 2024-09-17 17:32:01,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=234714.66666666666, ans=0.1 2024-09-17 17:32:06,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=234714.66666666666, ans=0.0 2024-09-17 17:32:08,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.47 vs. limit=15.0 2024-09-17 17:32:16,885 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.401e+02 2.596e+02 2.919e+02 1.813e+03, threshold=5.191e+02, percent-clipped=1.0 2024-09-17 17:32:24,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=234761.33333333334, ans=0.125 2024-09-17 17:32:27,343 INFO [train.py:1198] (1/2) Epoch 13, batch 3800, loss[loss=0.2855, simple_loss=0.3167, pruned_loss=0.09876, ctc_loss=0.1898, cr_loss=0.4733, over 29672.00 frames. ], tot_loss[loss=0.2549, simple_loss=0.2955, pruned_loss=0.082, ctc_loss=0.1641, cr_loss=0.438, over 6674962.48 frames. ], batch size: 175, lr: 9.14e-03, grad_scale: 16.0 2024-09-17 17:32:33,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.25 vs. limit=12.0 2024-09-17 17:32:34,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=234808.0, ans=0.125 2024-09-17 17:32:47,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=234854.66666666666, ans=0.0 2024-09-17 17:32:56,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234854.66666666666, ans=0.1 2024-09-17 17:33:23,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=234948.0, ans=0.1 2024-09-17 17:33:24,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=234948.0, ans=0.0 2024-09-17 17:33:29,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=234948.0, ans=0.125 2024-09-17 17:33:31,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=234948.0, ans=0.125 2024-09-17 17:33:33,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.20 vs. 
limit=15.0 2024-09-17 17:33:34,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=234994.66666666666, ans=0.125 2024-09-17 17:33:34,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=234994.66666666666, ans=0.1 2024-09-17 17:33:37,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=234994.66666666666, ans=0.2 2024-09-17 17:33:50,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=235041.33333333334, ans=0.125 2024-09-17 17:33:51,334 INFO [train.py:1198] (1/2) Epoch 13, batch 3850, loss[loss=0.2888, simple_loss=0.3119, pruned_loss=0.1024, ctc_loss=0.2093, cr_loss=0.4771, over 23033.00 frames. ], tot_loss[loss=0.2613, simple_loss=0.2994, pruned_loss=0.08567, ctc_loss=0.1713, cr_loss=0.4423, over 6244705.20 frames. ], batch size: 245, lr: 9.14e-03, grad_scale: 16.0 2024-09-17 17:34:06,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=235088.0, ans=0.2 2024-09-17 17:35:23,533 INFO [train.py:1198] (1/2) Epoch 14, batch 0, loss[loss=0.2153, simple_loss=0.2629, pruned_loss=0.06346, ctc_loss=0.1313, cr_loss=0.3602, over 34477.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2629, pruned_loss=0.06346, ctc_loss=0.1313, cr_loss=0.3602, over 34477.00 frames. ], batch size: 85, lr: 8.80e-03, grad_scale: 32.0 2024-09-17 17:35:23,534 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 17:35:26,224 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.9590, 4.0378, 4.4009, 4.3850], device='cuda:1') 2024-09-17 17:35:40,379 INFO [train.py:1230] (1/2) Epoch 14, validation: loss=0.1534, simple_loss=0.2526, pruned_loss=0.02256, ctc_loss=0.04484, cr_loss=1.644e-14, over 944034.00 frames. 2024-09-17 17:35:40,379 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 17:35:45,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.78 vs. limit=15.0 2024-09-17 17:35:55,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=235209.33333333334, ans=0.015 2024-09-17 17:36:08,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=235209.33333333334, ans=0.125 2024-09-17 17:36:10,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.698e+02 2.976e+02 3.450e+02 7.059e+02, threshold=5.952e+02, percent-clipped=6.0 2024-09-17 17:36:24,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.85 vs. limit=15.0 2024-09-17 17:36:37,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=235302.66666666666, ans=0.1 2024-09-17 17:36:46,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.17 vs. 
limit=6.0 2024-09-17 17:36:49,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=235349.33333333334, ans=0.05 2024-09-17 17:36:50,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.64 vs. limit=15.0 2024-09-17 17:36:52,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=235349.33333333334, ans=0.125 2024-09-17 17:36:56,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=235349.33333333334, ans=0.0 2024-09-17 17:37:01,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=235349.33333333334, ans=0.125 2024-09-17 17:37:01,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=235349.33333333334, ans=0.125 2024-09-17 17:37:05,688 INFO [train.py:1198] (1/2) Epoch 14, batch 50, loss[loss=0.2252, simple_loss=0.2658, pruned_loss=0.0702, ctc_loss=0.1417, cr_loss=0.3939, over 34515.00 frames. ], tot_loss[loss=0.2512, simple_loss=0.2918, pruned_loss=0.08045, ctc_loss=0.1613, cr_loss=0.4343, over 1480151.20 frames. ], batch size: 82, lr: 8.80e-03, grad_scale: 32.0 2024-09-17 17:37:42,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.21 vs. limit=22.5 2024-09-17 17:37:54,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=235536.0, ans=0.125 2024-09-17 17:38:01,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.31 vs. limit=6.0 2024-09-17 17:38:05,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=235536.0, ans=0.025 2024-09-17 17:38:14,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=235582.66666666666, ans=0.0 2024-09-17 17:38:22,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=235582.66666666666, ans=0.125 2024-09-17 17:38:22,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=235582.66666666666, ans=0.025 2024-09-17 17:38:30,436 INFO [train.py:1198] (1/2) Epoch 14, batch 100, loss[loss=0.2236, simple_loss=0.2696, pruned_loss=0.06767, ctc_loss=0.1353, cr_loss=0.3799, over 34584.00 frames. ], tot_loss[loss=0.2515, simple_loss=0.2927, pruned_loss=0.08039, ctc_loss=0.1611, cr_loss=0.4346, over 2629088.35 frames. ], batch size: 89, lr: 8.79e-03, grad_scale: 32.0 2024-09-17 17:38:37,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. 
limit=15.0 2024-09-17 17:38:38,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=235629.33333333334, ans=0.125 2024-09-17 17:38:53,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=235676.0, ans=0.0 2024-09-17 17:38:58,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=235676.0, ans=0.0 2024-09-17 17:38:59,891 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.737e+02 3.511e+02 4.402e+02 7.215e+02, threshold=7.021e+02, percent-clipped=4.0 2024-09-17 17:39:01,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=235722.66666666666, ans=0.125 2024-09-17 17:39:08,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=235722.66666666666, ans=0.125 2024-09-17 17:39:16,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=235722.66666666666, ans=0.125 2024-09-17 17:39:21,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=235769.33333333334, ans=0.125 2024-09-17 17:39:24,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=235769.33333333334, ans=0.07 2024-09-17 17:39:34,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.38 vs. limit=22.5 2024-09-17 17:39:47,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.15 vs. limit=15.0 2024-09-17 17:39:51,703 INFO [train.py:1198] (1/2) Epoch 14, batch 150, loss[loss=0.2235, simple_loss=0.2615, pruned_loss=0.07007, ctc_loss=0.1443, cr_loss=0.4107, over 34519.00 frames. ], tot_loss[loss=0.2484, simple_loss=0.2902, pruned_loss=0.07883, ctc_loss=0.1582, cr_loss=0.4305, over 3557119.90 frames. 
], batch size: 82, lr: 8.79e-03, grad_scale: 32.0 2024-09-17 17:39:57,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=235862.66666666666, ans=0.0 2024-09-17 17:40:09,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=235909.33333333334, ans=0.125 2024-09-17 17:40:15,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=235909.33333333334, ans=0.2 2024-09-17 17:40:38,662 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:40:40,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=235956.0, ans=0.04949747468305833 2024-09-17 17:40:58,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=236049.33333333334, ans=0.0 2024-09-17 17:40:58,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=236049.33333333334, ans=0.125 2024-09-17 17:41:06,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=236049.33333333334, ans=0.025 2024-09-17 17:41:16,147 INFO [train.py:1198] (1/2) Epoch 14, batch 200, loss[loss=0.2627, simple_loss=0.3049, pruned_loss=0.08434, ctc_loss=0.1699, cr_loss=0.4491, over 31890.00 frames. ], tot_loss[loss=0.2475, simple_loss=0.2892, pruned_loss=0.0785, ctc_loss=0.1577, cr_loss=0.4303, over 4270884.45 frames. ], batch size: 145, lr: 8.79e-03, grad_scale: 32.0 2024-09-17 17:41:31,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=236142.66666666666, ans=0.0 2024-09-17 17:41:33,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.54 vs. limit=22.5 2024-09-17 17:41:46,293 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.631e+02 3.105e+02 4.250e+02 6.768e+02, threshold=6.209e+02, percent-clipped=0.0 2024-09-17 17:42:05,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=236189.33333333334, ans=0.125 2024-09-17 17:42:27,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.03 vs. limit=15.0 2024-09-17 17:42:31,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=236282.66666666666, ans=0.125 2024-09-17 17:42:35,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=236282.66666666666, ans=0.2 2024-09-17 17:42:41,215 INFO [train.py:1198] (1/2) Epoch 14, batch 250, loss[loss=0.2491, simple_loss=0.2939, pruned_loss=0.07807, ctc_loss=0.1551, cr_loss=0.4298, over 34325.00 frames. ], tot_loss[loss=0.2468, simple_loss=0.2887, pruned_loss=0.07816, ctc_loss=0.157, cr_loss=0.4294, over 4832557.94 frames. ], batch size: 117, lr: 8.78e-03, grad_scale: 32.0 2024-09-17 17:43:43,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.25 vs. 
limit=10.0 2024-09-17 17:43:50,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=236516.0, ans=0.2 2024-09-17 17:44:04,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=236562.66666666666, ans=0.125 2024-09-17 17:44:05,402 INFO [train.py:1198] (1/2) Epoch 14, batch 300, loss[loss=0.2726, simple_loss=0.3111, pruned_loss=0.0896, ctc_loss=0.1763, cr_loss=0.4914, over 34339.00 frames. ], tot_loss[loss=0.2467, simple_loss=0.2885, pruned_loss=0.0782, ctc_loss=0.157, cr_loss=0.429, over 5261901.86 frames. ], batch size: 107, lr: 8.78e-03, grad_scale: 32.0 2024-09-17 17:44:35,115 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.531e+02 2.975e+02 3.785e+02 5.706e+02, threshold=5.951e+02, percent-clipped=0.0 2024-09-17 17:44:37,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=236656.0, ans=0.125 2024-09-17 17:44:53,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236702.66666666666, ans=0.1 2024-09-17 17:44:57,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.77 vs. limit=15.0 2024-09-17 17:45:27,498 INFO [train.py:1198] (1/2) Epoch 14, batch 350, loss[loss=0.2074, simple_loss=0.2503, pruned_loss=0.06183, ctc_loss=0.1284, cr_loss=0.3791, over 34284.00 frames. ], tot_loss[loss=0.2467, simple_loss=0.2887, pruned_loss=0.07812, ctc_loss=0.1569, cr_loss=0.4294, over 5596459.12 frames. ], batch size: 83, lr: 8.77e-03, grad_scale: 32.0 2024-09-17 17:45:49,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=236842.66666666666, ans=0.0 2024-09-17 17:45:49,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.10 vs. limit=15.0 2024-09-17 17:46:11,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.49 vs. limit=10.0 2024-09-17 17:46:34,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=236982.66666666666, ans=0.125 2024-09-17 17:46:39,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=236982.66666666666, ans=0.0 2024-09-17 17:46:42,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=236982.66666666666, ans=0.125 2024-09-17 17:46:47,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=236982.66666666666, ans=0.2 2024-09-17 17:46:47,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2024-09-17 17:46:51,926 INFO [train.py:1198] (1/2) Epoch 14, batch 400, loss[loss=0.2482, simple_loss=0.2923, pruned_loss=0.07779, ctc_loss=0.1557, cr_loss=0.4358, over 34427.00 frames. ], tot_loss[loss=0.246, simple_loss=0.2881, pruned_loss=0.07774, ctc_loss=0.1562, cr_loss=0.4278, over 5864302.21 frames. 
], batch size: 95, lr: 8.77e-03, grad_scale: 32.0 2024-09-17 17:47:02,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=237029.33333333334, ans=0.1 2024-09-17 17:47:16,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=237076.0, ans=0.025 2024-09-17 17:47:22,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.112e+02 2.499e+02 3.052e+02 3.847e+02 6.955e+02, threshold=6.105e+02, percent-clipped=3.0 2024-09-17 17:47:32,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=12.0 2024-09-17 17:48:00,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=237216.0, ans=0.0 2024-09-17 17:48:12,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=237216.0, ans=0.07 2024-09-17 17:48:13,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=237216.0, ans=0.0 2024-09-17 17:48:13,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=237216.0, ans=0.0 2024-09-17 17:48:16,704 INFO [train.py:1198] (1/2) Epoch 14, batch 450, loss[loss=0.2718, simple_loss=0.3104, pruned_loss=0.08997, ctc_loss=0.1744, cr_loss=0.461, over 34705.00 frames. ], tot_loss[loss=0.2461, simple_loss=0.2881, pruned_loss=0.07785, ctc_loss=0.1565, cr_loss=0.4286, over 6054313.94 frames. ], batch size: 97, lr: 8.77e-03, grad_scale: 32.0 2024-09-17 17:48:18,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=237262.66666666666, ans=0.0 2024-09-17 17:48:18,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237262.66666666666, ans=0.1 2024-09-17 17:48:23,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=237262.66666666666, ans=0.1 2024-09-17 17:48:25,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=237262.66666666666, ans=0.2 2024-09-17 17:48:28,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=237262.66666666666, ans=0.0 2024-09-17 17:48:38,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=237309.33333333334, ans=0.125 2024-09-17 17:48:46,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=237309.33333333334, ans=0.025 2024-09-17 17:48:55,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=237356.0, ans=0.125 2024-09-17 17:49:10,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=237402.66666666666, ans=0.125 2024-09-17 17:49:30,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=237449.33333333334, ans=0.125 2024-09-17 17:49:39,598 INFO [train.py:1198] (1/2) Epoch 14, batch 500, loss[loss=0.2669, 
2024-09-17 17:49:39,598 INFO [train.py:1198] (1/2) Epoch 14, batch 500, loss[loss=0.2669, simple_loss=0.3099, pruned_loss=0.08571, ctc_loss=0.1724, cr_loss=0.4502, over 34449.00 frames. ], tot_loss[loss=0.2457, simple_loss=0.2876, pruned_loss=0.07773, ctc_loss=0.1562, cr_loss=0.4281, over 6220415.25 frames. ], batch size: 110, lr: 8.76e-03, grad_scale: 32.0 2024-09-17 17:50:11,680 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.498e+02 2.920e+02 3.618e+02 6.472e+02, threshold=5.840e+02, percent-clipped=1.0 2024-09-17 17:50:40,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5 2024-09-17 17:50:47,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=237682.66666666666, ans=0.1 2024-09-17 17:51:04,670 INFO [train.py:1198] (1/2) Epoch 14, batch 550, loss[loss=0.2779, simple_loss=0.3161, pruned_loss=0.09188, ctc_loss=0.1871, cr_loss=0.4609, over 33833.00 frames. ], tot_loss[loss=0.2463, simple_loss=0.2879, pruned_loss=0.07808, ctc_loss=0.1568, cr_loss=0.4283, over 6331168.19 frames. ], batch size: 122, lr: 8.76e-03, grad_scale: 32.0 2024-09-17 17:51:18,406 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 17:51:43,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=237822.66666666666, ans=0.0 2024-09-17 17:51:44,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237822.66666666666, ans=0.1 2024-09-17 17:52:02,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=237869.33333333334, ans=0.0 2024-09-17 17:52:29,084 INFO [train.py:1198] (1/2) Epoch 14, batch 600, loss[loss=0.2532, simple_loss=0.2935, pruned_loss=0.08158, ctc_loss=0.1616, cr_loss=0.4327, over 34225.00 frames. ], tot_loss[loss=0.2461, simple_loss=0.2878, pruned_loss=0.078, ctc_loss=0.1567, cr_loss=0.4282, over 6432658.64 frames. ], batch size: 117, lr: 8.75e-03, grad_scale: 32.0 2024-09-17 17:52:32,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=237962.66666666666, ans=0.125 2024-09-17 17:52:47,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=238009.33333333334, ans=0.125 2024-09-17 17:52:47,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=238009.33333333334, ans=0.0 2024-09-17 17:52:47,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=238009.33333333334, ans=0.0 2024-09-17 17:52:49,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=238009.33333333334, ans=0.125
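Each Whitening line from scaling.py compares a dispersion statistic of a module's output covariance against a limit (metric=22.23 vs. limit=22.5 above is one of the few near misses); when the metric exceeds the limit, the module nudges the features back toward a whiter covariance. One statistic with exactly the logged behaviour, shown here as an assumed sketch rather than the scaling.py code, is the ratio mean(eig^2) / mean(eig)^2 over the eigenvalues of the per-group covariance: it is 1.0 when the covariance is a multiple of the identity and grows as the energy concentrates in a few directions. Because the Frobenius norm satisfies ||C||_F^2 = sum(eig^2), no eigendecomposition is needed:

```python
import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """mean(eig^2) / mean(eig)^2 of the per-group feature covariance (>= 1)."""
    n, c = x.shape                      # (num_frames, num_channels)
    d = c // num_groups                 # channels per group
    x = x.reshape(n, num_groups, d).transpose(0, 1)          # (groups, n, d)
    covar = torch.matmul(x.transpose(1, 2), x) / n           # (groups, d, d)
    mean_sq_eig = (covar ** 2).sum(dim=(1, 2)) / d           # ||C||_F^2 / d
    mean_eig = covar.diagonal(dim1=1, dim2=2).mean(dim=1)    # tr(C) / d
    return (mean_sq_eig / mean_eig ** 2).mean()


x = torch.randn(1000, 512)  # near-white features
print(whitening_metric(x))  # ~1.5 from finite-sample noise, far below limit=15.0
```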
2024-09-17 17:52:55,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.45 vs. limit=15.0 2024-09-17 17:53:00,656 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.454e+02 2.947e+02 3.541e+02 9.232e+02, threshold=5.893e+02, percent-clipped=4.0 2024-09-17 17:53:02,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=238056.0, ans=0.0 2024-09-17 17:53:09,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0 2024-09-17 17:53:20,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238102.66666666666, ans=0.1 2024-09-17 17:53:24,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2024-09-17 17:53:30,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238102.66666666666, ans=0.1 2024-09-17 17:53:30,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238102.66666666666, ans=0.1 2024-09-17 17:53:52,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=238196.0, ans=0.09899494936611666 2024-09-17 17:53:53,573 INFO [train.py:1198] (1/2) Epoch 14, batch 650, loss[loss=0.2493, simple_loss=0.2944, pruned_loss=0.07796, ctc_loss=0.156, cr_loss=0.4296, over 34523.00 frames. ], tot_loss[loss=0.2446, simple_loss=0.2867, pruned_loss=0.07716, ctc_loss=0.1552, cr_loss=0.4259, over 6524246.03 frames. ], batch size: 94, lr: 8.75e-03, grad_scale: 16.0 2024-09-17 17:54:08,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238242.66666666666, ans=0.1 2024-09-17 17:54:21,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=238242.66666666666, ans=0.125 2024-09-17 17:54:36,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=238289.33333333334, ans=0.125 2024-09-17 17:54:58,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=238382.66666666666, ans=0.0 2024-09-17 17:55:09,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238382.66666666666, ans=0.1 2024-09-17 17:55:17,728 INFO [train.py:1198] (1/2) Epoch 14, batch 700, loss[loss=0.2287, simple_loss=0.2734, pruned_loss=0.07008, ctc_loss=0.1402, cr_loss=0.3956, over 34573.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.2877, pruned_loss=0.07752, ctc_loss=0.1559, cr_loss=0.4278, over 6578627.01 frames. ], batch size: 89, lr: 8.74e-03, grad_scale: 16.0
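The train.py loss lines report each component next to the combined loss, and the logged numbers are consistent with a fixed weighted sum: for the batch 650 tot_loss above, 0.5 * 0.2867 + 0.07716 + 0.1 * 0.1552 + 0.02 * 0.4259 is approximately 0.2446. The weights below are inferred by fitting the logged values, not quoted from train.py:

```python
def combined_loss(simple_loss: float, pruned_loss: float,
                  ctc_loss: float, cr_loss: float,
                  simple_scale: float = 0.5, pruned_scale: float = 1.0,
                  ctc_scale: float = 0.1, cr_scale: float = 0.02) -> float:
    """Weighted sum that reproduces the logged tot_loss from its components."""
    return (simple_scale * simple_loss + pruned_scale * pruned_loss
            + ctc_scale * ctc_loss + cr_scale * cr_loss)


# Components of the Epoch 14, batch 650 tot_loss from the log:
print(combined_loss(0.2867, 0.07716, 0.1552, 0.4259))  # ~0.2446, as logged
```

On this reading the pruned transducer loss carries most of the weight, while the CTC and consistency-regularization (cr) terms enter as lightly weighted auxiliary objectives.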
2024-09-17 17:55:35,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=238476.0, ans=0.0 2024-09-17 17:55:45,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=238476.0, ans=0.0 2024-09-17 17:55:48,935 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.634e+02 3.326e+02 4.786e+02 8.540e+02, threshold=6.651e+02, percent-clipped=13.0 2024-09-17 17:55:54,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=238522.66666666666, ans=0.025 2024-09-17 17:56:00,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.91 vs. limit=15.0 2024-09-17 17:56:05,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=238569.33333333334, ans=0.125 2024-09-17 17:56:26,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.49 vs. limit=22.5 2024-09-17 17:56:27,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=238616.0, ans=0.0 2024-09-17 17:56:40,312 INFO [train.py:1198] (1/2) Epoch 14, batch 750, loss[loss=0.2434, simple_loss=0.2889, pruned_loss=0.07553, ctc_loss=0.1519, cr_loss=0.41, over 34397.00 frames. ], tot_loss[loss=0.2452, simple_loss=0.2873, pruned_loss=0.07741, ctc_loss=0.1558, cr_loss=0.4276, over 6618499.61 frames. ], batch size: 95, lr: 8.74e-03, grad_scale: 16.0 2024-09-17 17:56:52,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=238662.66666666666, ans=0.0 2024-09-17 17:57:00,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238709.33333333334, ans=0.1 2024-09-17 17:57:00,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=238709.33333333334, ans=0.0 2024-09-17 17:57:06,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=238709.33333333334, ans=0.2 2024-09-17 17:57:47,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=238849.33333333334, ans=0.015 2024-09-17 17:58:02,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=238849.33333333334, ans=0.5 2024-09-17 17:58:04,827 INFO [train.py:1198] (1/2) Epoch 14, batch 800, loss[loss=0.2137, simple_loss=0.2616, pruned_loss=0.06237, ctc_loss=0.1304, cr_loss=0.3745, over 34500.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.2874, pruned_loss=0.07762, ctc_loss=0.1561, cr_loss=0.4279, over 6656338.80 frames. ], batch size: 85, lr: 8.74e-03, grad_scale: 32.0 2024-09-17 17:58:09,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.63 vs.
limit=22.5 2024-09-17 17:58:10,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=238896.0, ans=0.0 2024-09-17 17:58:31,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=238942.66666666666, ans=10.0 2024-09-17 17:58:36,266 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.432e+02 3.091e+02 3.826e+02 6.701e+02, threshold=6.182e+02, percent-clipped=1.0 2024-09-17 17:58:38,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=238989.33333333334, ans=10.0 2024-09-17 17:58:40,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=238989.33333333334, ans=0.1 2024-09-17 17:59:26,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.max_positive, batch_count=239082.66666666666, ans=0.95 2024-09-17 17:59:29,013 INFO [train.py:1198] (1/2) Epoch 14, batch 850, loss[loss=0.2587, simple_loss=0.3034, pruned_loss=0.08107, ctc_loss=0.169, cr_loss=0.4499, over 34373.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.2874, pruned_loss=0.0776, ctc_loss=0.156, cr_loss=0.4281, over 6689392.98 frames. ], batch size: 103, lr: 8.73e-03, grad_scale: 32.0 2024-09-17 17:59:29,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239129.33333333334, ans=0.1 2024-09-17 17:59:45,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239176.0, ans=0.1 2024-09-17 18:00:05,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=239222.66666666666, ans=0.125 2024-09-17 18:00:22,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=239269.33333333334, ans=0.2 2024-09-17 18:00:22,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=239269.33333333334, ans=0.125 2024-09-17 18:00:36,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2024-09-17 18:00:38,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=239316.0, ans=0.125 2024-09-17 18:00:48,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=239316.0, ans=0.0 2024-09-17 18:00:51,662 INFO [train.py:1198] (1/2) Epoch 14, batch 900, loss[loss=0.2133, simple_loss=0.2623, pruned_loss=0.06118, ctc_loss=0.1303, cr_loss=0.3964, over 34485.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.2875, pruned_loss=0.07752, ctc_loss=0.156, cr_loss=0.4274, over 6696175.73 frames. ], batch size: 85, lr: 8.73e-03, grad_scale: 32.0 2024-09-17 18:01:02,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.22 vs. 
limit=22.5 2024-09-17 18:01:08,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=239409.33333333334, ans=0.2 2024-09-17 18:01:14,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=239409.33333333334, ans=0.2 2024-09-17 18:01:18,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0 2024-09-17 18:01:26,532 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.516e+02 3.128e+02 4.166e+02 7.648e+02, threshold=6.255e+02, percent-clipped=1.0 2024-09-17 18:01:48,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=239502.66666666666, ans=0.125 2024-09-17 18:02:11,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=239549.33333333334, ans=0.125 2024-09-17 18:02:16,031 INFO [train.py:1198] (1/2) Epoch 14, batch 950, loss[loss=0.2167, simple_loss=0.2584, pruned_loss=0.06587, ctc_loss=0.1365, cr_loss=0.3967, over 34682.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.2875, pruned_loss=0.07756, ctc_loss=0.1561, cr_loss=0.4275, over 6700703.23 frames. ], batch size: 87, lr: 8.72e-03, grad_scale: 16.0 2024-09-17 18:02:26,153 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:02:48,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=239642.66666666666, ans=0.1 2024-09-17 18:03:03,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2024-09-17 18:03:04,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=239689.33333333334, ans=0.125 2024-09-17 18:03:06,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=239736.0, ans=0.025 2024-09-17 18:03:11,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239736.0, ans=0.1 2024-09-17 18:03:11,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.73 vs. limit=15.0 2024-09-17 18:03:21,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=239736.0, ans=0.125 2024-09-17 18:03:34,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.02 vs. limit=22.5 2024-09-17 18:03:35,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=239782.66666666666, ans=0.2 2024-09-17 18:03:37,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=239782.66666666666, ans=0.0 2024-09-17 18:03:40,103 INFO [train.py:1198] (1/2) Epoch 14, batch 1000, loss[loss=0.2421, simple_loss=0.2807, pruned_loss=0.07759, ctc_loss=0.1555, cr_loss=0.43, over 34495.00 frames. 
], tot_loss[loss=0.2467, simple_loss=0.2886, pruned_loss=0.07814, ctc_loss=0.1572, cr_loss=0.429, over 6694854.81 frames. ], batch size: 90, lr: 8.72e-03, grad_scale: 16.0 2024-09-17 18:03:42,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=239829.33333333334, ans=0.07 2024-09-17 18:03:53,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=239829.33333333334, ans=0.125 2024-09-17 18:04:00,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=239876.0, ans=0.125 2024-09-17 18:04:05,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=239876.0, ans=0.0 2024-09-17 18:04:13,733 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.478e+02 2.814e+02 3.298e+02 6.391e+02, threshold=5.628e+02, percent-clipped=1.0 2024-09-17 18:04:14,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.35 vs. limit=22.5 2024-09-17 18:04:22,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=239922.66666666666, ans=0.125 2024-09-17 18:04:26,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.68 vs. limit=6.0 2024-09-17 18:04:35,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=239969.33333333334, ans=0.07 2024-09-17 18:05:05,274 INFO [train.py:1198] (1/2) Epoch 14, batch 1050, loss[loss=0.2469, simple_loss=0.2935, pruned_loss=0.07615, ctc_loss=0.1526, cr_loss=0.4361, over 34580.00 frames. ], tot_loss[loss=0.2461, simple_loss=0.288, pruned_loss=0.07789, ctc_loss=0.1568, cr_loss=0.4278, over 6705038.17 frames. ], batch size: 99, lr: 8.71e-03, grad_scale: 16.0 2024-09-17 18:05:17,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=240062.66666666666, ans=0.1 2024-09-17 18:05:17,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2024-09-17 18:05:41,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=240156.0, ans=0.5 2024-09-17 18:06:01,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=240202.66666666666, ans=0.125 2024-09-17 18:06:16,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=240249.33333333334, ans=0.0 2024-09-17 18:06:19,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=240249.33333333334, ans=0.0 2024-09-17 18:06:29,619 INFO [train.py:1198] (1/2) Epoch 14, batch 1100, loss[loss=0.2366, simple_loss=0.2847, pruned_loss=0.07131, ctc_loss=0.1456, cr_loss=0.4161, over 34373.00 frames. ], tot_loss[loss=0.2458, simple_loss=0.2879, pruned_loss=0.0777, ctc_loss=0.1563, cr_loss=0.4267, over 6718232.45 frames. 
], batch size: 91, lr: 8.71e-03, grad_scale: 16.0 2024-09-17 18:06:50,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2024-09-17 18:07:02,751 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.455e+02 2.860e+02 3.658e+02 5.815e+02, threshold=5.719e+02, percent-clipped=2.0 2024-09-17 18:07:41,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=240482.66666666666, ans=0.0 2024-09-17 18:07:52,199 INFO [train.py:1198] (1/2) Epoch 14, batch 1150, loss[loss=0.2338, simple_loss=0.2743, pruned_loss=0.07352, ctc_loss=0.148, cr_loss=0.4157, over 34343.00 frames. ], tot_loss[loss=0.2459, simple_loss=0.2878, pruned_loss=0.07779, ctc_loss=0.1564, cr_loss=0.427, over 6715820.63 frames. ], batch size: 91, lr: 8.71e-03, grad_scale: 16.0 2024-09-17 18:07:55,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=240529.33333333334, ans=0.95 2024-09-17 18:08:10,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=240576.0, ans=0.0 2024-09-17 18:08:13,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=240576.0, ans=0.025 2024-09-17 18:08:14,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.56 vs. limit=15.0 2024-09-17 18:08:24,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.71 vs. limit=15.0 2024-09-17 18:08:55,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.91 vs. limit=15.0 2024-09-17 18:09:17,001 INFO [train.py:1198] (1/2) Epoch 14, batch 1200, loss[loss=0.2475, simple_loss=0.2909, pruned_loss=0.07812, ctc_loss=0.156, cr_loss=0.4149, over 34567.00 frames. ], tot_loss[loss=0.2469, simple_loss=0.2889, pruned_loss=0.07818, ctc_loss=0.1571, cr_loss=0.4281, over 6709583.07 frames. ], batch size: 99, lr: 8.70e-03, grad_scale: 32.0 2024-09-17 18:09:26,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.67 vs. limit=15.0 2024-09-17 18:09:27,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=240762.66666666666, ans=0.125 2024-09-17 18:09:29,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=12.0 2024-09-17 18:09:32,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=240809.33333333334, ans=0.0 2024-09-17 18:09:43,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.61 vs. 
limit=22.5 2024-09-17 18:09:45,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=240809.33333333334, ans=0.125 2024-09-17 18:09:52,752 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:09:54,165 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.536e+02 3.136e+02 4.273e+02 9.233e+02, threshold=6.273e+02, percent-clipped=5.0 2024-09-17 18:10:09,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0 2024-09-17 18:10:32,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=240949.33333333334, ans=0.125 2024-09-17 18:10:42,189 INFO [train.py:1198] (1/2) Epoch 14, batch 1250, loss[loss=0.2616, simple_loss=0.3032, pruned_loss=0.08396, ctc_loss=0.1667, cr_loss=0.4698, over 34322.00 frames. ], tot_loss[loss=0.2473, simple_loss=0.2894, pruned_loss=0.07829, ctc_loss=0.1571, cr_loss=0.4292, over 6743337.92 frames. ], batch size: 107, lr: 8.70e-03, grad_scale: 16.0 2024-09-17 18:11:12,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=241042.66666666666, ans=0.025 2024-09-17 18:11:22,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241089.33333333334, ans=0.1 2024-09-17 18:11:38,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=241136.0, ans=0.0 2024-09-17 18:11:52,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241182.66666666666, ans=0.1 2024-09-17 18:12:04,967 INFO [train.py:1198] (1/2) Epoch 14, batch 1300, loss[loss=0.2607, simple_loss=0.3034, pruned_loss=0.08314, ctc_loss=0.1708, cr_loss=0.4399, over 32984.00 frames. ], tot_loss[loss=0.2464, simple_loss=0.2886, pruned_loss=0.07794, ctc_loss=0.1565, cr_loss=0.4277, over 6746563.33 frames. ], batch size: 130, lr: 8.69e-03, grad_scale: 16.0 2024-09-17 18:12:39,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.845e+02 2.579e+02 3.227e+02 3.988e+02 8.383e+02, threshold=6.455e+02, percent-clipped=1.0 2024-09-17 18:12:41,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=241322.66666666666, ans=0.125 2024-09-17 18:13:00,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=241369.33333333334, ans=0.0 2024-09-17 18:13:04,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.19 vs. limit=22.5 2024-09-17 18:13:10,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=241369.33333333334, ans=0.2 2024-09-17 18:13:20,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=241416.0, ans=0.2 2024-09-17 18:13:30,000 INFO [train.py:1198] (1/2) Epoch 14, batch 1350, loss[loss=0.2398, simple_loss=0.2822, pruned_loss=0.07504, ctc_loss=0.1507, cr_loss=0.4292, over 34554.00 frames. 
], tot_loss[loss=0.2456, simple_loss=0.288, pruned_loss=0.07751, ctc_loss=0.1557, cr_loss=0.4268, over 6766676.26 frames. ], batch size: 94, lr: 8.69e-03, grad_scale: 16.0 2024-09-17 18:13:34,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.30 vs. limit=22.5 2024-09-17 18:14:16,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-09-17 18:14:39,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=241649.33333333334, ans=0.95 2024-09-17 18:14:53,494 INFO [train.py:1198] (1/2) Epoch 14, batch 1400, loss[loss=0.2064, simple_loss=0.2475, pruned_loss=0.06245, ctc_loss=0.1263, cr_loss=0.3783, over 34248.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.2876, pruned_loss=0.0775, ctc_loss=0.1556, cr_loss=0.4269, over 6779159.65 frames. ], batch size: 80, lr: 8.69e-03, grad_scale: 16.0 2024-09-17 18:14:59,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.51 vs. limit=15.0 2024-09-17 18:15:03,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=241696.0, ans=0.0 2024-09-17 18:15:15,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=241742.66666666666, ans=0.09899494936611666 2024-09-17 18:15:28,156 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 2.498e+02 2.828e+02 3.407e+02 6.999e+02, threshold=5.656e+02, percent-clipped=1.0 2024-09-17 18:15:29,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.57 vs. limit=10.0 2024-09-17 18:15:40,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0 2024-09-17 18:15:43,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=241836.0, ans=0.125 2024-09-17 18:15:51,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=241836.0, ans=0.125 2024-09-17 18:15:56,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=241836.0, ans=0.125 2024-09-17 18:15:59,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=241882.66666666666, ans=0.125 2024-09-17 18:16:03,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241882.66666666666, ans=0.1 2024-09-17 18:16:15,906 INFO [train.py:1198] (1/2) Epoch 14, batch 1450, loss[loss=0.2667, simple_loss=0.3089, pruned_loss=0.08599, ctc_loss=0.1709, cr_loss=0.4603, over 34467.00 frames. ], tot_loss[loss=0.2458, simple_loss=0.2881, pruned_loss=0.07761, ctc_loss=0.156, cr_loss=0.4276, over 6775844.81 frames. 
], batch size: 110, lr: 8.68e-03, grad_scale: 16.0 2024-09-17 18:16:21,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=241929.33333333334, ans=0.2 2024-09-17 18:16:22,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=241929.33333333334, ans=0.125 2024-09-17 18:16:28,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.70 vs. limit=15.0 2024-09-17 18:16:50,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.61 vs. limit=15.0 2024-09-17 18:16:54,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=242022.66666666666, ans=0.0 2024-09-17 18:17:11,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=242069.33333333334, ans=0.125 2024-09-17 18:17:21,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=242069.33333333334, ans=0.125 2024-09-17 18:17:24,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242116.0, ans=0.1 2024-09-17 18:17:32,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=242116.0, ans=0.2 2024-09-17 18:17:42,339 INFO [train.py:1198] (1/2) Epoch 14, batch 1500, loss[loss=0.2653, simple_loss=0.3009, pruned_loss=0.08781, ctc_loss=0.177, cr_loss=0.468, over 34463.00 frames. ], tot_loss[loss=0.2465, simple_loss=0.2888, pruned_loss=0.0779, ctc_loss=0.1564, cr_loss=0.4284, over 6776021.76 frames. ], batch size: 100, lr: 8.68e-03, grad_scale: 16.0 2024-09-17 18:18:16,815 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.539e+02 2.971e+02 3.919e+02 7.377e+02, threshold=5.942e+02, percent-clipped=4.0 2024-09-17 18:18:29,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=242256.0, ans=0.1 2024-09-17 18:18:29,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=242256.0, ans=0.0 2024-09-17 18:18:44,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.64 vs. limit=15.0 2024-09-17 18:19:04,746 INFO [train.py:1198] (1/2) Epoch 14, batch 1550, loss[loss=0.2647, simple_loss=0.3066, pruned_loss=0.08535, ctc_loss=0.1681, cr_loss=0.4632, over 34418.00 frames. ], tot_loss[loss=0.2466, simple_loss=0.2886, pruned_loss=0.07805, ctc_loss=0.1566, cr_loss=0.4282, over 6745218.92 frames. 
], batch size: 105, lr: 8.67e-03, grad_scale: 16.0 2024-09-17 18:19:14,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=242396.0, ans=0.0 2024-09-17 18:19:29,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=242442.66666666666, ans=0.0 2024-09-17 18:19:36,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=242489.33333333334, ans=0.0 2024-09-17 18:19:42,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=242489.33333333334, ans=0.125 2024-09-17 18:19:42,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=242489.33333333334, ans=0.125 2024-09-17 18:20:08,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.61 vs. limit=10.0 2024-09-17 18:20:29,130 INFO [train.py:1198] (1/2) Epoch 14, batch 1600, loss[loss=0.2537, simple_loss=0.2996, pruned_loss=0.07934, ctc_loss=0.1615, cr_loss=0.4213, over 34534.00 frames. ], tot_loss[loss=0.2461, simple_loss=0.2882, pruned_loss=0.07781, ctc_loss=0.1562, cr_loss=0.4274, over 6724910.80 frames. ], batch size: 99, lr: 8.67e-03, grad_scale: 32.0 2024-09-17 18:20:37,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=242629.33333333334, ans=0.0 2024-09-17 18:20:52,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=242676.0, ans=0.125 2024-09-17 18:21:07,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=242676.0, ans=0.0 2024-09-17 18:21:12,240 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.501e+02 3.040e+02 3.764e+02 6.837e+02, threshold=6.080e+02, percent-clipped=1.0 2024-09-17 18:21:16,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.75 vs. limit=15.0 2024-09-17 18:21:47,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=242816.0, ans=0.125 2024-09-17 18:21:58,540 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:21:58,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=242862.66666666666, ans=0.0 2024-09-17 18:21:59,812 INFO [train.py:1198] (1/2) Epoch 14, batch 1650, loss[loss=0.2509, simple_loss=0.2977, pruned_loss=0.07733, ctc_loss=0.1566, cr_loss=0.4549, over 34402.00 frames. ], tot_loss[loss=0.2458, simple_loss=0.2879, pruned_loss=0.07772, ctc_loss=0.1562, cr_loss=0.4272, over 6716457.97 frames. 
], batch size: 103, lr: 8.67e-03, grad_scale: 32.0 2024-09-17 18:22:31,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=242956.0, ans=0.1 2024-09-17 18:23:02,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=243002.66666666666, ans=0.025 2024-09-17 18:23:06,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=243049.33333333334, ans=0.125 2024-09-17 18:23:07,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=243049.33333333334, ans=0.2 2024-09-17 18:23:20,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=243096.0, ans=0.1 2024-09-17 18:23:22,104 INFO [train.py:1198] (1/2) Epoch 14, batch 1700, loss[loss=0.2137, simple_loss=0.2581, pruned_loss=0.06363, ctc_loss=0.1325, cr_loss=0.3897, over 34340.00 frames. ], tot_loss[loss=0.2453, simple_loss=0.2875, pruned_loss=0.07742, ctc_loss=0.1556, cr_loss=0.4268, over 6742958.54 frames. ], batch size: 80, lr: 8.66e-03, grad_scale: 32.0 2024-09-17 18:23:37,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=243142.66666666666, ans=0.09899494936611666 2024-09-17 18:23:47,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=243142.66666666666, ans=0.125 2024-09-17 18:23:47,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=243142.66666666666, ans=0.0 2024-09-17 18:23:52,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=243142.66666666666, ans=0.0 2024-09-17 18:23:56,437 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.665e+02 3.238e+02 4.005e+02 8.003e+02, threshold=6.475e+02, percent-clipped=2.0 2024-09-17 18:23:58,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=243189.33333333334, ans=0.125 2024-09-17 18:24:25,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=243236.0, ans=0.125 2024-09-17 18:24:25,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=243236.0, ans=0.0 2024-09-17 18:24:33,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=243282.66666666666, ans=0.125 2024-09-17 18:24:46,520 INFO [train.py:1198] (1/2) Epoch 14, batch 1750, loss[loss=0.2226, simple_loss=0.2583, pruned_loss=0.07048, ctc_loss=0.1455, cr_loss=0.4224, over 34151.00 frames. ], tot_loss[loss=0.2451, simple_loss=0.2873, pruned_loss=0.07735, ctc_loss=0.1556, cr_loss=0.4273, over 6751990.87 frames. 
], batch size: 78, lr: 8.66e-03, grad_scale: 32.0 2024-09-17 18:24:51,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=243329.33333333334, ans=0.125 2024-09-17 18:24:53,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=243329.33333333334, ans=0.0 2024-09-17 18:24:58,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=243329.33333333334, ans=0.125 2024-09-17 18:25:41,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=243469.33333333334, ans=0.0 2024-09-17 18:25:46,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=243469.33333333334, ans=0.125 2024-09-17 18:25:53,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=10.80 vs. limit=15.0 2024-09-17 18:26:07,683 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:26:10,511 INFO [train.py:1198] (1/2) Epoch 14, batch 1800, loss[loss=0.2506, simple_loss=0.293, pruned_loss=0.07926, ctc_loss=0.1608, cr_loss=0.4393, over 34704.00 frames. ], tot_loss[loss=0.2456, simple_loss=0.2877, pruned_loss=0.07758, ctc_loss=0.1561, cr_loss=0.428, over 6755390.88 frames. ], batch size: 97, lr: 8.65e-03, grad_scale: 32.0 2024-09-17 18:26:19,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=243562.66666666666, ans=0.0 2024-09-17 18:26:22,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=243562.66666666666, ans=0.025 2024-09-17 18:26:45,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.590e+02 2.962e+02 3.830e+02 8.507e+02, threshold=5.925e+02, percent-clipped=5.0 2024-09-17 18:26:49,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=243656.0, ans=0.025 2024-09-17 18:27:08,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=243702.66666666666, ans=0.125 2024-09-17 18:27:17,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=243749.33333333334, ans=0.0 2024-09-17 18:27:20,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243749.33333333334, ans=0.1 2024-09-17 18:27:22,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2024-09-17 18:27:28,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=243749.33333333334, ans=0.125 2024-09-17 18:27:33,491 INFO [train.py:1198] (1/2) Epoch 14, batch 1850, loss[loss=0.2492, simple_loss=0.2976, pruned_loss=0.07687, ctc_loss=0.1539, cr_loss=0.4079, over 34454.00 frames. ], tot_loss[loss=0.2453, simple_loss=0.2876, pruned_loss=0.07741, ctc_loss=0.1557, cr_loss=0.4274, over 6763024.90 frames. 
], batch size: 100, lr: 8.65e-03, grad_scale: 32.0 2024-09-17 18:27:51,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=243842.66666666666, ans=0.125 2024-09-17 18:27:57,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=243842.66666666666, ans=0.07 2024-09-17 18:27:57,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=243842.66666666666, ans=0.125 2024-09-17 18:28:03,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.61 vs. limit=15.0 2024-09-17 18:28:19,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=243889.33333333334, ans=0.025 2024-09-17 18:28:21,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=12.0 2024-09-17 18:28:28,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=243936.0, ans=0.0 2024-09-17 18:28:42,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=243982.66666666666, ans=0.0 2024-09-17 18:28:43,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=243982.66666666666, ans=0.125 2024-09-17 18:28:47,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=243982.66666666666, ans=0.07 2024-09-17 18:28:52,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=22.5 2024-09-17 18:28:59,937 INFO [train.py:1198] (1/2) Epoch 14, batch 1900, loss[loss=0.2529, simple_loss=0.2973, pruned_loss=0.07893, ctc_loss=0.1602, cr_loss=0.4648, over 34379.00 frames. ], tot_loss[loss=0.2457, simple_loss=0.2881, pruned_loss=0.07754, ctc_loss=0.1559, cr_loss=0.4281, over 6771460.75 frames. ], batch size: 103, lr: 8.65e-03, grad_scale: 32.0 2024-09-17 18:29:21,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=244076.0, ans=0.125 2024-09-17 18:29:34,410 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.543e+02 3.106e+02 4.231e+02 8.714e+02, threshold=6.213e+02, percent-clipped=10.0 2024-09-17 18:29:39,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=244122.66666666666, ans=0.125 2024-09-17 18:29:46,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.61 vs. limit=15.0 2024-09-17 18:30:18,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=244216.0, ans=0.2 2024-09-17 18:30:21,945 INFO [train.py:1198] (1/2) Epoch 14, batch 1950, loss[loss=0.2317, simple_loss=0.2809, pruned_loss=0.06904, ctc_loss=0.1427, cr_loss=0.3979, over 34350.00 frames. ], tot_loss[loss=0.2464, simple_loss=0.2891, pruned_loss=0.07771, ctc_loss=0.1561, cr_loss=0.429, over 6788299.64 frames. 
], batch size: 91, lr: 8.64e-03, grad_scale: 32.0 2024-09-17 18:30:30,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=244262.66666666666, ans=0.125 2024-09-17 18:31:03,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_positive, batch_count=244356.0, ans=0.05 2024-09-17 18:31:08,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=244356.0, ans=0.0 2024-09-17 18:31:30,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=22.5 2024-09-17 18:31:34,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.91 vs. limit=12.0 2024-09-17 18:31:41,917 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:31:44,822 INFO [train.py:1198] (1/2) Epoch 14, batch 2000, loss[loss=0.2157, simple_loss=0.257, pruned_loss=0.06608, ctc_loss=0.1348, cr_loss=0.3809, over 34148.00 frames. ], tot_loss[loss=0.2471, simple_loss=0.2895, pruned_loss=0.07804, ctc_loss=0.1568, cr_loss=0.43, over 6763119.86 frames. ], batch size: 78, lr: 8.64e-03, grad_scale: 32.0 2024-09-17 18:32:06,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=244542.66666666666, ans=0.0 2024-09-17 18:32:23,787 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.477e+02 3.072e+02 4.376e+02 1.237e+03, threshold=6.143e+02, percent-clipped=4.0 2024-09-17 18:32:24,256 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:32:35,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=244589.33333333334, ans=0.1 2024-09-17 18:32:48,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=244636.0, ans=0.025 2024-09-17 18:32:58,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=244682.66666666666, ans=0.125 2024-09-17 18:33:06,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=244682.66666666666, ans=0.125 2024-09-17 18:33:11,412 INFO [train.py:1198] (1/2) Epoch 14, batch 2050, loss[loss=0.209, simple_loss=0.2554, pruned_loss=0.06077, ctc_loss=0.1294, cr_loss=0.3812, over 34471.00 frames. ], tot_loss[loss=0.2462, simple_loss=0.2885, pruned_loss=0.07778, ctc_loss=0.1564, cr_loss=0.4285, over 6755889.64 frames. ], batch size: 82, lr: 8.63e-03, grad_scale: 32.0 2024-09-17 18:33:41,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. 
limit=22.5 2024-09-17 18:33:51,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=244822.66666666666, ans=0.5 2024-09-17 18:33:53,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=244822.66666666666, ans=0.125 2024-09-17 18:34:19,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=244916.0, ans=0.0 2024-09-17 18:34:32,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244962.66666666666, ans=0.1 2024-09-17 18:34:33,859 INFO [train.py:1198] (1/2) Epoch 14, batch 2100, loss[loss=0.2385, simple_loss=0.283, pruned_loss=0.07334, ctc_loss=0.1514, cr_loss=0.4268, over 34534.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.2878, pruned_loss=0.07742, ctc_loss=0.1556, cr_loss=0.4273, over 6769545.07 frames. ], batch size: 94, lr: 8.63e-03, grad_scale: 32.0 2024-09-17 18:34:46,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.58 vs. limit=15.0 2024-09-17 18:34:50,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=245009.33333333334, ans=0.125 2024-09-17 18:35:08,148 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.579e+02 2.876e+02 4.066e+02 8.838e+02, threshold=5.753e+02, percent-clipped=4.0 2024-09-17 18:35:16,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=245056.0, ans=0.125 2024-09-17 18:35:19,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=245056.0, ans=0.0 2024-09-17 18:35:27,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2024-09-17 18:35:34,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=245102.66666666666, ans=0.0 2024-09-17 18:35:58,018 INFO [train.py:1198] (1/2) Epoch 14, batch 2150, loss[loss=0.2473, simple_loss=0.2881, pruned_loss=0.07874, ctc_loss=0.1585, cr_loss=0.4354, over 34342.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.2869, pruned_loss=0.07686, ctc_loss=0.1547, cr_loss=0.4258, over 6788100.98 frames. ], batch size: 91, lr: 8.63e-03, grad_scale: 32.0 2024-09-17 18:36:01,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=245196.0, ans=0.125 2024-09-17 18:36:25,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.66 vs. 
limit=12.0 2024-09-17 18:36:33,520 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:36:53,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=245336.0, ans=0.0 2024-09-17 18:37:03,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=245336.0, ans=0.0 2024-09-17 18:37:11,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=245382.66666666666, ans=0.125 2024-09-17 18:37:21,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2024-09-17 18:37:22,348 INFO [train.py:1198] (1/2) Epoch 14, batch 2200, loss[loss=0.246, simple_loss=0.292, pruned_loss=0.0759, ctc_loss=0.1553, cr_loss=0.4257, over 34452.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.287, pruned_loss=0.07685, ctc_loss=0.1547, cr_loss=0.4262, over 6783573.60 frames. ], batch size: 100, lr: 8.62e-03, grad_scale: 32.0 2024-09-17 18:37:51,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=245476.0, ans=0.04949747468305833 2024-09-17 18:37:56,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=245522.66666666666, ans=0.125 2024-09-17 18:37:57,629 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.556e+02 3.070e+02 4.470e+02 1.271e+03, threshold=6.140e+02, percent-clipped=8.0 2024-09-17 18:38:22,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=245569.33333333334, ans=0.125 2024-09-17 18:38:44,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=245662.66666666666, ans=0.0 2024-09-17 18:38:45,738 INFO [train.py:1198] (1/2) Epoch 14, batch 2250, loss[loss=0.2519, simple_loss=0.2905, pruned_loss=0.08121, ctc_loss=0.1638, cr_loss=0.4536, over 34419.00 frames. ], tot_loss[loss=0.2445, simple_loss=0.2871, pruned_loss=0.07691, ctc_loss=0.1548, cr_loss=0.427, over 6782686.10 frames. ], batch size: 95, lr: 8.62e-03, grad_scale: 32.0 2024-09-17 18:38:51,109 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:38:51,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=245662.66666666666, ans=0.0 2024-09-17 18:39:07,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=245709.33333333334, ans=0.5 2024-09-17 18:39:42,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=245802.66666666666, ans=0.0 2024-09-17 18:40:03,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2024-09-17 18:40:11,472 INFO [train.py:1198] (1/2) Epoch 14, batch 2300, loss[loss=0.2158, simple_loss=0.2608, pruned_loss=0.06423, ctc_loss=0.1347, cr_loss=0.3858, over 34265.00 frames. 
], tot_loss[loss=0.2437, simple_loss=0.2862, pruned_loss=0.07665, ctc_loss=0.1543, cr_loss=0.4257, over 6768290.45 frames. ], batch size: 83, lr: 8.61e-03, grad_scale: 32.0 2024-09-17 18:40:13,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=245896.0, ans=0.125 2024-09-17 18:40:23,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=245896.0, ans=0.09899494936611666 2024-09-17 18:40:26,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.88 vs. limit=22.5 2024-09-17 18:40:46,118 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.084e+02 2.554e+02 3.106e+02 4.293e+02 7.999e+02, threshold=6.211e+02, percent-clipped=3.0 2024-09-17 18:40:59,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=246036.0, ans=0.125 2024-09-17 18:41:01,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=246036.0, ans=0.0 2024-09-17 18:41:33,972 INFO [train.py:1198] (1/2) Epoch 14, batch 2350, loss[loss=0.2599, simple_loss=0.3005, pruned_loss=0.08347, ctc_loss=0.1706, cr_loss=0.4543, over 34703.00 frames. ], tot_loss[loss=0.2446, simple_loss=0.2869, pruned_loss=0.07709, ctc_loss=0.155, cr_loss=0.4273, over 6773751.71 frames. ], batch size: 97, lr: 8.61e-03, grad_scale: 32.0 2024-09-17 18:41:36,110 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:42:20,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=246222.66666666666, ans=0.2 2024-09-17 18:42:23,880 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2024-09-17 18:42:30,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.71 vs. limit=15.0 2024-09-17 18:42:56,262 INFO [train.py:1198] (1/2) Epoch 14, batch 2400, loss[loss=0.2296, simple_loss=0.2753, pruned_loss=0.06962, ctc_loss=0.1424, cr_loss=0.4058, over 34579.00 frames. ], tot_loss[loss=0.245, simple_loss=0.2873, pruned_loss=0.07731, ctc_loss=0.1552, cr_loss=0.4279, over 6778262.86 frames. ], batch size: 89, lr: 8.60e-03, grad_scale: 32.0 2024-09-17 18:43:04,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=246362.66666666666, ans=0.025 2024-09-17 18:43:26,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=246409.33333333334, ans=0.2 2024-09-17 18:43:32,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=246456.0, ans=0.125 2024-09-17 18:43:35,487 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.704e+02 3.221e+02 4.092e+02 9.196e+02, threshold=6.443e+02, percent-clipped=2.0 2024-09-17 18:44:23,625 INFO [train.py:1198] (1/2) Epoch 14, batch 2450, loss[loss=0.2364, simple_loss=0.2856, pruned_loss=0.07143, ctc_loss=0.1414, cr_loss=0.4005, over 34414.00 frames. 
], tot_loss[loss=0.2457, simple_loss=0.2879, pruned_loss=0.07758, ctc_loss=0.1559, cr_loss=0.4284, over 6752916.77 frames. ], batch size: 95, lr: 8.60e-03, grad_scale: 32.0 2024-09-17 18:44:30,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=246596.0, ans=0.125 2024-09-17 18:44:32,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=246596.0, ans=0.0 2024-09-17 18:44:37,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.55 vs. limit=15.0 2024-09-17 18:44:56,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.04 vs. limit=15.0 2024-09-17 18:45:00,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=246689.33333333334, ans=0.0 2024-09-17 18:45:30,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=246782.66666666666, ans=0.07 2024-09-17 18:45:38,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=246782.66666666666, ans=0.0 2024-09-17 18:45:46,398 INFO [train.py:1198] (1/2) Epoch 14, batch 2500, loss[loss=0.2499, simple_loss=0.2969, pruned_loss=0.07734, ctc_loss=0.1527, cr_loss=0.4401, over 34440.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.2878, pruned_loss=0.07738, ctc_loss=0.1555, cr_loss=0.4282, over 6763122.29 frames. ], batch size: 100, lr: 8.60e-03, grad_scale: 32.0 2024-09-17 18:45:50,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=246829.33333333334, ans=0.2 2024-09-17 18:46:11,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=246876.0, ans=0.0 2024-09-17 18:46:13,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=246876.0, ans=0.2 2024-09-17 18:46:16,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=246876.0, ans=0.0 2024-09-17 18:46:21,211 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.442e+02 2.727e+02 3.671e+02 6.105e+02, threshold=5.453e+02, percent-clipped=0.0 2024-09-17 18:46:28,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=246922.66666666666, ans=0.125 2024-09-17 18:46:29,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=246922.66666666666, ans=0.125 2024-09-17 18:46:33,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=246922.66666666666, ans=0.0 2024-09-17 18:46:38,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=246969.33333333334, ans=0.125 2024-09-17 18:46:56,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=247016.0, ans=0.125 2024-09-17 18:46:59,824 INFO [scaling.py:214] (1/2) ScheduledFloat: 
name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=247016.0, ans=0.125 2024-09-17 18:47:11,492 INFO [train.py:1198] (1/2) Epoch 14, batch 2550, loss[loss=0.2108, simple_loss=0.2568, pruned_loss=0.06172, ctc_loss=0.1308, cr_loss=0.3802, over 34167.00 frames. ], tot_loss[loss=0.2452, simple_loss=0.2875, pruned_loss=0.0773, ctc_loss=0.1555, cr_loss=0.4285, over 6766842.12 frames. ], batch size: 78, lr: 8.59e-03, grad_scale: 32.0 2024-09-17 18:47:36,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=247109.33333333334, ans=0.0 2024-09-17 18:48:04,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.20 vs. limit=12.0 2024-09-17 18:48:18,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.49 vs. limit=15.0 2024-09-17 18:48:33,859 INFO [train.py:1198] (1/2) Epoch 14, batch 2600, loss[loss=0.2387, simple_loss=0.2802, pruned_loss=0.07501, ctc_loss=0.1524, cr_loss=0.4176, over 34733.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.2879, pruned_loss=0.07736, ctc_loss=0.1557, cr_loss=0.4286, over 6761794.14 frames. ], batch size: 92, lr: 8.59e-03, grad_scale: 32.0 2024-09-17 18:48:51,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.62 vs. limit=15.0 2024-09-17 18:48:52,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=247342.66666666666, ans=0.0 2024-09-17 18:49:08,922 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.091e+02 2.393e+02 2.761e+02 3.594e+02 6.410e+02, threshold=5.521e+02, percent-clipped=5.0 2024-09-17 18:49:12,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=247389.33333333334, ans=0.2 2024-09-17 18:49:56,359 INFO [train.py:1198] (1/2) Epoch 14, batch 2650, loss[loss=0.2727, simple_loss=0.3116, pruned_loss=0.09006, ctc_loss=0.1738, cr_loss=0.4732, over 34259.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.2881, pruned_loss=0.0773, ctc_loss=0.1555, cr_loss=0.4288, over 6769520.74 frames. ], batch size: 117, lr: 8.59e-03, grad_scale: 32.0 2024-09-17 18:50:00,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=247529.33333333334, ans=0.0 2024-09-17 18:50:13,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=5.06 vs. limit=12.0 2024-09-17 18:50:21,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=247576.0, ans=0.0 2024-09-17 18:50:30,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.67 vs. limit=22.5 2024-09-17 18:50:56,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=247669.33333333334, ans=0.0 2024-09-17 18:51:10,214 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.20 vs. 
limit=15.0 2024-09-17 18:51:12,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=247716.0, ans=0.1 2024-09-17 18:51:16,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=247716.0, ans=0.0 2024-09-17 18:51:19,374 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 18:51:22,210 INFO [train.py:1198] (1/2) Epoch 14, batch 2700, loss[loss=0.2516, simple_loss=0.2981, pruned_loss=0.07811, ctc_loss=0.1569, cr_loss=0.4355, over 34661.00 frames. ], tot_loss[loss=0.246, simple_loss=0.2885, pruned_loss=0.07761, ctc_loss=0.1559, cr_loss=0.4289, over 6763837.32 frames. ], batch size: 102, lr: 8.58e-03, grad_scale: 32.0 2024-09-17 18:51:24,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=247762.66666666666, ans=0.125 2024-09-17 18:51:25,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=247762.66666666666, ans=0.125 2024-09-17 18:51:37,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=247809.33333333334, ans=0.05 2024-09-17 18:51:50,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=247809.33333333334, ans=0.07 2024-09-17 18:51:54,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.02 vs. limit=15.0 2024-09-17 18:51:56,685 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.542e+02 2.850e+02 3.808e+02 7.312e+02, threshold=5.700e+02, percent-clipped=11.0 2024-09-17 18:52:02,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=247856.0, ans=0.0 2024-09-17 18:52:05,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=247856.0, ans=0.125 2024-09-17 18:52:31,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.45 vs. limit=12.0 2024-09-17 18:52:45,058 INFO [train.py:1198] (1/2) Epoch 14, batch 2750, loss[loss=0.2391, simple_loss=0.2793, pruned_loss=0.07602, ctc_loss=0.152, cr_loss=0.4137, over 34631.00 frames. ], tot_loss[loss=0.2446, simple_loss=0.2872, pruned_loss=0.07703, ctc_loss=0.1548, cr_loss=0.4262, over 6760967.46 frames. 
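Each WARNING from optim.py reports five grad-norm quartiles (min/Q1/median/Q3/max over a recent window), a clipping threshold, and the percentage of recent updates that were clipped. In every warning in this stretch of the log the threshold equals Clipping_scale times the logged median, e.g. 2.0 * 2.850e+02 = 5.700e+02 in the warning above. A minimal sketch of that behaviour, assuming a fixed-size window of recent norms (the window length and update cadence are assumptions):

```python
import torch
from collections import deque

class QuartileClipper:
    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total grad norms

    def clip_(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        # Total 2-norm of all gradients for this step.
        norm = torch.norm(torch.stack(
            [p.grad.detach().norm() for p in params])).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # scale * median
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm, threshold
```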
], batch size: 88, lr: 8.58e-03, grad_scale: 32.0 2024-09-17 18:52:48,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=247996.0, ans=0.0 2024-09-17 18:53:13,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=248042.66666666666, ans=0.125 2024-09-17 18:53:15,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=248042.66666666666, ans=22.5 2024-09-17 18:53:22,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=248089.33333333334, ans=0.025 2024-09-17 18:54:07,489 INFO [train.py:1198] (1/2) Epoch 14, batch 2800, loss[loss=0.2853, simple_loss=0.3154, pruned_loss=0.09878, ctc_loss=0.1991, cr_loss=0.4477, over 23441.00 frames. ], tot_loss[loss=0.2452, simple_loss=0.2876, pruned_loss=0.07734, ctc_loss=0.1554, cr_loss=0.427, over 6740415.36 frames. ], batch size: 245, lr: 8.57e-03, grad_scale: 32.0 2024-09-17 18:54:14,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=248229.33333333334, ans=0.0 2024-09-17 18:54:21,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=248229.33333333334, ans=0.125 2024-09-17 18:54:30,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.19 vs. limit=15.0 2024-09-17 18:54:46,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.603e+02 2.966e+02 3.333e+02 5.957e+02, threshold=5.933e+02, percent-clipped=2.0 2024-09-17 18:55:18,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=15.0 2024-09-17 18:55:22,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=248416.0, ans=0.125 2024-09-17 18:55:32,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=248462.66666666666, ans=0.0 2024-09-17 18:55:34,153 INFO [train.py:1198] (1/2) Epoch 14, batch 2850, loss[loss=0.2437, simple_loss=0.2847, pruned_loss=0.07758, ctc_loss=0.1551, cr_loss=0.4136, over 34490.00 frames. ], tot_loss[loss=0.2463, simple_loss=0.2885, pruned_loss=0.07783, ctc_loss=0.1563, cr_loss=0.4282, over 6725916.67 frames. ], batch size: 90, lr: 8.57e-03, grad_scale: 32.0 2024-09-17 18:56:07,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=248556.0, ans=0.125 2024-09-17 18:56:28,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=248602.66666666666, ans=0.125 2024-09-17 18:56:56,781 INFO [train.py:1198] (1/2) Epoch 14, batch 2900, loss[loss=0.2504, simple_loss=0.2928, pruned_loss=0.07945, ctc_loss=0.1604, cr_loss=0.4241, over 34546.00 frames. ], tot_loss[loss=0.2469, simple_loss=0.2894, pruned_loss=0.07794, ctc_loss=0.1565, cr_loss=0.4297, over 6756058.23 frames. 
], batch size: 94, lr: 8.57e-03, grad_scale: 32.0 2024-09-17 18:57:32,955 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 2.415e+02 3.105e+02 3.911e+02 6.555e+02, threshold=6.210e+02, percent-clipped=3.0 2024-09-17 18:57:59,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=248836.0, ans=0.125 2024-09-17 18:58:09,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=248882.66666666666, ans=0.125 2024-09-17 18:58:11,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=248882.66666666666, ans=0.1 2024-09-17 18:58:11,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=248882.66666666666, ans=0.125 2024-09-17 18:58:11,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=248882.66666666666, ans=0.2 2024-09-17 18:58:20,930 INFO [train.py:1198] (1/2) Epoch 14, batch 2950, loss[loss=0.2214, simple_loss=0.2646, pruned_loss=0.06773, ctc_loss=0.1353, cr_loss=0.3939, over 34598.00 frames. ], tot_loss[loss=0.2457, simple_loss=0.2881, pruned_loss=0.07751, ctc_loss=0.1559, cr_loss=0.4283, over 6752230.15 frames. ], batch size: 88, lr: 8.56e-03, grad_scale: 16.0 2024-09-17 18:58:26,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=248929.33333333334, ans=0.125 2024-09-17 18:58:29,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=248929.33333333334, ans=0.025 2024-09-17 18:58:55,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2024-09-17 18:58:59,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=249022.66666666666, ans=0.0 2024-09-17 18:59:11,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=249069.33333333334, ans=0.125 2024-09-17 18:59:27,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=249116.0, ans=0.5 2024-09-17 18:59:34,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=249116.0, ans=0.025 2024-09-17 18:59:45,547 INFO [train.py:1198] (1/2) Epoch 14, batch 3000, loss[loss=0.2381, simple_loss=0.2851, pruned_loss=0.07235, ctc_loss=0.1488, cr_loss=0.4173, over 34509.00 frames. ], tot_loss[loss=0.2456, simple_loss=0.2881, pruned_loss=0.07737, ctc_loss=0.1559, cr_loss=0.4285, over 6753559.76 frames. ], batch size: 94, lr: 8.56e-03, grad_scale: 16.0 2024-09-17 18:59:45,547 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 19:00:02,426 INFO [train.py:1230] (1/2) Epoch 14, validation: loss=0.151, simple_loss=0.2493, pruned_loss=0.02192, ctc_loss=0.04432, cr_loss=1.686e-14, over 944034.00 frames. 
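The validation record just above shows cr_loss collapsing to ~1e-14 while it sits around 0.43 during training. That is consistent with a consistency-regularization term computed between two augmented views of each utterance: at validation time there is effectively a single view, so the divergence is zero up to float rounding. A sketch under that assumption, using a symmetric KL between frame-level log-probabilities (the exact divergence used by train.py is not shown in the log):

```python
import torch
import torch.nn.functional as F

def consistency_loss(log_probs_a, log_probs_b):
    # (batch, frames, vocab) log-probabilities from two augmented views.
    kl_ab = F.kl_div(log_probs_a, log_probs_b, log_target=True,
                     reduction="batchmean")
    kl_ba = F.kl_div(log_probs_b, log_probs_a, log_target=True,
                     reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

# With identical views the loss is exactly zero in theory and ~1e-14 in
# float arithmetic, matching the validation cr_loss values logged above.
x = F.log_softmax(torch.randn(4, 100, 512), dim=-1)
assert consistency_loss(x, x).abs().item() < 1e-6
```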
2024-09-17 19:00:02,427 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 19:00:15,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.34 vs. limit=22.5 2024-09-17 19:00:27,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=249209.33333333334, ans=0.2 2024-09-17 19:00:32,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=249209.33333333334, ans=0.125 2024-09-17 19:00:33,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=249256.0, ans=0.0 2024-09-17 19:00:40,126 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.433e+02 3.151e+02 4.032e+02 9.838e+02, threshold=6.303e+02, percent-clipped=6.0 2024-09-17 19:00:56,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=249302.66666666666, ans=0.05 2024-09-17 19:01:03,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=249302.66666666666, ans=0.0 2024-09-17 19:01:22,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=249396.0, ans=0.125 2024-09-17 19:01:24,197 INFO [train.py:1198] (1/2) Epoch 14, batch 3050, loss[loss=0.2329, simple_loss=0.2742, pruned_loss=0.0727, ctc_loss=0.1464, cr_loss=0.42, over 34572.00 frames. ], tot_loss[loss=0.246, simple_loss=0.2887, pruned_loss=0.07745, ctc_loss=0.1561, cr_loss=0.4288, over 6743699.57 frames. ], batch size: 89, lr: 8.55e-03, grad_scale: 8.0 2024-09-17 19:01:49,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.32 vs. limit=10.0 2024-09-17 19:02:28,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=249582.66666666666, ans=0.125 2024-09-17 19:02:32,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=249582.66666666666, ans=0.1 2024-09-17 19:02:32,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.26 vs. limit=22.5 2024-09-17 19:02:46,484 INFO [train.py:1198] (1/2) Epoch 14, batch 3100, loss[loss=0.2598, simple_loss=0.3011, pruned_loss=0.08313, ctc_loss=0.1688, cr_loss=0.4616, over 34219.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.288, pruned_loss=0.07726, ctc_loss=0.1558, cr_loss=0.428, over 6743327.77 frames. 
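The ScheduledFloat lines record a scalar hyperparameter (`ans`) as a function of `batch_count`; many of the skip-rate entries have decayed to 0.0 by this point in training (batch_count around 2.5e5). A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are invented for illustration:

```python
import bisect

class ScheduledFloat:
    # Maps batch_count -> value by linear interpolation, clamped at the ends.
    def __init__(self, *points):
        self.xs = [x for x, _ in points]  # breakpoint batch counts, ascending
        self.ys = [y for _, y in points]  # values at those breakpoints

    def value(self, batch_count):
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A skip rate that anneals to zero early in training and stays there:
skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(249256.0))  # -> 0.0, like the ff3_skip_rate ans above
```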
], batch size: 117, lr: 8.55e-03, grad_scale: 8.0 2024-09-17 19:03:07,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=249676.0, ans=0.2 2024-09-17 19:03:12,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=249676.0, ans=0.125 2024-09-17 19:03:25,719 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.382e+02 2.848e+02 3.759e+02 7.725e+02, threshold=5.696e+02, percent-clipped=5.0 2024-09-17 19:03:32,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=249722.66666666666, ans=0.125 2024-09-17 19:03:45,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=249769.33333333334, ans=0.0 2024-09-17 19:03:58,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=249816.0, ans=0.0 2024-09-17 19:04:06,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=249816.0, ans=0.125 2024-09-17 19:04:09,688 INFO [train.py:1198] (1/2) Epoch 14, batch 3150, loss[loss=0.2681, simple_loss=0.3107, pruned_loss=0.08613, ctc_loss=0.1725, cr_loss=0.4668, over 33804.00 frames. ], tot_loss[loss=0.2454, simple_loss=0.2879, pruned_loss=0.0773, ctc_loss=0.1558, cr_loss=0.4278, over 6749485.75 frames. ], batch size: 122, lr: 8.55e-03, grad_scale: 8.0 2024-09-17 19:04:26,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.63 vs. limit=10.0 2024-09-17 19:04:29,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=249909.33333333334, ans=0.125 2024-09-17 19:04:29,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=22.5 2024-09-17 19:04:41,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.67 vs. limit=6.0 2024-09-17 19:04:42,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. 
limit=6.0 2024-09-17 19:04:58,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250002.66666666666, ans=0.1 2024-09-17 19:04:58,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=250002.66666666666, ans=0.0 2024-09-17 19:04:58,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=250002.66666666666, ans=0.125 2024-09-17 19:04:59,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=250002.66666666666, ans=0.1 2024-09-17 19:05:00,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=250002.66666666666, ans=0.0 2024-09-17 19:05:13,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=250049.33333333334, ans=0.025 2024-09-17 19:05:29,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.11 vs. limit=22.5 2024-09-17 19:05:30,567 INFO [train.py:1198] (1/2) Epoch 14, batch 3200, loss[loss=0.2485, simple_loss=0.2878, pruned_loss=0.08018, ctc_loss=0.1553, cr_loss=0.4462, over 34538.00 frames. ], tot_loss[loss=0.2452, simple_loss=0.2878, pruned_loss=0.07721, ctc_loss=0.1555, cr_loss=0.4277, over 6762062.90 frames. ], batch size: 94, lr: 8.54e-03, grad_scale: 16.0 2024-09-17 19:05:44,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=250096.0, ans=0.035 2024-09-17 19:05:45,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=250142.66666666666, ans=0.2 2024-09-17 19:05:49,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0 2024-09-17 19:06:00,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=250142.66666666666, ans=0.125 2024-09-17 19:06:08,457 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.801e+02 3.470e+02 4.357e+02 6.933e+02, threshold=6.941e+02, percent-clipped=9.0 2024-09-17 19:06:25,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=250236.0, ans=0.0 2024-09-17 19:06:29,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=250236.0, ans=0.5 2024-09-17 19:06:31,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=250236.0, ans=0.125 2024-09-17 19:06:38,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=250282.66666666666, ans=15.0 2024-09-17 19:06:47,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=250282.66666666666, ans=0.125 2024-09-17 19:06:52,187 INFO [train.py:1198] (1/2) Epoch 14, batch 3250, loss[loss=0.2524, simple_loss=0.2956, pruned_loss=0.07937, ctc_loss=0.1623, cr_loss=0.4488, over 34636.00 frames. 
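The grad_scale field tracks dynamic loss scaling for the fp16 run: it sits at 32.0 for long stretches, halves to 16.0 and then 8.0 around batches 2950-3050 (overflowing steps), and is back up to 16.0 by batch 3200 above once updates are clean again. That pattern matches standard PyTorch AMP behaviour; a sketch of the usual step, with the default growth/backoff settings (which are not read from this log):

```python
import torch

# Halve the scale when inf/nan gradients appear, double it after a run of
# clean steps -- producing 32 -> 16 -> 8 -> 16 trajectories like the
# grad_scale column above.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=2000)

def training_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = compute_loss(model(batch))
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skipped if inf/nan grads were found
    scaler.update()                # backoff or grow the scale
```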
], tot_loss[loss=0.2449, simple_loss=0.2876, pruned_loss=0.07701, ctc_loss=0.1551, cr_loss=0.4277, over 6771315.14 frames. ], batch size: 98, lr: 8.54e-03, grad_scale: 16.0 2024-09-17 19:07:15,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.57 vs. limit=15.0 2024-09-17 19:07:27,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=250422.66666666666, ans=0.125 2024-09-17 19:07:30,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=250422.66666666666, ans=0.0 2024-09-17 19:07:35,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=250422.66666666666, ans=0.125 2024-09-17 19:07:42,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=250469.33333333334, ans=0.1 2024-09-17 19:07:43,993 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.009e-02 2024-09-17 19:07:48,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=250469.33333333334, ans=0.125 2024-09-17 19:07:49,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.56 vs. limit=12.0 2024-09-17 19:08:02,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=250516.0, ans=0.125 2024-09-17 19:08:05,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=250516.0, ans=0.125 2024-09-17 19:08:10,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=250562.66666666666, ans=0.2 2024-09-17 19:08:12,169 INFO [train.py:1198] (1/2) Epoch 14, batch 3300, loss[loss=0.2523, simple_loss=0.3009, pruned_loss=0.0773, ctc_loss=0.1572, cr_loss=0.4404, over 33181.00 frames. ], tot_loss[loss=0.2434, simple_loss=0.2862, pruned_loss=0.07638, ctc_loss=0.154, cr_loss=0.4255, over 6769513.19 frames. 
], batch size: 130, lr: 8.53e-03, grad_scale: 16.0 2024-09-17 19:08:14,160 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:08:20,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=250562.66666666666, ans=0.125 2024-09-17 19:08:43,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=250609.33333333334, ans=0.125 2024-09-17 19:08:50,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.696e+02 3.188e+02 3.898e+02 1.284e+03, threshold=6.376e+02, percent-clipped=1.0 2024-09-17 19:09:03,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=250702.66666666666, ans=0.1 2024-09-17 19:09:07,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=250702.66666666666, ans=0.125 2024-09-17 19:09:15,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=250702.66666666666, ans=0.0 2024-09-17 19:09:16,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=250702.66666666666, ans=0.0 2024-09-17 19:09:19,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2024-09-17 19:09:28,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2024-09-17 19:09:35,742 INFO [train.py:1198] (1/2) Epoch 14, batch 3350, loss[loss=0.2577, simple_loss=0.3042, pruned_loss=0.07998, ctc_loss=0.1644, cr_loss=0.4595, over 33794.00 frames. ], tot_loss[loss=0.2442, simple_loss=0.287, pruned_loss=0.07675, ctc_loss=0.1547, cr_loss=0.4267, over 6743251.17 frames. ], batch size: 122, lr: 8.53e-03, grad_scale: 16.0 2024-09-17 19:10:02,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=250842.66666666666, ans=0.125 2024-09-17 19:10:11,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=250889.33333333334, ans=0.1 2024-09-17 19:10:11,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=250889.33333333334, ans=0.2 2024-09-17 19:10:17,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.93 vs. limit=22.5 2024-09-17 19:10:37,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=250936.0, ans=0.0 2024-09-17 19:10:38,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=250982.66666666666, ans=0.2 2024-09-17 19:10:41,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. 
limit=6.0 2024-09-17 19:10:55,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=251029.33333333334, ans=0.025 2024-09-17 19:10:56,397 INFO [train.py:1198] (1/2) Epoch 14, batch 3400, loss[loss=0.2027, simple_loss=0.2472, pruned_loss=0.05965, ctc_loss=0.1237, cr_loss=0.3542, over 34137.00 frames. ], tot_loss[loss=0.2446, simple_loss=0.2871, pruned_loss=0.07704, ctc_loss=0.1552, cr_loss=0.4268, over 6732166.07 frames. ], batch size: 78, lr: 8.53e-03, grad_scale: 16.0 2024-09-17 19:11:03,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.53 vs. limit=10.0 2024-09-17 19:11:09,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=251029.33333333334, ans=0.2 2024-09-17 19:11:13,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=251076.0, ans=0.0 2024-09-17 19:11:18,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=251076.0, ans=0.0 2024-09-17 19:11:25,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2024-09-17 19:11:27,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=251122.66666666666, ans=0.0 2024-09-17 19:11:30,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.18 vs. limit=15.0 2024-09-17 19:11:31,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=251122.66666666666, ans=0.125 2024-09-17 19:11:33,895 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.084e+02 2.470e+02 2.923e+02 3.626e+02 8.995e+02, threshold=5.845e+02, percent-clipped=1.0 2024-09-17 19:11:35,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=251122.66666666666, ans=0.0 2024-09-17 19:11:35,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=251122.66666666666, ans=0.125 2024-09-17 19:11:44,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=251169.33333333334, ans=10.0 2024-09-17 19:11:59,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.63 vs. limit=22.5 2024-09-17 19:12:16,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=251262.66666666666, ans=0.125 2024-09-17 19:12:17,425 INFO [train.py:1198] (1/2) Epoch 14, batch 3450, loss[loss=0.2603, simple_loss=0.3045, pruned_loss=0.08266, ctc_loss=0.1643, cr_loss=0.4505, over 33132.00 frames. ], tot_loss[loss=0.2451, simple_loss=0.2876, pruned_loss=0.07717, ctc_loss=0.1554, cr_loss=0.4277, over 6745108.75 frames. 
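The Whitening lines compare a per-module statistic against a limit (e.g. metric=12.49 vs. limit=15.0 for feed_forward1.out_whiten just above). One metric with the right behaviour, offered purely as an illustrative assumption rather than the formula in scaling.py, is num_channels * sum(lambda_i^2) / (sum(lambda_i))^2 over the eigenvalues of the per-group feature covariance: it equals 1.0 for perfectly whitened (isotropic) activations and grows toward num_channels as a few directions dominate, so keeping it under a limit bounds how un-whitened a module's output may become:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels). Assumed metric: 1.0 when each group's
    # covariance is isotropic, up to num_channels / num_groups when a
    # single direction carries all of the variance.
    n, c = x.shape
    vals = []
    for g in x.reshape(n, num_groups, c // num_groups).unbind(dim=1):
        g = g - g.mean(dim=0)
        cov = (g.T @ g) / n                  # (c/g, c/g) covariance
        eigs = torch.linalg.eigvalsh(cov)    # real eigenvalues, ascending
        vals.append((len(eigs) * (eigs ** 2).sum() / eigs.sum() ** 2).item())
    return sum(vals) / len(vals)

print(whitening_metric(torch.randn(20000, 512)))  # close to 1.0: white noise
```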
], batch size: 130, lr: 8.52e-03, grad_scale: 16.0 2024-09-17 19:12:24,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.64 vs. limit=6.0 2024-09-17 19:12:46,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=251309.33333333334, ans=0.125 2024-09-17 19:13:00,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=251356.0, ans=0.125 2024-09-17 19:13:06,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=251402.66666666666, ans=0.2 2024-09-17 19:13:11,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=251402.66666666666, ans=0.125 2024-09-17 19:13:14,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=251402.66666666666, ans=0.0 2024-09-17 19:13:19,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=251402.66666666666, ans=0.0 2024-09-17 19:13:24,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=251449.33333333334, ans=0.2 2024-09-17 19:13:27,524 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:13:29,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=251449.33333333334, ans=0.025 2024-09-17 19:13:32,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=251449.33333333334, ans=0.2 2024-09-17 19:13:37,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=251496.0, ans=0.125 2024-09-17 19:13:38,335 INFO [train.py:1198] (1/2) Epoch 14, batch 3500, loss[loss=0.2055, simple_loss=0.2579, pruned_loss=0.05713, ctc_loss=0.1201, cr_loss=0.3677, over 34465.00 frames. ], tot_loss[loss=0.2447, simple_loss=0.2872, pruned_loss=0.07704, ctc_loss=0.1551, cr_loss=0.4275, over 6747842.52 frames. 
], batch size: 85, lr: 8.52e-03, grad_scale: 16.0 2024-09-17 19:14:02,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=251542.66666666666, ans=0.0 2024-09-17 19:14:12,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=251589.33333333334, ans=0.0 2024-09-17 19:14:16,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.586e+02 3.061e+02 4.155e+02 5.668e+02, threshold=6.123e+02, percent-clipped=0.0 2024-09-17 19:14:18,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=251589.33333333334, ans=0.2 2024-09-17 19:14:18,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=251589.33333333334, ans=0.0 2024-09-17 19:14:22,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=251589.33333333334, ans=0.1 2024-09-17 19:14:38,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_ff2.min_abs, batch_count=251636.0, ans=0.1 2024-09-17 19:14:48,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.17 vs. limit=15.0 2024-09-17 19:14:57,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=251682.66666666666, ans=0.125 2024-09-17 19:14:57,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=16.65 vs. limit=15.0 2024-09-17 19:15:00,265 INFO [train.py:1198] (1/2) Epoch 14, batch 3550, loss[loss=0.2489, simple_loss=0.2897, pruned_loss=0.07948, ctc_loss=0.1583, cr_loss=0.4369, over 34364.00 frames. ], tot_loss[loss=0.245, simple_loss=0.2875, pruned_loss=0.07714, ctc_loss=0.1553, cr_loss=0.4279, over 6758208.18 frames. ], batch size: 103, lr: 8.51e-03, grad_scale: 16.0 2024-09-17 19:15:13,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2024-09-17 19:15:16,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=251776.0, ans=0.125 2024-09-17 19:15:16,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=251776.0, ans=0.125 2024-09-17 19:15:27,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=251776.0, ans=0.0 2024-09-17 19:15:43,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=251822.66666666666, ans=0.125 2024-09-17 19:15:58,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=251869.33333333334, ans=0.125 2024-09-17 19:16:20,391 INFO [train.py:1198] (1/2) Epoch 14, batch 3600, loss[loss=0.2417, simple_loss=0.2825, pruned_loss=0.07635, ctc_loss=0.155, cr_loss=0.4287, over 34482.00 frames. ], tot_loss[loss=0.2451, simple_loss=0.2876, pruned_loss=0.07716, ctc_loss=0.1552, cr_loss=0.4278, over 6767282.23 frames. 
], batch size: 90, lr: 8.51e-03, grad_scale: 32.0 2024-09-17 19:16:23,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=251962.66666666666, ans=0.5 2024-09-17 19:16:58,104 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.554e+02 3.253e+02 4.167e+02 6.183e+02, threshold=6.506e+02, percent-clipped=1.0 2024-09-17 19:17:15,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=252102.66666666666, ans=0.0 2024-09-17 19:17:17,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=252102.66666666666, ans=0.0 2024-09-17 19:17:25,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=252149.33333333334, ans=0.1 2024-09-17 19:17:37,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=252149.33333333334, ans=0.025 2024-09-17 19:17:42,278 INFO [train.py:1198] (1/2) Epoch 14, batch 3650, loss[loss=0.266, simple_loss=0.309, pruned_loss=0.08533, ctc_loss=0.1689, cr_loss=0.4648, over 34442.00 frames. ], tot_loss[loss=0.2444, simple_loss=0.287, pruned_loss=0.07687, ctc_loss=0.1547, cr_loss=0.4274, over 6769792.32 frames. ], batch size: 110, lr: 8.51e-03, grad_scale: 32.0 2024-09-17 19:17:53,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=252196.0, ans=0.04949747468305833 2024-09-17 19:17:55,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=252196.0, ans=0.0 2024-09-17 19:18:22,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=252289.33333333334, ans=0.2 2024-09-17 19:18:26,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.09 vs. limit=22.5 2024-09-17 19:18:32,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=252336.0, ans=0.125 2024-09-17 19:18:36,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=252336.0, ans=0.0 2024-09-17 19:18:51,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=252382.66666666666, ans=0.025 2024-09-17 19:18:58,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=252382.66666666666, ans=0.125 2024-09-17 19:19:01,830 INFO [train.py:1198] (1/2) Epoch 14, batch 3700, loss[loss=0.2577, simple_loss=0.3027, pruned_loss=0.08149, ctc_loss=0.1632, cr_loss=0.4283, over 34619.00 frames. ], tot_loss[loss=0.2447, simple_loss=0.2875, pruned_loss=0.07695, ctc_loss=0.1549, cr_loss=0.4276, over 6784112.42 frames. ], batch size: 102, lr: 8.50e-03, grad_scale: 32.0 2024-09-17 19:19:28,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=252476.0, ans=0.1 2024-09-17 19:19:35,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.39 vs. 
limit=22.5 2024-09-17 19:19:38,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-09-17 19:19:39,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=252522.66666666666, ans=0.025 2024-09-17 19:19:40,505 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.544e+02 3.147e+02 3.838e+02 8.740e+02, threshold=6.294e+02, percent-clipped=1.0 2024-09-17 19:20:09,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=252616.0, ans=0.125 2024-09-17 19:20:10,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=252616.0, ans=0.125 2024-09-17 19:20:12,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2024-09-17 19:20:22,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=252662.66666666666, ans=0.0 2024-09-17 19:20:23,337 INFO [train.py:1198] (1/2) Epoch 14, batch 3750, loss[loss=0.2646, simple_loss=0.3099, pruned_loss=0.08356, ctc_loss=0.1679, cr_loss=0.4649, over 34398.00 frames. ], tot_loss[loss=0.2477, simple_loss=0.2906, pruned_loss=0.07812, ctc_loss=0.157, cr_loss=0.4319, over 6785356.61 frames. ], batch size: 113, lr: 8.50e-03, grad_scale: 16.0 2024-09-17 19:20:36,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=252662.66666666666, ans=0.125 2024-09-17 19:21:18,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.36 vs. limit=15.0 2024-09-17 19:21:32,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=252849.33333333334, ans=0.2 2024-09-17 19:21:45,456 INFO [train.py:1198] (1/2) Epoch 14, batch 3800, loss[loss=0.2762, simple_loss=0.3062, pruned_loss=0.0946, ctc_loss=0.1888, cr_loss=0.4828, over 29950.00 frames. ], tot_loss[loss=0.2515, simple_loss=0.2935, pruned_loss=0.07993, ctc_loss=0.1605, cr_loss=0.4367, over 6676573.02 frames. ], batch size: 176, lr: 8.50e-03, grad_scale: 16.0 2024-09-17 19:21:45,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=252896.0, ans=0.125 2024-09-17 19:22:05,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.76 vs. 
limit=15.0 2024-09-17 19:22:18,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=252989.33333333334, ans=0.0 2024-09-17 19:22:18,363 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:22:19,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=252989.33333333334, ans=0.125 2024-09-17 19:22:23,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=252989.33333333334, ans=0.1 2024-09-17 19:22:24,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=252989.33333333334, ans=0.125 2024-09-17 19:22:24,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=252989.33333333334, ans=0.125 2024-09-17 19:22:26,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.418e+02 2.721e+02 3.132e+02 6.365e+02, threshold=5.443e+02, percent-clipped=1.0 2024-09-17 19:22:36,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=253036.0, ans=0.1 2024-09-17 19:22:41,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=253036.0, ans=0.1 2024-09-17 19:22:43,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=253036.0, ans=0.025 2024-09-17 19:23:08,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=253129.33333333334, ans=0.025 2024-09-17 19:23:09,752 INFO [train.py:1198] (1/2) Epoch 14, batch 3850, loss[loss=0.2711, simple_loss=0.3045, pruned_loss=0.09128, ctc_loss=0.1877, cr_loss=0.4381, over 23709.00 frames. ], tot_loss[loss=0.2572, simple_loss=0.2969, pruned_loss=0.08323, ctc_loss=0.1671, cr_loss=0.4398, over 6248245.18 frames. ], batch size: 245, lr: 8.49e-03, grad_scale: 16.0 2024-09-17 19:23:23,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=253129.33333333334, ans=0.05 2024-09-17 19:23:26,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=253176.0, ans=0.0 2024-09-17 19:23:34,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=253176.0, ans=0.125 2024-09-17 19:23:38,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=253176.0, ans=0.2 2024-09-17 19:23:43,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=253222.66666666666, ans=0.2 2024-09-17 19:23:45,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=253222.66666666666, ans=0.07 2024-09-17 19:24:43,802 INFO [train.py:1198] (1/2) Epoch 15, batch 0, loss[loss=0.2153, simple_loss=0.2612, pruned_loss=0.06428, ctc_loss=0.1313, cr_loss=0.3644, over 34474.00 frames. 
], tot_loss[loss=0.2153, simple_loss=0.2612, pruned_loss=0.06428, ctc_loss=0.1313, cr_loss=0.3644, over 34474.00 frames. ], batch size: 85, lr: 8.20e-03, grad_scale: 32.0 2024-09-17 19:24:43,803 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 19:25:00,614 INFO [train.py:1230] (1/2) Epoch 15, validation: loss=0.1522, simple_loss=0.2513, pruned_loss=0.02204, ctc_loss=0.04464, cr_loss=1.632e-14, over 944034.00 frames. 2024-09-17 19:25:00,615 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 19:25:00,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=253255.33333333334, ans=0.0 2024-09-17 19:25:06,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.39 vs. limit=15.0 2024-09-17 19:25:09,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=253255.33333333334, ans=0.2 2024-09-17 19:25:14,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.37 vs. limit=10.0 2024-09-17 19:25:20,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=253302.0, ans=0.07 2024-09-17 19:25:24,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=253302.0, ans=0.0 2024-09-17 19:25:42,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=253348.66666666666, ans=0.125 2024-09-17 19:25:57,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=253395.33333333334, ans=0.0 2024-09-17 19:26:14,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=253442.0, ans=0.1 2024-09-17 19:26:20,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.566e+02 2.766e+02 3.151e+02 5.957e+02, threshold=5.532e+02, percent-clipped=1.0 2024-09-17 19:26:25,282 INFO [train.py:1198] (1/2) Epoch 15, batch 50, loss[loss=0.2181, simple_loss=0.2586, pruned_loss=0.06721, ctc_loss=0.1376, cr_loss=0.391, over 34458.00 frames. ], tot_loss[loss=0.2462, simple_loss=0.2886, pruned_loss=0.07763, ctc_loss=0.1565, cr_loss=0.43, over 1481682.26 frames. ], batch size: 82, lr: 8.20e-03, grad_scale: 32.0 2024-09-17 19:26:27,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.52 vs. limit=22.5 2024-09-17 19:26:34,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=253488.66666666666, ans=0.125 2024-09-17 19:27:22,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.47 vs. limit=15.0 2024-09-17 19:27:49,203 INFO [train.py:1198] (1/2) Epoch 15, batch 100, loss[loss=0.2386, simple_loss=0.2816, pruned_loss=0.07421, ctc_loss=0.1503, cr_loss=0.4258, over 34587.00 frames. ], tot_loss[loss=0.248, simple_loss=0.2904, pruned_loss=0.07837, ctc_loss=0.1576, cr_loss=0.4332, over 2631360.51 frames. 
], batch size: 89, lr: 8.19e-03, grad_scale: 32.0 2024-09-17 19:27:51,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=253722.0, ans=0.125 2024-09-17 19:27:57,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=253722.0, ans=0.125 2024-09-17 19:28:08,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=253768.66666666666, ans=0.1 2024-09-17 19:28:12,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=253768.66666666666, ans=0.09899494936611666 2024-09-17 19:28:12,423 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:28:17,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=253768.66666666666, ans=0.1 2024-09-17 19:28:56,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=253908.66666666666, ans=0.125 2024-09-17 19:29:04,534 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:29:05,788 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.409e+02 2.765e+02 3.175e+02 5.511e+02, threshold=5.530e+02, percent-clipped=0.0 2024-09-17 19:29:09,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.99 vs. limit=15.0 2024-09-17 19:29:10,704 INFO [train.py:1198] (1/2) Epoch 15, batch 150, loss[loss=0.2174, simple_loss=0.2596, pruned_loss=0.06617, ctc_loss=0.1349, cr_loss=0.3976, over 34522.00 frames. ], tot_loss[loss=0.2438, simple_loss=0.2872, pruned_loss=0.07629, ctc_loss=0.1541, cr_loss=0.4268, over 3558351.41 frames. ], batch size: 82, lr: 8.19e-03, grad_scale: 32.0 2024-09-17 19:29:29,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=254002.0, ans=0.125 2024-09-17 19:29:46,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=254048.66666666666, ans=0.125 2024-09-17 19:30:01,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.21 vs. limit=10.0 2024-09-17 19:30:12,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=254095.33333333334, ans=0.0 2024-09-17 19:30:22,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=254142.0, ans=0.0 2024-09-17 19:30:23,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=254142.0, ans=0.0 2024-09-17 19:30:30,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=254142.0, ans=0.0 2024-09-17 19:30:35,116 INFO [train.py:1198] (1/2) Epoch 15, batch 200, loss[loss=0.2643, simple_loss=0.3043, pruned_loss=0.08523, ctc_loss=0.1735, cr_loss=0.4803, over 31684.00 frames. 
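Within an epoch, the tot_loss frame totals above grow sublinearly (about 1.48e6 frames at batch 50, 2.63e6 at batch 100, 3.56e6 at batch 150) and plateau around 6.7-6.8e6 late in epoch 14, rather than accumulating linearly. That is the signature of a geometrically decayed, frame-weighted running average; a sketch under that assumption (the decay constant is illustrative):

```python
class RunningLoss:
    # Decayed frame-weighted average: old batches are down-weighted by
    # `decay` each step, so the effective frame count saturates near
    # batch_frames / (1 - decay) instead of growing without bound.
    def __init__(self, decay=0.995):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        self.weighted_loss = (self.decay * self.weighted_loss
                              + batch_loss * batch_frames)
        self.frames = self.decay * self.frames + batch_frames

    @property
    def tot_loss(self):
        return self.weighted_loss / max(self.frames, 1.0)

tracker = RunningLoss()
tracker.update(0.2643, 31684.0)  # values from the batch-200 record above
```

With batches of roughly 34e3 frames, a decay of 0.995 plateaus near 34e3 / 0.005, i.e. about 6.8e6 frames, in line with the totals logged through epoch 14.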
], tot_loss[loss=0.2427, simple_loss=0.2861, pruned_loss=0.07581, ctc_loss=0.1533, cr_loss=0.4261, over 4273266.71 frames. ], batch size: 145, lr: 8.18e-03, grad_scale: 32.0 2024-09-17 19:30:38,850 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:30:50,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2024-09-17 19:30:53,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=254235.33333333334, ans=0.025 2024-09-17 19:30:55,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=254235.33333333334, ans=0.125 2024-09-17 19:31:05,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=254235.33333333334, ans=0.95 2024-09-17 19:31:08,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=254282.0, ans=0.025 2024-09-17 19:31:43,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=254375.33333333334, ans=0.125 2024-09-17 19:31:45,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.45 vs. limit=15.0 2024-09-17 19:31:48,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=254375.33333333334, ans=0.0 2024-09-17 19:31:54,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.987e+02 2.671e+02 3.358e+02 4.403e+02 6.368e+02, threshold=6.716e+02, percent-clipped=9.0 2024-09-17 19:31:59,463 INFO [train.py:1198] (1/2) Epoch 15, batch 250, loss[loss=0.2645, simple_loss=0.31, pruned_loss=0.083, ctc_loss=0.1713, cr_loss=0.4684, over 34221.00 frames. ], tot_loss[loss=0.2425, simple_loss=0.286, pruned_loss=0.07571, ctc_loss=0.1531, cr_loss=0.426, over 4834642.13 frames. ], batch size: 117, lr: 8.18e-03, grad_scale: 32.0 2024-09-17 19:32:09,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=254422.0, ans=0.2 2024-09-17 19:32:26,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=254468.66666666666, ans=0.2 2024-09-17 19:32:44,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=254515.33333333334, ans=0.2 2024-09-17 19:32:54,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=254562.0, ans=0.125 2024-09-17 19:33:15,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=254608.66666666666, ans=0.2 2024-09-17 19:33:21,234 INFO [train.py:1198] (1/2) Epoch 15, batch 300, loss[loss=0.2594, simple_loss=0.3008, pruned_loss=0.08335, ctc_loss=0.1652, cr_loss=0.4533, over 34377.00 frames. ], tot_loss[loss=0.2422, simple_loss=0.2856, pruned_loss=0.07562, ctc_loss=0.1527, cr_loss=0.4255, over 5263892.60 frames. 
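The `WARNING [optim.py:487] Clipping_scale=2.0, grad-norm quartiles ...` lines summarize recent gradient norms as five quantiles (min, 25%, median, 75%, max). In every such entry the threshold equals Clipping_scale times the median (here 2.0 × 3.358e+02 = 6.716e+02), and percent-clipped is the share of recent batches whose norm exceeded it. A hedged sketch of that bookkeeping, with the window size as an assumption:

```python
import torch

class MedianClipper:
    """Clip grads at clipping_scale * median of recent grad norms (sketch)."""
    def __init__(self, clipping_scale: float = 2.0, window: int = 100):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms, self.clipped = [], []

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms = (self.norms + [norm])[-self.window:]
        qs = torch.quantile(torch.tensor(self.norms),
                            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * qs[2].item()   # scale * median
        was_clipped = norm > threshold
        self.clipped = (self.clipped + [was_clipped])[-self.window:]
        if was_clipped:
            for p in params:
                p.grad.mul_(threshold / norm)
        # "percent-clipped" as reported in the WARNING lines:
        return 100.0 * sum(self.clipped) / len(self.clipped)
```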
], batch size: 107, lr: 8.18e-03, grad_scale: 32.0 2024-09-17 19:33:46,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.53 vs. limit=15.0 2024-09-17 19:33:58,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=254748.66666666666, ans=0.125 2024-09-17 19:34:28,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=254842.0, ans=0.5 2024-09-17 19:34:39,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0 2024-09-17 19:34:41,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.440e+02 3.039e+02 3.807e+02 6.376e+02, threshold=6.078e+02, percent-clipped=0.0 2024-09-17 19:34:46,795 INFO [train.py:1198] (1/2) Epoch 15, batch 350, loss[loss=0.2065, simple_loss=0.2543, pruned_loss=0.05976, ctc_loss=0.1229, cr_loss=0.3649, over 34273.00 frames. ], tot_loss[loss=0.2427, simple_loss=0.286, pruned_loss=0.07584, ctc_loss=0.1532, cr_loss=0.4264, over 5599672.24 frames. ], batch size: 83, lr: 8.17e-03, grad_scale: 32.0 2024-09-17 19:34:51,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=254888.66666666666, ans=0.0 2024-09-17 19:35:05,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=254935.33333333334, ans=0.0 2024-09-17 19:35:37,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=255028.66666666666, ans=0.2 2024-09-17 19:35:38,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=22.5 2024-09-17 19:35:49,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255028.66666666666, ans=0.1 2024-09-17 19:35:54,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=255075.33333333334, ans=0.1 2024-09-17 19:35:55,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2024-09-17 19:36:10,446 INFO [train.py:1198] (1/2) Epoch 15, batch 400, loss[loss=0.2316, simple_loss=0.2827, pruned_loss=0.06797, ctc_loss=0.1399, cr_loss=0.4137, over 34415.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.2852, pruned_loss=0.07527, ctc_loss=0.1522, cr_loss=0.4241, over 5866810.20 frames. 
], batch size: 95, lr: 8.17e-03, grad_scale: 32.0 2024-09-17 19:36:10,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=255122.0, ans=0.125 2024-09-17 19:36:19,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=255122.0, ans=0.125 2024-09-17 19:36:48,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=255215.33333333334, ans=0.0 2024-09-17 19:37:02,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=255262.0, ans=0.0 2024-09-17 19:37:18,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=255308.66666666666, ans=0.125 2024-09-17 19:37:30,448 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.421e+02 2.734e+02 3.339e+02 6.280e+02, threshold=5.469e+02, percent-clipped=3.0 2024-09-17 19:37:35,350 INFO [train.py:1198] (1/2) Epoch 15, batch 450, loss[loss=0.254, simple_loss=0.2944, pruned_loss=0.08166, ctc_loss=0.1621, cr_loss=0.4456, over 34703.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.2852, pruned_loss=0.07534, ctc_loss=0.1523, cr_loss=0.4242, over 6054960.23 frames. ], batch size: 97, lr: 8.17e-03, grad_scale: 32.0 2024-09-17 19:37:55,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=255402.0, ans=0.125 2024-09-17 19:38:01,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=255402.0, ans=0.1 2024-09-17 19:38:08,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_positive, batch_count=255448.66666666666, ans=0.05 2024-09-17 19:38:13,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=255448.66666666666, ans=0.5 2024-09-17 19:38:18,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=255448.66666666666, ans=0.125 2024-09-17 19:38:34,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=255495.33333333334, ans=15.0 2024-09-17 19:38:38,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=255495.33333333334, ans=0.2 2024-09-17 19:38:41,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=255542.0, ans=0.125 2024-09-17 19:38:59,857 INFO [train.py:1198] (1/2) Epoch 15, batch 500, loss[loss=0.2578, simple_loss=0.3008, pruned_loss=0.08194, ctc_loss=0.1648, cr_loss=0.4473, over 34442.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.2845, pruned_loss=0.07503, ctc_loss=0.1516, cr_loss=0.4233, over 6222256.77 frames. 
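Each batch line reports the overall `loss` next to `simple_loss` and `pruned_loss` (the two stages of pruned RNN-T training), `ctc_loss`, and `cr_loss`, so the total is plausibly a weighted combination of the four components. The weights below are illustrative placeholders, and the real train.py may also reweight the simple/pruned pair during warmup:

```python
def combine_losses(simple_loss, pruned_loss, ctc_loss, cr_loss,
                   simple_scale=0.5, pruned_scale=0.5,
                   ctc_scale=0.2, cr_scale=0.2):
    # Placeholder weighting of the four logged components into `loss`.
    return (simple_scale * simple_loss + pruned_scale * pruned_loss
            + ctc_scale * ctc_loss + cr_scale * cr_loss)
```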
], batch size: 110, lr: 8.16e-03, grad_scale: 32.0 2024-09-17 19:39:11,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=255588.66666666666, ans=0.0 2024-09-17 19:39:13,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=255588.66666666666, ans=0.125 2024-09-17 19:39:42,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=255682.0, ans=0.2 2024-09-17 19:39:58,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.88 vs. limit=15.0 2024-09-17 19:39:59,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=255728.66666666666, ans=0.0 2024-09-17 19:40:12,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.58 vs. limit=15.0 2024-09-17 19:40:17,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.464e+02 3.026e+02 4.240e+02 6.810e+02, threshold=6.052e+02, percent-clipped=8.0 2024-09-17 19:40:21,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=255822.0, ans=0.125 2024-09-17 19:40:22,370 INFO [train.py:1198] (1/2) Epoch 15, batch 550, loss[loss=0.253, simple_loss=0.2961, pruned_loss=0.07985, ctc_loss=0.1629, cr_loss=0.4399, over 33750.00 frames. ], tot_loss[loss=0.2413, simple_loss=0.2848, pruned_loss=0.0752, ctc_loss=0.1519, cr_loss=0.4235, over 6329497.30 frames. ], batch size: 122, lr: 8.16e-03, grad_scale: 32.0 2024-09-17 19:40:30,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=255822.0, ans=0.0 2024-09-17 19:40:45,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=255868.66666666666, ans=0.125 2024-09-17 19:41:27,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=255962.0, ans=0.0 2024-09-17 19:41:44,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=15.0 2024-09-17 19:41:47,139 INFO [train.py:1198] (1/2) Epoch 15, batch 600, loss[loss=0.2618, simple_loss=0.3113, pruned_loss=0.08154, ctc_loss=0.1589, cr_loss=0.433, over 34152.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.285, pruned_loss=0.07521, ctc_loss=0.1519, cr_loss=0.4243, over 6432014.80 frames. ], batch size: 117, lr: 8.16e-03, grad_scale: 16.0 2024-09-17 19:41:55,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=256055.33333333334, ans=0.0 2024-09-17 19:42:18,326 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:42:24,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256148.66666666666, ans=0.1 2024-09-17 19:42:29,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.91 vs. 
limit=22.5 2024-09-17 19:42:44,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2024-09-17 19:43:07,203 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.452e+02 2.909e+02 3.965e+02 7.786e+02, threshold=5.818e+02, percent-clipped=6.0 2024-09-17 19:43:10,527 INFO [train.py:1198] (1/2) Epoch 15, batch 650, loss[loss=0.241, simple_loss=0.285, pruned_loss=0.07466, ctc_loss=0.1534, cr_loss=0.4228, over 34517.00 frames. ], tot_loss[loss=0.241, simple_loss=0.2846, pruned_loss=0.07506, ctc_loss=0.1516, cr_loss=0.4232, over 6523807.14 frames. ], batch size: 94, lr: 8.15e-03, grad_scale: 16.0 2024-09-17 19:43:22,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=256288.66666666666, ans=0.025 2024-09-17 19:43:36,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=22.5 2024-09-17 19:44:08,719 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:44:32,900 INFO [train.py:1198] (1/2) Epoch 15, batch 700, loss[loss=0.2338, simple_loss=0.2721, pruned_loss=0.0747, ctc_loss=0.1487, cr_loss=0.4083, over 34564.00 frames. ], tot_loss[loss=0.2418, simple_loss=0.2854, pruned_loss=0.07542, ctc_loss=0.1521, cr_loss=0.4245, over 6579256.98 frames. ], batch size: 89, lr: 8.15e-03, grad_scale: 16.0 2024-09-17 19:44:33,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=256522.0, ans=0.125 2024-09-17 19:44:51,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.46 vs. limit=15.0 2024-09-17 19:45:01,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=256568.66666666666, ans=0.125 2024-09-17 19:45:15,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=256615.33333333334, ans=0.125 2024-09-17 19:45:20,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=256615.33333333334, ans=0.2 2024-09-17 19:45:23,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=256662.0, ans=0.0 2024-09-17 19:45:54,633 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.420e+02 2.990e+02 4.072e+02 7.325e+02, threshold=5.981e+02, percent-clipped=5.0 2024-09-17 19:45:57,813 INFO [train.py:1198] (1/2) Epoch 15, batch 750, loss[loss=0.2353, simple_loss=0.2865, pruned_loss=0.07015, ctc_loss=0.1411, cr_loss=0.3916, over 34417.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.285, pruned_loss=0.07522, ctc_loss=0.1519, cr_loss=0.4244, over 6622903.46 frames. 
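Note `grad_scale: 32.0` in earlier batch lines dropping to `grad_scale: 16.0` around batch 600 and recovering to 32.0 by batch 800: with fp16 AMP, the loss scaler halves its scale (and skips the step) when scaled gradients overflow, then doubles it back after a run of clean steps. A minimal sketch of that loop shape, assumed rather than copied from the run's train.py:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def train_step(model, batch, optimizer, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales; skips the step on inf/NaN
    scaler.update()                 # halves scale on overflow, regrows later
    return loss.detach()
```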
], batch size: 95, lr: 8.14e-03, grad_scale: 16.0 2024-09-17 19:45:59,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=256755.33333333334, ans=0.125 2024-09-17 19:46:06,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=256755.33333333334, ans=0.125 2024-09-17 19:46:12,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=256802.0, ans=0.05 2024-09-17 19:46:54,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=256895.33333333334, ans=0.125 2024-09-17 19:47:10,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256942.0, ans=0.1 2024-09-17 19:47:22,151 INFO [train.py:1198] (1/2) Epoch 15, batch 800, loss[loss=0.2142, simple_loss=0.2604, pruned_loss=0.0632, ctc_loss=0.1309, cr_loss=0.3852, over 34457.00 frames. ], tot_loss[loss=0.2412, simple_loss=0.2848, pruned_loss=0.07513, ctc_loss=0.1518, cr_loss=0.4243, over 6658269.00 frames. ], batch size: 85, lr: 8.14e-03, grad_scale: 32.0 2024-09-17 19:47:32,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=256988.66666666666, ans=0.125 2024-09-17 19:47:42,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=257035.33333333334, ans=0.0 2024-09-17 19:47:47,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=257035.33333333334, ans=0.125 2024-09-17 19:47:56,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=257082.0, ans=0.0 2024-09-17 19:47:57,083 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:47:57,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=257082.0, ans=0.125 2024-09-17 19:47:59,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. limit=15.0 2024-09-17 19:48:16,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=257128.66666666666, ans=0.2 2024-09-17 19:48:32,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.76 vs. limit=10.0 2024-09-17 19:48:34,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2024-09-17 19:48:40,648 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.540e+02 3.070e+02 3.818e+02 6.768e+02, threshold=6.140e+02, percent-clipped=3.0 2024-09-17 19:48:43,844 INFO [train.py:1198] (1/2) Epoch 15, batch 850, loss[loss=0.2489, simple_loss=0.296, pruned_loss=0.07652, ctc_loss=0.1572, cr_loss=0.4313, over 34379.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.2845, pruned_loss=0.07501, ctc_loss=0.1515, cr_loss=0.4236, over 6690526.89 frames. 
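The `WithLoss: name=...self_attn_weights, loss-sum=0.000e+00` entries report an auxiliary penalty attached to the attention weights; a sum of exactly zero is what a one-sided penalty produces when the activations already sit inside the allowed region. A hedged illustration of such a penalty; the actual constraint in scaling.py is likely different:

```python
import torch

def attention_entropy_penalty(attn: torch.Tensor,
                              min_entropy: float = 1.0) -> torch.Tensor:
    # attn: (..., num_queries, num_keys), rows summing to 1. Penalize only
    # rows whose entropy falls below min_entropy; the result is identically
    # zero (hence loss-sum=0.000e+00) when the constraint is satisfied.
    entropy = -(attn.clamp_min(1e-20).log() * attn).sum(dim=-1)
    return (min_entropy - entropy).clamp_min(0.0)

attn = torch.softmax(torch.randn(2, 4, 8), dim=-1)
print(attention_entropy_penalty(attn).sum())  # usually 0.0 for diffuse rows
```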
], batch size: 103, lr: 8.14e-03, grad_scale: 32.0 2024-09-17 19:48:46,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.01 vs. limit=15.0 2024-09-17 19:48:47,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=257222.0, ans=0.025 2024-09-17 19:48:53,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=257222.0, ans=0.125 2024-09-17 19:49:05,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.68 vs. limit=15.0 2024-09-17 19:49:48,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=257362.0, ans=0.1 2024-09-17 19:49:52,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=257408.66666666666, ans=0.125 2024-09-17 19:50:08,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=257455.33333333334, ans=0.025 2024-09-17 19:50:10,127 INFO [train.py:1198] (1/2) Epoch 15, batch 900, loss[loss=0.2242, simple_loss=0.2667, pruned_loss=0.06913, ctc_loss=0.14, cr_loss=0.3888, over 34506.00 frames. ], tot_loss[loss=0.2417, simple_loss=0.2852, pruned_loss=0.07539, ctc_loss=0.1522, cr_loss=0.4247, over 6695833.27 frames. ], batch size: 85, lr: 8.13e-03, grad_scale: 16.0 2024-09-17 19:50:21,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=257455.33333333334, ans=10.0 2024-09-17 19:50:40,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.61 vs. limit=12.0 2024-09-17 19:50:42,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.36 vs. limit=22.5 2024-09-17 19:50:44,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=257548.66666666666, ans=0.125 2024-09-17 19:51:13,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=257595.33333333334, ans=0.0 2024-09-17 19:51:23,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=257642.0, ans=0.125 2024-09-17 19:51:30,960 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.907e+02 2.471e+02 2.917e+02 3.624e+02 6.337e+02, threshold=5.834e+02, percent-clipped=2.0 2024-09-17 19:51:32,603 INFO [train.py:1198] (1/2) Epoch 15, batch 950, loss[loss=0.2223, simple_loss=0.2603, pruned_loss=0.06959, ctc_loss=0.1424, cr_loss=0.4149, over 34734.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.2858, pruned_loss=0.07566, ctc_loss=0.1528, cr_loss=0.4259, over 6698609.83 frames. 
], batch size: 87, lr: 8.13e-03, grad_scale: 16.0 2024-09-17 19:52:16,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=257782.0, ans=0.125 2024-09-17 19:52:32,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=257828.66666666666, ans=0.125 2024-09-17 19:52:51,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.00 vs. limit=10.0 2024-09-17 19:52:55,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=257922.0, ans=0.0 2024-09-17 19:52:57,300 INFO [train.py:1198] (1/2) Epoch 15, batch 1000, loss[loss=0.237, simple_loss=0.2751, pruned_loss=0.07568, ctc_loss=0.1514, cr_loss=0.4321, over 34468.00 frames. ], tot_loss[loss=0.2431, simple_loss=0.2863, pruned_loss=0.07609, ctc_loss=0.1535, cr_loss=0.4267, over 6691990.51 frames. ], batch size: 90, lr: 8.13e-03, grad_scale: 16.0 2024-09-17 19:53:10,843 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 19:53:14,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=257968.66666666666, ans=0.0 2024-09-17 19:53:45,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=258015.33333333334, ans=0.125 2024-09-17 19:54:01,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.96 vs. limit=22.5 2024-09-17 19:54:04,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=258108.66666666666, ans=0.125 2024-09-17 19:54:20,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.411e+02 2.636e+02 3.222e+02 5.241e+02, threshold=5.272e+02, percent-clipped=0.0 2024-09-17 19:54:21,858 INFO [train.py:1198] (1/2) Epoch 15, batch 1050, loss[loss=0.244, simple_loss=0.2902, pruned_loss=0.07521, ctc_loss=0.151, cr_loss=0.4296, over 34568.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.2855, pruned_loss=0.07573, ctc_loss=0.1528, cr_loss=0.4252, over 6703051.86 frames. ], batch size: 99, lr: 8.12e-03, grad_scale: 16.0 2024-09-17 19:54:22,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=258155.33333333334, ans=0.125 2024-09-17 19:54:52,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=258202.0, ans=15.0 2024-09-17 19:55:06,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=258248.66666666666, ans=0.0 2024-09-17 19:55:43,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258388.66666666666, ans=0.1 2024-09-17 19:55:44,453 INFO [train.py:1198] (1/2) Epoch 15, batch 1100, loss[loss=0.2258, simple_loss=0.2691, pruned_loss=0.06888, ctc_loss=0.142, cr_loss=0.4062, over 34359.00 frames. ], tot_loss[loss=0.2417, simple_loss=0.2849, pruned_loss=0.07547, ctc_loss=0.1524, cr_loss=0.4244, over 6715474.09 frames. 
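The learning rate decays smoothly across this span (8.20e-03 at the top of the epoch, 8.13e-03 here) and moves every few hundred batches rather than once per epoch, which points to a schedule that is a function of both batch count and epoch. One common inverse-power form with placeholder constants; this is an assumption, not necessarily the run's formula:

```python
def inverse_power_lr(base_lr: float, batch: float, epoch: float,
                     lr_batches: float = 5000.0,
                     lr_epochs: float = 4.0) -> float:
    # Roughly flat until ~lr_batches / ~lr_epochs, then decays like
    # batch^-0.5 * epoch^-0.5.
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)
```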
], batch size: 91, lr: 8.12e-03, grad_scale: 16.0 2024-09-17 19:55:49,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=258388.66666666666, ans=0.0 2024-09-17 19:56:30,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=258482.0, ans=0.0 2024-09-17 19:56:33,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=258528.66666666666, ans=0.0 2024-09-17 19:56:41,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258528.66666666666, ans=0.1 2024-09-17 19:56:48,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=258528.66666666666, ans=0.125 2024-09-17 19:56:51,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=258575.33333333334, ans=0.2 2024-09-17 19:56:53,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=258575.33333333334, ans=0.0 2024-09-17 19:57:03,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=258575.33333333334, ans=0.125 2024-09-17 19:57:07,539 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.575e+02 2.873e+02 3.704e+02 6.337e+02, threshold=5.746e+02, percent-clipped=5.0 2024-09-17 19:57:09,238 INFO [train.py:1198] (1/2) Epoch 15, batch 1150, loss[loss=0.2413, simple_loss=0.2825, pruned_loss=0.07591, ctc_loss=0.1531, cr_loss=0.4434, over 34346.00 frames. ], tot_loss[loss=0.2415, simple_loss=0.2848, pruned_loss=0.0754, ctc_loss=0.1522, cr_loss=0.4239, over 6714771.55 frames. ], batch size: 91, lr: 8.12e-03, grad_scale: 16.0 2024-09-17 19:57:14,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=258622.0, ans=0.125 2024-09-17 19:57:31,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=258668.66666666666, ans=0.125 2024-09-17 19:57:36,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=258668.66666666666, ans=0.0 2024-09-17 19:57:59,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=258762.0, ans=0.2 2024-09-17 19:57:59,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258762.0, ans=0.1 2024-09-17 19:58:08,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=258762.0, ans=0.0 2024-09-17 19:58:12,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=258762.0, ans=0.1 2024-09-17 19:58:24,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=258808.66666666666, ans=0.125 2024-09-17 19:58:33,966 INFO [train.py:1198] (1/2) Epoch 15, batch 1200, loss[loss=0.2533, simple_loss=0.2981, pruned_loss=0.07972, ctc_loss=0.1571, cr_loss=0.4422, over 34555.00 frames. 
], tot_loss[loss=0.2425, simple_loss=0.2858, pruned_loss=0.07582, ctc_loss=0.153, cr_loss=0.4251, over 6707896.34 frames. ], batch size: 99, lr: 8.11e-03, grad_scale: 32.0 2024-09-17 19:58:34,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=258855.33333333334, ans=0.2 2024-09-17 19:58:58,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2024-09-17 19:58:59,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=258902.0, ans=10.0 2024-09-17 19:59:20,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=258948.66666666666, ans=0.0 2024-09-17 19:59:22,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=258995.33333333334, ans=0.125 2024-09-17 19:59:32,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=258995.33333333334, ans=0.125 2024-09-17 19:59:40,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=259042.0, ans=0.0 2024-09-17 19:59:42,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=259042.0, ans=0.2 2024-09-17 19:59:54,753 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.472e+02 2.976e+02 3.603e+02 6.912e+02, threshold=5.951e+02, percent-clipped=3.0 2024-09-17 19:59:56,445 INFO [train.py:1198] (1/2) Epoch 15, batch 1250, loss[loss=0.2636, simple_loss=0.3017, pruned_loss=0.08639, ctc_loss=0.1695, cr_loss=0.4715, over 34349.00 frames. ], tot_loss[loss=0.2434, simple_loss=0.2868, pruned_loss=0.07617, ctc_loss=0.1534, cr_loss=0.4266, over 6741449.38 frames. ], batch size: 107, lr: 8.11e-03, grad_scale: 32.0 2024-09-17 20:00:03,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=259088.66666666666, ans=0.125 2024-09-17 20:00:15,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=259135.33333333334, ans=0.0 2024-09-17 20:01:19,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=259275.33333333334, ans=0.125 2024-09-17 20:01:22,762 INFO [train.py:1198] (1/2) Epoch 15, batch 1300, loss[loss=0.2366, simple_loss=0.2852, pruned_loss=0.07096, ctc_loss=0.1467, cr_loss=0.4162, over 33102.00 frames. ], tot_loss[loss=0.242, simple_loss=0.2854, pruned_loss=0.07556, ctc_loss=0.1524, cr_loss=0.425, over 6744632.47 frames. ], batch size: 130, lr: 8.10e-03, grad_scale: 16.0 2024-09-17 20:01:36,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=259322.0, ans=0.1 2024-09-17 20:01:50,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.10 vs. 
limit=15.0 2024-09-17 20:01:57,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=259415.33333333334, ans=0.2 2024-09-17 20:01:59,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=259415.33333333334, ans=0.125 2024-09-17 20:02:06,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=259415.33333333334, ans=0.0 2024-09-17 20:02:10,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2024-09-17 20:02:45,553 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.555e+02 3.050e+02 3.880e+02 6.281e+02, threshold=6.101e+02, percent-clipped=1.0 2024-09-17 20:02:45,574 INFO [train.py:1198] (1/2) Epoch 15, batch 1350, loss[loss=0.2263, simple_loss=0.2743, pruned_loss=0.06708, ctc_loss=0.14, cr_loss=0.4036, over 34554.00 frames. ], tot_loss[loss=0.2415, simple_loss=0.2849, pruned_loss=0.07534, ctc_loss=0.152, cr_loss=0.4246, over 6764520.10 frames. ], batch size: 94, lr: 8.10e-03, grad_scale: 16.0 2024-09-17 20:03:02,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=259602.0, ans=0.0 2024-09-17 20:03:54,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=259742.0, ans=0.0 2024-09-17 20:03:56,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=259742.0, ans=0.125 2024-09-17 20:03:59,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=259742.0, ans=0.125 2024-09-17 20:04:03,722 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.02 vs. limit=15.0 2024-09-17 20:04:07,813 INFO [train.py:1198] (1/2) Epoch 15, batch 1400, loss[loss=0.2137, simple_loss=0.2558, pruned_loss=0.06501, ctc_loss=0.1303, cr_loss=0.3864, over 34279.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.2849, pruned_loss=0.07529, ctc_loss=0.1519, cr_loss=0.4245, over 6776924.40 frames. ], batch size: 80, lr: 8.10e-03, grad_scale: 16.0 2024-09-17 20:04:18,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=259788.66666666666, ans=0.1 2024-09-17 20:04:19,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=259788.66666666666, ans=0.1 2024-09-17 20:04:48,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=259882.0, ans=0.0 2024-09-17 20:04:50,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.81 vs. 
limit=15.0 2024-09-17 20:04:53,044 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:04:54,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=259882.0, ans=0.125 2024-09-17 20:04:56,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=259882.0, ans=0.125 2024-09-17 20:05:05,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=259928.66666666666, ans=0.04949747468305833 2024-09-17 20:05:05,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0 2024-09-17 20:05:21,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=259975.33333333334, ans=0.025 2024-09-17 20:05:34,300 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.940e+02 2.427e+02 2.737e+02 3.679e+02 7.171e+02, threshold=5.473e+02, percent-clipped=1.0 2024-09-17 20:05:34,321 INFO [train.py:1198] (1/2) Epoch 15, batch 1450, loss[loss=0.2666, simple_loss=0.3059, pruned_loss=0.08742, ctc_loss=0.1703, cr_loss=0.4581, over 34493.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.2858, pruned_loss=0.07561, ctc_loss=0.1527, cr_loss=0.4257, over 6774810.51 frames. ], batch size: 110, lr: 8.09e-03, grad_scale: 16.0 2024-09-17 20:05:41,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=260022.0, ans=0.0 2024-09-17 20:05:49,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=260068.66666666666, ans=0.125 2024-09-17 20:06:01,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=260068.66666666666, ans=0.0 2024-09-17 20:06:31,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=260162.0, ans=0.1 2024-09-17 20:06:55,921 INFO [train.py:1198] (1/2) Epoch 15, batch 1500, loss[loss=0.2501, simple_loss=0.2967, pruned_loss=0.07787, ctc_loss=0.1533, cr_loss=0.4254, over 34450.00 frames. ], tot_loss[loss=0.2421, simple_loss=0.2857, pruned_loss=0.07549, ctc_loss=0.1524, cr_loss=0.4253, over 6773839.58 frames. ], batch size: 100, lr: 8.09e-03, grad_scale: 16.0 2024-09-17 20:06:58,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=260255.33333333334, ans=0.0 2024-09-17 20:07:03,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=260255.33333333334, ans=0.125 2024-09-17 20:07:30,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=260348.66666666666, ans=0.5 2024-09-17 20:07:34,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=260348.66666666666, ans=0.1 2024-09-17 20:07:42,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.04 vs. 
limit=15.0 2024-09-17 20:07:56,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=260395.33333333334, ans=0.2 2024-09-17 20:08:15,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=260442.0, ans=0.125 2024-09-17 20:08:21,637 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.079e+02 2.398e+02 2.672e+02 3.575e+02 1.892e+03, threshold=5.343e+02, percent-clipped=5.0 2024-09-17 20:08:21,658 INFO [train.py:1198] (1/2) Epoch 15, batch 1550, loss[loss=0.2648, simple_loss=0.3091, pruned_loss=0.08409, ctc_loss=0.1681, cr_loss=0.4664, over 34385.00 frames. ], tot_loss[loss=0.2429, simple_loss=0.2863, pruned_loss=0.07596, ctc_loss=0.1532, cr_loss=0.426, over 6745995.69 frames. ], batch size: 105, lr: 8.09e-03, grad_scale: 16.0 2024-09-17 20:08:28,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=260488.66666666666, ans=0.0 2024-09-17 20:08:45,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=260535.33333333334, ans=0.125 2024-09-17 20:08:51,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=260535.33333333334, ans=0.2 2024-09-17 20:08:56,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=260582.0, ans=0.0 2024-09-17 20:09:09,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=260582.0, ans=0.125 2024-09-17 20:09:14,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2024-09-17 20:09:16,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=260628.66666666666, ans=0.2 2024-09-17 20:09:42,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=260675.33333333334, ans=0.125 2024-09-17 20:09:44,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=260722.0, ans=0.2 2024-09-17 20:09:45,500 INFO [train.py:1198] (1/2) Epoch 15, batch 1600, loss[loss=0.2454, simple_loss=0.2919, pruned_loss=0.07493, ctc_loss=0.1545, cr_loss=0.4512, over 34570.00 frames. ], tot_loss[loss=0.243, simple_loss=0.2862, pruned_loss=0.07603, ctc_loss=0.1534, cr_loss=0.4262, over 6725254.89 frames. ], batch size: 99, lr: 8.08e-03, grad_scale: 32.0 2024-09-17 20:10:10,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.59 vs. 
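The `cr_loss` component sits around 0.42-0.47 during training yet was ~1.6e-14 in the validation line at the top of this span, which fits a consistency-regularization term between two augmented views of each utterance: with no augmentation at validation time the views coincide and the divergence vanishes. A hedged sketch of such a term over CTC posteriors; the definition actually used in training may differ:

```python
import torch
import torch.nn.functional as F

def cr_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    # logits_*: (batch, time, vocab) CTC logits from two views of the input.
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    # Symmetric KL; each direction treats the other branch as a fixed target.
    kl_ab = F.kl_div(log_p_a, log_p_b.detach(),
                     reduction="none", log_target=True)
    kl_ba = F.kl_div(log_p_b, log_p_a.detach(),
                     reduction="none", log_target=True)
    return 0.5 * (kl_ab + kl_ba).sum(dim=-1).mean()
```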
limit=22.5 2024-09-17 20:10:32,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=260815.33333333334, ans=0.1 2024-09-17 20:10:42,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=260862.0, ans=0.125 2024-09-17 20:10:45,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=260862.0, ans=0.025 2024-09-17 20:10:47,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=260862.0, ans=0.2 2024-09-17 20:10:52,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=260908.66666666666, ans=0.2 2024-09-17 20:10:55,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=260908.66666666666, ans=0.125 2024-09-17 20:11:08,284 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.468e+02 3.254e+02 4.307e+02 7.354e+02, threshold=6.509e+02, percent-clipped=13.0 2024-09-17 20:11:08,306 INFO [train.py:1198] (1/2) Epoch 15, batch 1650, loss[loss=0.2382, simple_loss=0.2844, pruned_loss=0.07286, ctc_loss=0.1467, cr_loss=0.422, over 34395.00 frames. ], tot_loss[loss=0.243, simple_loss=0.2863, pruned_loss=0.07602, ctc_loss=0.1533, cr_loss=0.4258, over 6717763.37 frames. ], batch size: 103, lr: 8.08e-03, grad_scale: 32.0 2024-09-17 20:11:21,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=260955.33333333334, ans=0.125 2024-09-17 20:11:28,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=261002.0, ans=0.125 2024-09-17 20:11:33,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=261002.0, ans=0.5 2024-09-17 20:11:33,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=261002.0, ans=0.0 2024-09-17 20:11:34,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0 2024-09-17 20:11:46,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=261048.66666666666, ans=0.125 2024-09-17 20:11:58,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=261095.33333333334, ans=0.125 2024-09-17 20:12:16,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=261142.0, ans=0.125 2024-09-17 20:12:25,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.22 vs. limit=22.5 2024-09-17 20:12:34,105 INFO [train.py:1198] (1/2) Epoch 15, batch 1700, loss[loss=0.2102, simple_loss=0.2542, pruned_loss=0.06228, ctc_loss=0.1306, cr_loss=0.3851, over 34327.00 frames. ], tot_loss[loss=0.2424, simple_loss=0.2859, pruned_loss=0.07568, ctc_loss=0.1527, cr_loss=0.4252, over 6744434.97 frames. 
], batch size: 80, lr: 8.08e-03, grad_scale: 32.0 2024-09-17 20:12:39,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=261188.66666666666, ans=0.125 2024-09-17 20:13:08,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=261282.0, ans=0.0 2024-09-17 20:13:24,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.77 vs. limit=10.0 2024-09-17 20:13:42,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=261328.66666666666, ans=0.125 2024-09-17 20:13:46,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=261375.33333333334, ans=0.0 2024-09-17 20:13:52,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261375.33333333334, ans=0.1 2024-09-17 20:13:53,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261375.33333333334, ans=0.1 2024-09-17 20:13:57,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=261375.33333333334, ans=0.0 2024-09-17 20:14:03,334 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.458e+02 2.859e+02 3.637e+02 9.605e+02, threshold=5.718e+02, percent-clipped=1.0 2024-09-17 20:14:03,371 INFO [train.py:1198] (1/2) Epoch 15, batch 1750, loss[loss=0.2216, simple_loss=0.2614, pruned_loss=0.06908, ctc_loss=0.1417, cr_loss=0.3807, over 34177.00 frames. ], tot_loss[loss=0.2422, simple_loss=0.2856, pruned_loss=0.07561, ctc_loss=0.1525, cr_loss=0.425, over 6753298.64 frames. ], batch size: 78, lr: 8.07e-03, grad_scale: 32.0 2024-09-17 20:14:05,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=261422.0, ans=0.1 2024-09-17 20:14:06,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. limit=6.0 2024-09-17 20:14:08,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=261422.0, ans=0.1 2024-09-17 20:14:29,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.60 vs. limit=5.0 2024-09-17 20:15:16,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.83 vs. limit=15.0 2024-09-17 20:15:25,081 INFO [train.py:1198] (1/2) Epoch 15, batch 1800, loss[loss=0.2592, simple_loss=0.3018, pruned_loss=0.08345, ctc_loss=0.1646, cr_loss=0.4217, over 34690.00 frames. ], tot_loss[loss=0.2426, simple_loss=0.2859, pruned_loss=0.07585, ctc_loss=0.1529, cr_loss=0.4256, over 6755767.38 frames. 
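Across these entries the batch size swings widely (78 to 145) while the per-batch frame counts stay near 31k-35k: batches are filled up to a roughly fixed total duration, so many short cuts or few long ones make a batch. A toy version of that packing rule (the real sampler, e.g. lhotse's DynamicBucketingSampler, additionally buckets by length and shuffles; the cap below is a placeholder):

```python
def pack_by_duration(durations, max_duration: float = 60.0):
    # durations: per-cut lengths in seconds; greedily fill each batch
    # until adding the next cut would exceed max_duration.
    batches, batch, total = [], [], 0.0
    for i, dur in enumerate(durations):
        if batch and total + dur > max_duration:
            batches.append(batch)
            batch, total = [], 0.0
        batch.append(i)
        total += dur
    if batch:
        batches.append(batch)
    return batches

print(pack_by_duration([10.0] * 5 + [30.0] * 5))
# -> [[0, 1, 2, 3, 4], [5, 6], [7, 8], [9]]
```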
], batch size: 97, lr: 8.07e-03, grad_scale: 32.0 2024-09-17 20:15:35,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=261655.33333333334, ans=0.125 2024-09-17 20:15:46,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=12.0 2024-09-17 20:15:54,814 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.92 vs. limit=22.5 2024-09-17 20:16:20,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=261795.33333333334, ans=0.2 2024-09-17 20:16:51,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.741e+02 3.874e+02 5.238e+02 7.916e+02, threshold=7.748e+02, percent-clipped=16.0 2024-09-17 20:16:52,007 INFO [train.py:1198] (1/2) Epoch 15, batch 1850, loss[loss=0.2478, simple_loss=0.2935, pruned_loss=0.07684, ctc_loss=0.1545, cr_loss=0.4374, over 34450.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.2856, pruned_loss=0.0757, ctc_loss=0.1527, cr_loss=0.4256, over 6763532.97 frames. ], batch size: 100, lr: 8.07e-03, grad_scale: 32.0 2024-09-17 20:17:25,156 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:18:04,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0 2024-09-17 20:18:13,936 INFO [train.py:1198] (1/2) Epoch 15, batch 1900, loss[loss=0.2326, simple_loss=0.2811, pruned_loss=0.06985, ctc_loss=0.1437, cr_loss=0.3937, over 34360.00 frames. ], tot_loss[loss=0.2431, simple_loss=0.2865, pruned_loss=0.07597, ctc_loss=0.1532, cr_loss=0.4267, over 6772307.61 frames. ], batch size: 103, lr: 8.06e-03, grad_scale: 32.0 2024-09-17 20:18:20,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=12.0 2024-09-17 20:18:32,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=262168.6666666667, ans=0.125 2024-09-17 20:18:42,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=262168.6666666667, ans=0.2 2024-09-17 20:19:12,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=262262.0, ans=0.2 2024-09-17 20:19:18,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2024-09-17 20:19:29,814 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-09-17 20:19:38,965 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.555e+02 2.954e+02 3.548e+02 5.549e+02, threshold=5.908e+02, percent-clipped=0.0 2024-09-17 20:19:38,986 INFO [train.py:1198] (1/2) Epoch 15, batch 1950, loss[loss=0.2426, simple_loss=0.2867, pruned_loss=0.07574, ctc_loss=0.1495, cr_loss=0.4287, over 34365.00 frames. ], tot_loss[loss=0.244, simple_loss=0.2876, pruned_loss=0.07629, ctc_loss=0.1538, cr_loss=0.4284, over 6788929.65 frames. 
], batch size: 91, lr: 8.06e-03, grad_scale: 32.0 2024-09-17 20:19:54,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=262355.3333333333, ans=0.04949747468305833 2024-09-17 20:20:15,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=262448.6666666667, ans=0.125 2024-09-17 20:20:21,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.72 vs. limit=15.0 2024-09-17 20:20:38,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=262495.3333333333, ans=0.125 2024-09-17 20:20:57,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=262542.0, ans=0.0 2024-09-17 20:21:03,277 INFO [train.py:1198] (1/2) Epoch 15, batch 2000, loss[loss=0.2104, simple_loss=0.2513, pruned_loss=0.06392, ctc_loss=0.1299, cr_loss=0.3934, over 34172.00 frames. ], tot_loss[loss=0.2444, simple_loss=0.288, pruned_loss=0.07643, ctc_loss=0.1541, cr_loss=0.4282, over 6765423.52 frames. ], batch size: 78, lr: 8.06e-03, grad_scale: 32.0 2024-09-17 20:21:08,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=262588.6666666667, ans=0.2 2024-09-17 20:21:27,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=262635.3333333333, ans=0.125 2024-09-17 20:21:32,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=262635.3333333333, ans=0.5 2024-09-17 20:21:37,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.11 vs. limit=15.0 2024-09-17 20:21:40,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=262682.0, ans=0.125 2024-09-17 20:22:09,974 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 20:22:24,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=262822.0, ans=0.125 2024-09-17 20:22:26,059 INFO [train.py:1198] (1/2) Epoch 15, batch 2050, loss[loss=0.2005, simple_loss=0.2477, pruned_loss=0.05731, ctc_loss=0.1217, cr_loss=0.3569, over 34504.00 frames. ], tot_loss[loss=0.2436, simple_loss=0.2871, pruned_loss=0.07615, ctc_loss=0.1536, cr_loss=0.4272, over 6755593.92 frames. 
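`tot_loss` is reported "over N frames" where N climbs into the millions and then drifts up and down (6,788,929 → 6,755,593 → 6,787,912 across nearby batches), so it reads as a frame-weighted aggregate of recent batches rather than a whole-epoch mean. A sketch with exponential decay of old statistics; the decay factor is an assumption, though 0.995 with ~34k frames per batch gives a steady state near 34k/0.005 ≈ 6.8M frames, consistent with the logged counts:

```python
class RunningFrameLoss:
    """Frame-weighted, exponentially decayed loss aggregate (sketch)."""
    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float):
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```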
], batch size: 82, lr: 8.05e-03, grad_scale: 16.0 2024-09-17 20:22:29,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.529e+02 3.076e+02 3.701e+02 8.380e+02, threshold=6.151e+02, percent-clipped=7.0 2024-09-17 20:22:54,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=262868.6666666667, ans=0.035 2024-09-17 20:23:17,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=262962.0, ans=10.0 2024-09-17 20:23:26,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=262962.0, ans=0.125 2024-09-17 20:23:35,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.96 vs. limit=15.0 2024-09-17 20:23:43,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.22 vs. limit=22.5 2024-09-17 20:23:44,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=263008.6666666667, ans=0.07 2024-09-17 20:23:52,571 INFO [train.py:1198] (1/2) Epoch 15, batch 2100, loss[loss=0.2361, simple_loss=0.279, pruned_loss=0.07377, ctc_loss=0.1471, cr_loss=0.4074, over 34516.00 frames. ], tot_loss[loss=0.2427, simple_loss=0.2864, pruned_loss=0.07578, ctc_loss=0.1528, cr_loss=0.4253, over 6769847.25 frames. ], batch size: 94, lr: 8.05e-03, grad_scale: 16.0 2024-09-17 20:24:04,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=263055.3333333333, ans=0.125 2024-09-17 20:25:15,021 INFO [train.py:1198] (1/2) Epoch 15, batch 2150, loss[loss=0.2296, simple_loss=0.2723, pruned_loss=0.07065, ctc_loss=0.1436, cr_loss=0.425, over 34394.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.2854, pruned_loss=0.07521, ctc_loss=0.1517, cr_loss=0.4238, over 6787912.45 frames. 
2024-09-17 20:25:18,347 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.337e+02 2.935e+02 3.869e+02 8.576e+02, threshold=5.871e+02, percent-clipped=2.0
2024-09-17 20:25:23,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=263288.6666666667, ans=0.0
2024-09-17 20:25:30,565 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 20:25:38,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=263335.3333333333, ans=0.125
2024-09-17 20:25:48,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=263382.0, ans=0.125
2024-09-17 20:25:55,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=263382.0, ans=0.0
2024-09-17 20:26:10,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=263428.6666666667, ans=10.0
2024-09-17 20:26:19,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=263475.3333333333, ans=0.025
2024-09-17 20:26:26,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=263475.3333333333, ans=0.0
2024-09-17 20:26:33,088 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 20:26:37,645 INFO [train.py:1198] (1/2) Epoch 15, batch 2200, loss[loss=0.2444, simple_loss=0.2912, pruned_loss=0.07534, ctc_loss=0.1514, cr_loss=0.4191, over 34454.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.2852, pruned_loss=0.07516, ctc_loss=0.1516, cr_loss=0.4237, over 6783675.28 frames. ], batch size: 100, lr: 8.04e-03, grad_scale: 16.0
2024-09-17 20:26:52,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=263568.6666666667, ans=0.02
2024-09-17 20:27:26,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=263615.3333333333, ans=0.125
2024-09-17 20:27:41,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=15.0
2024-09-17 20:27:48,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=263708.6666666667, ans=0.1
2024-09-17 20:27:53,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=263708.6666666667, ans=0.2
2024-09-17 20:27:54,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=263708.6666666667, ans=0.125
2024-09-17 20:27:59,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=263708.6666666667, ans=0.125
2024-09-17 20:28:04,205 INFO [train.py:1198] (1/2) Epoch 15, batch 2250, loss[loss=0.2312, simple_loss=0.2786, pruned_loss=0.06948, ctc_loss=0.1414, cr_loss=0.415, over 34414.00 frames. ], tot_loss[loss=0.241, simple_loss=0.285, pruned_loss=0.0749, ctc_loss=0.1513, cr_loss=0.4231, over 6780271.00 frames. ], batch size: 95, lr: 8.04e-03, grad_scale: 16.0
2024-09-17 20:28:07,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.963e+02 2.583e+02 3.496e+02 4.913e+02 9.484e+02, threshold=6.993e+02, percent-clipped=13.0
2024-09-17 20:28:07,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=263755.3333333333, ans=0.0
2024-09-17 20:28:16,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=263755.3333333333, ans=0.035
2024-09-17 20:28:21,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=263802.0, ans=0.125
2024-09-17 20:28:24,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=263802.0, ans=0.125
2024-09-17 20:28:47,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=263848.6666666667, ans=0.125
2024-09-17 20:28:54,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=263895.3333333333, ans=22.5
2024-09-17 20:29:05,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=263895.3333333333, ans=0.0
2024-09-17 20:29:26,742 INFO [train.py:1198] (1/2) Epoch 15, batch 2300, loss[loss=0.2229, simple_loss=0.2616, pruned_loss=0.07008, ctc_loss=0.139, cr_loss=0.4072, over 34277.00 frames. ], tot_loss[loss=0.2401, simple_loss=0.284, pruned_loss=0.07464, ctc_loss=0.1508, cr_loss=0.422, over 6767107.19 frames. ], batch size: 83, lr: 8.03e-03, grad_scale: 16.0
2024-09-17 20:29:57,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.34 vs. limit=15.0
2024-09-17 20:30:18,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264128.6666666667, ans=0.1
2024-09-17 20:30:25,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=264128.6666666667, ans=0.125
2024-09-17 20:30:25,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.47 vs. limit=15.0
2024-09-17 20:30:26,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264128.6666666667, ans=0.1
2024-09-17 20:30:36,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=264175.3333333333, ans=0.125
2024-09-17 20:30:36,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264175.3333333333, ans=0.1
2024-09-17 20:30:38,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=264175.3333333333, ans=0.125
2024-09-17 20:30:49,575 INFO [train.py:1198] (1/2) Epoch 15, batch 2350, loss[loss=0.263, simple_loss=0.3049, pruned_loss=0.08408, ctc_loss=0.1697, cr_loss=0.4742, over 34704.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.2843, pruned_loss=0.07491, ctc_loss=0.1512, cr_loss=0.4237, over 6771482.99 frames. ], batch size: 97, lr: 8.03e-03, grad_scale: 16.0
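Note on the optim.py:487 WARNING records: each reports quartiles of recently observed gradient norms, a clipping threshold, and the percentage of recent steps that were clipped. A hedged sketch of that kind of statistics-driven clipping is below; the actual logic in optim.py differs in detail, and the 128-step window and the "threshold = clipping_scale x median" rule here are assumptions chosen only to match the shape of the logged output.

    import torch

    def clip_and_report(params, norm_history, clipping_scale=2.0):
        # Total gradient norm over all parameters that have gradients.
        total_norm = torch.norm(
            torch.stack([p.grad.norm() for p in params if p.grad is not None]))
        norm_history.append(total_norm.item())
        recent = sorted(norm_history[-128:])          # assumed window size
        n = len(recent)
        quartiles = [recent[int(f * (n - 1))] for f in (0.0, 0.25, 0.5, 0.75, 1.0)]
        threshold = clipping_scale * quartiles[2]     # assumed: scale * median
        clipped = total_norm.item() > threshold
        if clipped:
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / total_norm.item())
        return quartiles, threshold, clipped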
2024-09-17 20:30:52,756 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.510e+02 3.028e+02 3.646e+02 5.773e+02, threshold=6.056e+02, percent-clipped=0.0
2024-09-17 20:31:43,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=22.5
2024-09-17 20:31:44,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=264362.0, ans=0.125
2024-09-17 20:32:06,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=264408.6666666667, ans=0.025
2024-09-17 20:32:15,594 INFO [train.py:1198] (1/2) Epoch 15, batch 2400, loss[loss=0.2315, simple_loss=0.2742, pruned_loss=0.07141, ctc_loss=0.1473, cr_loss=0.414, over 34606.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.2851, pruned_loss=0.07533, ctc_loss=0.1521, cr_loss=0.425, over 6775682.17 frames. ], batch size: 89, lr: 8.03e-03, grad_scale: 32.0
2024-09-17 20:32:19,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=264455.3333333333, ans=0.0
2024-09-17 20:32:25,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=264455.3333333333, ans=0.125
2024-09-17 20:32:47,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=264548.6666666667, ans=0.0
2024-09-17 20:32:47,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=264548.6666666667, ans=0.2
2024-09-17 20:32:50,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264548.6666666667, ans=0.1
2024-09-17 20:33:04,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.64 vs. limit=15.0
2024-09-17 20:33:38,155 INFO [train.py:1198] (1/2) Epoch 15, batch 2450, loss[loss=0.2424, simple_loss=0.2891, pruned_loss=0.07423, ctc_loss=0.152, cr_loss=0.42, over 34420.00 frames. ], tot_loss[loss=0.2426, simple_loss=0.2861, pruned_loss=0.07571, ctc_loss=0.1528, cr_loss=0.4262, over 6750777.62 frames. ], batch size: 95, lr: 8.02e-03, grad_scale: 32.0
2024-09-17 20:33:41,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.650e+02 3.134e+02 4.055e+02 6.398e+02, threshold=6.269e+02, percent-clipped=2.0
2024-09-17 20:33:45,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.93 vs. limit=22.5
2024-09-17 20:33:51,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264688.6666666667, ans=0.1
2024-09-17 20:33:58,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=264735.3333333333, ans=0.125
2024-09-17 20:34:19,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=264782.0, ans=0.0
2024-09-17 20:34:31,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=264828.6666666667, ans=0.125
2024-09-17 20:34:45,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=264875.3333333333, ans=0.0
2024-09-17 20:34:58,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0
2024-09-17 20:34:59,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=264875.3333333333, ans=0.125
2024-09-17 20:35:03,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=264922.0, ans=0.0
2024-09-17 20:35:04,373 INFO [train.py:1198] (1/2) Epoch 15, batch 2500, loss[loss=0.2505, simple_loss=0.3007, pruned_loss=0.07628, ctc_loss=0.1559, cr_loss=0.4145, over 34453.00 frames. ], tot_loss[loss=0.2423, simple_loss=0.2859, pruned_loss=0.07554, ctc_loss=0.1525, cr_loss=0.4261, over 6761531.87 frames. ], batch size: 100, lr: 8.02e-03, grad_scale: 32.0
2024-09-17 20:35:21,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=264968.6666666667, ans=0.2
2024-09-17 20:35:26,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0
2024-09-17 20:35:40,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0
2024-09-17 20:35:45,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.33 vs. limit=15.0
2024-09-17 20:35:52,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=265062.0, ans=0.07
2024-09-17 20:35:56,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=265062.0, ans=0.5
2024-09-17 20:36:04,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=265062.0, ans=0.0
2024-09-17 20:36:13,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.88 vs. limit=15.0
2024-09-17 20:36:19,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=265108.6666666667, ans=0.09899494936611666
2024-09-17 20:36:27,590 INFO [train.py:1198] (1/2) Epoch 15, batch 2550, loss[loss=0.212, simple_loss=0.2561, pruned_loss=0.06335, ctc_loss=0.1307, cr_loss=0.376, over 34168.00 frames. ], tot_loss[loss=0.2422, simple_loss=0.2858, pruned_loss=0.07553, ctc_loss=0.1526, cr_loss=0.4263, over 6765001.43 frames. ], batch size: 78, lr: 8.02e-03, grad_scale: 32.0
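Note on the scaling.py:214 records: each logs the current value ("ans") of a hyperparameter that is a function of batch_count, such as a skip rate or balancer probability. A minimal sketch of such a batch-count-driven schedule, assuming piecewise-linear interpolation between breakpoints with the endpoints held constant (the real ScheduledFloat may differ):

    def scheduled_float(batch_count, schedule):
        # schedule: list of (batch_count, value) pairs, sorted by batch_count.
        x0, y0 = schedule[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in schedule[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0

    # e.g. a skip rate that decays from 0.2 to 0.0 over the first 20k batches:
    # scheduled_float(262355.33, [(0.0, 0.2), (20000.0, 0.0)]) -> 0.0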
2024-09-17 20:36:30,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0
2024-09-17 20:36:30,948 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.456e+02 2.888e+02 3.754e+02 7.780e+02, threshold=5.775e+02, percent-clipped=2.0
2024-09-17 20:36:43,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0
2024-09-17 20:37:05,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=265248.6666666667, ans=0.125
2024-09-17 20:37:08,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=265248.6666666667, ans=0.0
2024-09-17 20:37:13,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=265248.6666666667, ans=0.0
2024-09-17 20:37:34,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.07 vs. limit=10.0
2024-09-17 20:37:49,887 INFO [train.py:1198] (1/2) Epoch 15, batch 2600, loss[loss=0.2155, simple_loss=0.2679, pruned_loss=0.06138, ctc_loss=0.1269, cr_loss=0.3731, over 34345.00 frames. ], tot_loss[loss=0.2427, simple_loss=0.2863, pruned_loss=0.07569, ctc_loss=0.153, cr_loss=0.4267, over 6761162.09 frames. ], batch size: 91, lr: 8.01e-03, grad_scale: 32.0
2024-09-17 20:37:59,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=265388.6666666667, ans=0.0
2024-09-17 20:38:02,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.12 vs. limit=15.0
2024-09-17 20:38:30,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=265482.0, ans=0.2
2024-09-17 20:38:57,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=265575.3333333333, ans=0.125
2024-09-17 20:39:01,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=265575.3333333333, ans=0.125
2024-09-17 20:39:01,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.49 vs. limit=12.0
2024-09-17 20:39:12,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=265575.3333333333, ans=0.025
2024-09-17 20:39:14,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265622.0, ans=0.1
2024-09-17 20:39:15,377 INFO [train.py:1198] (1/2) Epoch 15, batch 2650, loss[loss=0.2573, simple_loss=0.306, pruned_loss=0.07934, ctc_loss=0.1627, cr_loss=0.4327, over 34282.00 frames. ], tot_loss[loss=0.2428, simple_loss=0.2866, pruned_loss=0.07571, ctc_loss=0.153, cr_loss=0.4271, over 6769424.74 frames. ], batch size: 117, lr: 8.01e-03, grad_scale: 32.0
2024-09-17 20:39:18,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.211e+02 2.529e+02 2.922e+02 3.742e+02 7.415e+02, threshold=5.844e+02, percent-clipped=5.0
2024-09-17 20:39:30,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265668.6666666667, ans=0.1
2024-09-17 20:39:37,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=265668.6666666667, ans=0.125
2024-09-17 20:39:43,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=265668.6666666667, ans=0.125
2024-09-17 20:39:50,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=265715.3333333333, ans=0.125
2024-09-17 20:40:29,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=265808.6666666667, ans=0.0
2024-09-17 20:40:30,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=265808.6666666667, ans=0.0
2024-09-17 20:40:32,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.61 vs. limit=22.5
2024-09-17 20:40:37,812 INFO [train.py:1198] (1/2) Epoch 15, batch 2700, loss[loss=0.2522, simple_loss=0.2958, pruned_loss=0.0794, ctc_loss=0.1619, cr_loss=0.4339, over 34623.00 frames. ], tot_loss[loss=0.2428, simple_loss=0.2867, pruned_loss=0.07565, ctc_loss=0.1529, cr_loss=0.4269, over 6764038.28 frames. ], batch size: 102, lr: 8.01e-03, grad_scale: 32.0
2024-09-17 20:41:56,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=266042.0, ans=0.125
2024-09-17 20:42:01,107 INFO [train.py:1198] (1/2) Epoch 15, batch 2750, loss[loss=0.2406, simple_loss=0.2797, pruned_loss=0.07737, ctc_loss=0.1508, cr_loss=0.4142, over 34656.00 frames. ], tot_loss[loss=0.2412, simple_loss=0.2852, pruned_loss=0.07496, ctc_loss=0.1517, cr_loss=0.4245, over 6760976.90 frames. ], batch size: 88, lr: 8.00e-03, grad_scale: 32.0
2024-09-17 20:42:04,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.089e+02 2.504e+02 2.980e+02 3.713e+02 6.117e+02, threshold=5.959e+02, percent-clipped=1.0
2024-09-17 20:42:08,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=266088.6666666667, ans=0.125
2024-09-17 20:42:59,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=266228.6666666667, ans=0.125
2024-09-17 20:43:14,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=266275.3333333333, ans=0.125
2024-09-17 20:43:21,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=266275.3333333333, ans=0.125
2024-09-17 20:43:27,915 INFO [train.py:1198] (1/2) Epoch 15, batch 2800, loss[loss=0.2724, simple_loss=0.3071, pruned_loss=0.09108, ctc_loss=0.1891, cr_loss=0.4454, over 23290.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.2853, pruned_loss=0.0752, ctc_loss=0.1523, cr_loss=0.4254, over 6739480.64 frames. ], batch size: 245, lr: 8.00e-03, grad_scale: 32.0
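Note on the scaling.py:1024 Whitening records: each compares a "metric" measuring how non-white a module's output features are against a "limit". A hedged sketch of one plausible such statistic is below (the ratio of the mean squared eigenvalue of the channel covariance to the squared mean eigenvalue, which is 1.0 for perfectly white features); the real implementation in scaling.py may compute it differently.

    import torch

    def whitening_metric(x, num_groups=1):
        # x: (num_frames, num_channels); channels are split into groups,
        # mirroring the num_groups/num_channels fields in the log lines.
        n, c = x.shape
        g = c // num_groups
        metrics = []
        for i in range(num_groups):
            feats = x[:, i * g:(i + 1) * g]
            feats = feats - feats.mean(dim=0, keepdim=True)
            cov = (feats.T @ feats) / n                 # (g, g) covariance
            eigs = torch.linalg.eigvalsh(cov)
            # 1.0 when all eigenvalues are equal (white); larger otherwise.
            metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
        return max(metrics)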
2024-09-17 20:43:31,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=266322.0, ans=0.0
2024-09-17 20:43:34,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=266322.0, ans=0.125
2024-09-17 20:44:04,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=266415.3333333333, ans=0.125
2024-09-17 20:44:50,151 INFO [train.py:1198] (1/2) Epoch 15, batch 2850, loss[loss=0.231, simple_loss=0.2755, pruned_loss=0.07058, ctc_loss=0.1438, cr_loss=0.4145, over 34446.00 frames. ], tot_loss[loss=0.2426, simple_loss=0.2861, pruned_loss=0.07574, ctc_loss=0.1532, cr_loss=0.4273, over 6724072.00 frames. ], batch size: 90, lr: 8.00e-03, grad_scale: 32.0
2024-09-17 20:44:53,316 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.560e+02 2.842e+02 3.355e+02 8.775e+02, threshold=5.684e+02, percent-clipped=2.0
2024-09-17 20:45:03,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=266555.3333333333, ans=0.125
2024-09-17 20:45:08,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=266602.0, ans=0.025
2024-09-17 20:45:24,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0
2024-09-17 20:45:39,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=266695.3333333333, ans=0.025
2024-09-17 20:46:14,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=266788.6666666667, ans=0.035
2024-09-17 20:46:15,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=266788.6666666667, ans=0.1
2024-09-17 20:46:16,223 INFO [train.py:1198] (1/2) Epoch 15, batch 2900, loss[loss=0.2487, simple_loss=0.2914, pruned_loss=0.07845, ctc_loss=0.158, cr_loss=0.4398, over 34540.00 frames. ], tot_loss[loss=0.2436, simple_loss=0.2872, pruned_loss=0.07605, ctc_loss=0.1537, cr_loss=0.4295, over 6753385.83 frames. ], batch size: 94, lr: 7.99e-03, grad_scale: 32.0
2024-09-17 20:46:29,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=266788.6666666667, ans=0.125
2024-09-17 20:46:46,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=266835.3333333333, ans=0.0
2024-09-17 20:46:59,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=266882.0, ans=0.0
2024-09-17 20:46:59,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=266882.0, ans=0.125
2024-09-17 20:47:02,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=266882.0, ans=0.125
2024-09-17 20:47:04,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=10.19 vs. limit=15.0
2024-09-17 20:47:25,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.82 vs. limit=10.0
2024-09-17 20:47:38,959 INFO [train.py:1198] (1/2) Epoch 15, batch 2950, loss[loss=0.2164, simple_loss=0.2622, pruned_loss=0.06478, ctc_loss=0.1304, cr_loss=0.373, over 34630.00 frames. ], tot_loss[loss=0.242, simple_loss=0.2855, pruned_loss=0.07541, ctc_loss=0.1526, cr_loss=0.4267, over 6749158.96 frames. ], batch size: 88, lr: 7.99e-03, grad_scale: 32.0
2024-09-17 20:47:42,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.649e+02 3.336e+02 4.302e+02 7.247e+02, threshold=6.672e+02, percent-clipped=6.0
2024-09-17 20:47:58,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.80 vs. limit=10.0
2024-09-17 20:48:01,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=267068.6666666667, ans=0.0
2024-09-17 20:48:23,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.22 vs. limit=15.0
2024-09-17 20:48:46,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=267208.6666666667, ans=0.125
2024-09-17 20:48:51,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.14 vs. limit=22.5
2024-09-17 20:48:54,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=267208.6666666667, ans=0.0
2024-09-17 20:49:02,315 INFO [train.py:1198] (1/2) Epoch 15, batch 3000, loss[loss=0.2412, simple_loss=0.2835, pruned_loss=0.07514, ctc_loss=0.1529, cr_loss=0.4487, over 34548.00 frames. ], tot_loss[loss=0.2416, simple_loss=0.2852, pruned_loss=0.07522, ctc_loss=0.1522, cr_loss=0.4258, over 6749925.35 frames. ], batch size: 94, lr: 7.99e-03, grad_scale: 32.0
2024-09-17 20:49:02,315 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 20:49:19,198 INFO [train.py:1230] (1/2) Epoch 15, validation: loss=0.1511, simple_loss=0.2487, pruned_loss=0.02232, ctc_loss=0.04382, cr_loss=1.733e-14, over 944034.00 frames.
2024-09-17 20:49:19,199 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-17 20:49:29,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=267255.3333333333, ans=0.0
2024-09-17 20:49:42,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=22.5
2024-09-17 20:49:52,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=267348.6666666667, ans=0.1
2024-09-17 20:49:55,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=267348.6666666667, ans=0.125
2024-09-17 20:50:13,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=12.0
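Note on the validation records just above: the dev loss is computed over a fixed 944034-frame set, and the logged cr_loss collapses to ~1e-14 there, which is what one would expect if the two consistency-regularization branches see identical (unaugmented) inputs at eval time, leaving only floating-point residue. A minimal sketch of such a validation pass, assuming a hypothetical compute_loss helper:

    import torch

    @torch.no_grad()
    def validate(model, dev_loader, compute_loss):
        model.eval()
        totals, frames = {}, 0
        for batch in dev_loader:
            # compute_loss is a hypothetical helper returning a dict of
            # per-component losses and the number of frames in the batch.
            loss_dict, num_frames = compute_loss(model, batch)
            for k, v in loss_dict.items():
                totals[k] = totals.get(k, 0.0) + v.item() * num_frames
            frames += num_frames
        model.train()
        return {k: v / frames for k, v in totals.items()}, frames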
2024-09-17 20:50:15,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=267395.3333333333, ans=0.2
2024-09-17 20:50:44,730 INFO [train.py:1198] (1/2) Epoch 15, batch 3050, loss[loss=0.2351, simple_loss=0.2743, pruned_loss=0.0749, ctc_loss=0.1469, cr_loss=0.4198, over 34571.00 frames. ], tot_loss[loss=0.2422, simple_loss=0.2859, pruned_loss=0.07548, ctc_loss=0.1528, cr_loss=0.427, over 6742665.52 frames. ], batch size: 89, lr: 7.98e-03, grad_scale: 32.0
2024-09-17 20:50:47,957 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.136e+02 2.362e+02 2.824e+02 3.621e+02 6.468e+02, threshold=5.649e+02, percent-clipped=0.0
2024-09-17 20:51:07,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=267535.3333333333, ans=0.125
2024-09-17 20:51:19,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=267582.0, ans=0.125
2024-09-17 20:51:27,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=12.0
2024-09-17 20:51:33,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=267628.6666666667, ans=0.125
2024-09-17 20:51:33,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0
2024-09-17 20:51:35,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=267628.6666666667, ans=0.0
2024-09-17 20:51:46,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.96 vs. limit=10.0
2024-09-17 20:51:57,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=267675.3333333333, ans=0.07
2024-09-17 20:52:05,509 INFO [train.py:1198] (1/2) Epoch 15, batch 3100, loss[loss=0.2552, simple_loss=0.3002, pruned_loss=0.08012, ctc_loss=0.1606, cr_loss=0.4464, over 34212.00 frames. ], tot_loss[loss=0.2419, simple_loss=0.2855, pruned_loss=0.07544, ctc_loss=0.1525, cr_loss=0.4259, over 6742641.99 frames. ], batch size: 117, lr: 7.98e-03, grad_scale: 32.0
2024-09-17 20:52:18,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=267722.0, ans=0.0
2024-09-17 20:52:23,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=267768.6666666667, ans=0.2
2024-09-17 20:52:38,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.41 vs. limit=15.0
2024-09-17 20:52:39,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=15.0
2024-09-17 20:52:46,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=267815.3333333333, ans=0.125
2024-09-17 20:52:50,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0
2024-09-17 20:53:27,125 INFO [train.py:1198] (1/2) Epoch 15, batch 3150, loss[loss=0.254, simple_loss=0.3033, pruned_loss=0.07797, ctc_loss=0.1567, cr_loss=0.4348, over 33899.00 frames. ], tot_loss[loss=0.2419, simple_loss=0.2854, pruned_loss=0.07541, ctc_loss=0.1525, cr_loss=0.4261, over 6747993.16 frames. ], batch size: 122, lr: 7.98e-03, grad_scale: 32.0
2024-09-17 20:53:30,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.919e+02 2.465e+02 2.988e+02 3.828e+02 7.868e+02, threshold=5.975e+02, percent-clipped=6.0
2024-09-17 20:53:38,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=267955.3333333333, ans=0.125
2024-09-17 20:53:59,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=268048.6666666667, ans=0.125
2024-09-17 20:54:10,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0
2024-09-17 20:54:12,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=268048.6666666667, ans=0.125
2024-09-17 20:54:22,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.62 vs. limit=15.0
2024-09-17 20:54:27,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=268095.3333333333, ans=0.1
2024-09-17 20:54:36,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=268142.0, ans=0.07
2024-09-17 20:54:43,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=268142.0, ans=0.125
2024-09-17 20:54:43,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=268142.0, ans=0.0
2024-09-17 20:54:47,719 INFO [train.py:1198] (1/2) Epoch 15, batch 3200, loss[loss=0.2399, simple_loss=0.2873, pruned_loss=0.07305, ctc_loss=0.1469, cr_loss=0.4239, over 34539.00 frames. ], tot_loss[loss=0.2412, simple_loss=0.2849, pruned_loss=0.07509, ctc_loss=0.1518, cr_loss=0.4252, over 6760502.55 frames. ], batch size: 94, lr: 7.97e-03, grad_scale: 32.0
2024-09-17 20:54:56,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=268188.6666666667, ans=0.025
2024-09-17 20:55:12,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=268235.3333333333, ans=0.0
2024-09-17 20:55:17,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=268235.3333333333, ans=0.2
2024-09-17 20:56:10,181 INFO [train.py:1198] (1/2) Epoch 15, batch 3250, loss[loss=0.2441, simple_loss=0.2875, pruned_loss=0.07646, ctc_loss=0.1533, cr_loss=0.43, over 34688.00 frames. ], tot_loss[loss=0.2415, simple_loss=0.2853, pruned_loss=0.07512, ctc_loss=0.152, cr_loss=0.4257, over 6770197.00 frames. ], batch size: 98, lr: 7.97e-03, grad_scale: 32.0
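Note on the grad_scale field in the loss records: it moves between 16, 32 and (later) 64, the signature of PyTorch's AMP loss scaling, which backs the scale off when gradients overflow in float16 and grows it again after a run of clean steps. A minimal sketch using the standard torch.cuda.amp.GradScaler API (the settings below are illustrative, not this run's):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)

    def training_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skips the update if gradients overflowed
        scaler.update()          # adjusts the scale for the next step
        return loss.detach(), scaler.get_scale()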
2024-09-17 20:56:13,484 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.703e+02 3.379e+02 4.201e+02 6.902e+02, threshold=6.757e+02, percent-clipped=5.0
2024-09-17 20:56:47,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=268515.3333333333, ans=0.09899494936611666
2024-09-17 20:57:05,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=268562.0, ans=0.125
2024-09-17 20:57:07,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=268562.0, ans=0.2
2024-09-17 20:57:25,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=268608.6666666667, ans=0.125
2024-09-17 20:57:32,852 INFO [train.py:1198] (1/2) Epoch 15, batch 3300, loss[loss=0.2471, simple_loss=0.2952, pruned_loss=0.0755, ctc_loss=0.1547, cr_loss=0.425, over 33020.00 frames. ], tot_loss[loss=0.2401, simple_loss=0.2839, pruned_loss=0.07463, ctc_loss=0.151, cr_loss=0.4233, over 6767856.73 frames. ], batch size: 130, lr: 7.97e-03, grad_scale: 32.0
2024-09-17 20:57:38,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268655.3333333333, ans=0.1
2024-09-17 20:57:49,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268702.0, ans=0.1
2024-09-17 20:58:21,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=268795.3333333333, ans=0.2
2024-09-17 20:58:28,465 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-17 20:58:30,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=268795.3333333333, ans=0.125
2024-09-17 20:58:33,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=268795.3333333333, ans=0.125
2024-09-17 20:58:46,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=268842.0, ans=15.0
2024-09-17 20:58:53,814 INFO [train.py:1198] (1/2) Epoch 15, batch 3350, loss[loss=0.2542, simple_loss=0.2971, pruned_loss=0.0806, ctc_loss=0.1623, cr_loss=0.4427, over 33832.00 frames. ], tot_loss[loss=0.2413, simple_loss=0.2848, pruned_loss=0.07516, ctc_loss=0.1519, cr_loss=0.4247, over 6741782.36 frames. ], batch size: 122, lr: 7.96e-03, grad_scale: 32.0
2024-09-17 20:58:57,049 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.536e+02 3.059e+02 3.530e+02 5.951e+02, threshold=6.118e+02, percent-clipped=0.0
2024-09-17 20:59:10,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.62 vs. limit=15.0
2024-09-17 20:59:41,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=12.27 vs. limit=15.0
2024-09-17 21:00:04,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=269075.3333333333, ans=0.1
2024-09-17 21:00:14,321 INFO [train.py:1198] (1/2) Epoch 15, batch 3400, loss[loss=0.219, simple_loss=0.2605, pruned_loss=0.06699, ctc_loss=0.141, cr_loss=0.3843, over 34143.00 frames. ], tot_loss[loss=0.2415, simple_loss=0.2849, pruned_loss=0.07535, ctc_loss=0.1522, cr_loss=0.4245, over 6731617.26 frames. ], batch size: 78, lr: 7.96e-03, grad_scale: 32.0
2024-09-17 21:00:29,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.28 vs. limit=15.0
2024-09-17 21:00:43,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=269168.6666666667, ans=0.09899494936611666
2024-09-17 21:01:12,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.95 vs. limit=6.0
2024-09-17 21:01:18,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.96 vs. limit=6.0
2024-09-17 21:01:32,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=269308.6666666667, ans=0.2
2024-09-17 21:01:36,999 INFO [train.py:1198] (1/2) Epoch 15, batch 3450, loss[loss=0.2474, simple_loss=0.2931, pruned_loss=0.07653, ctc_loss=0.1566, cr_loss=0.4303, over 33082.00 frames. ], tot_loss[loss=0.2418, simple_loss=0.2854, pruned_loss=0.07536, ctc_loss=0.1523, cr_loss=0.4252, over 6742988.71 frames. ], batch size: 130, lr: 7.96e-03, grad_scale: 32.0
2024-09-17 21:01:40,244 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.624e+02 3.183e+02 3.802e+02 6.476e+02, threshold=6.367e+02, percent-clipped=1.0
2024-09-17 21:01:43,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=269355.3333333333, ans=0.0
2024-09-17 21:02:14,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=269448.6666666667, ans=0.0
2024-09-17 21:02:19,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.52 vs. limit=15.0
2024-09-17 21:02:23,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=269495.3333333333, ans=0.0
2024-09-17 21:02:53,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0
2024-09-17 21:02:55,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269588.6666666667, ans=0.1
2024-09-17 21:02:55,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=269588.6666666667, ans=0.125
2024-09-17 21:02:57,146 INFO [train.py:1198] (1/2) Epoch 15, batch 3500, loss[loss=0.2249, simple_loss=0.2745, pruned_loss=0.06594, ctc_loss=0.1364, cr_loss=0.4034, over 34464.00 frames. ], tot_loss[loss=0.2411, simple_loss=0.2847, pruned_loss=0.07509, ctc_loss=0.1518, cr_loss=0.4242, over 6745145.26 frames. ], batch size: 85, lr: 7.95e-03, grad_scale: 32.0
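Note on the lr field: it decays slowly within the epoch (8.06e-03 down to 7.95e-03 across these batches) and then takes a larger step down at the epoch boundary, consistent with an Eden-style schedule that depends on both the batch index and the epoch. A hedged sketch of that functional form; base_lr, lr_batches and lr_epochs below are illustrative, not necessarily this run's values.

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Smooth decay in both the batch and epoch dimensions; near-constant
        # early on, approaching ~batch^-0.5-style decay much later.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor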
2024-09-17 21:02:59,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=269588.6666666667, ans=0.2
2024-09-17 21:03:01,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0
2024-09-17 21:03:15,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.25 vs. limit=15.0
2024-09-17 21:03:24,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=269635.3333333333, ans=0.025
2024-09-17 21:04:13,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=269775.3333333333, ans=0.1
2024-09-17 21:04:17,658 INFO [train.py:1198] (1/2) Epoch 15, batch 3550, loss[loss=0.2574, simple_loss=0.3032, pruned_loss=0.08, ctc_loss=0.1656, cr_loss=0.4629, over 34384.00 frames. ], tot_loss[loss=0.2414, simple_loss=0.285, pruned_loss=0.07519, ctc_loss=0.152, cr_loss=0.4247, over 6755440.86 frames. ], batch size: 103, lr: 7.95e-03, grad_scale: 32.0
2024-09-17 21:04:20,873 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.942e+02 2.581e+02 2.936e+02 3.638e+02 6.149e+02, threshold=5.872e+02, percent-clipped=0.0
2024-09-17 21:04:29,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_positive, batch_count=269822.0, ans=0.05
2024-09-17 21:04:43,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=269868.6666666667, ans=0.09899494936611666
2024-09-17 21:04:46,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=269868.6666666667, ans=0.125
2024-09-17 21:04:59,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269915.3333333333, ans=0.1
2024-09-17 21:05:16,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=269962.0, ans=0.125
2024-09-17 21:05:23,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=270008.6666666667, ans=0.125
2024-09-17 21:05:27,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270008.6666666667, ans=0.1
2024-09-17 21:05:29,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=270008.6666666667, ans=0.125
2024-09-17 21:05:35,307 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 21:05:38,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.84 vs. limit=15.0
2024-09-17 21:05:39,689 INFO [train.py:1198] (1/2) Epoch 15, batch 3600, loss[loss=0.2278, simple_loss=0.2806, pruned_loss=0.06621, ctc_loss=0.1345, cr_loss=0.3942, over 34456.00 frames. ], tot_loss[loss=0.242, simple_loss=0.2856, pruned_loss=0.0755, ctc_loss=0.1525, cr_loss=0.4258, over 6765097.10 frames. ], batch size: 90, lr: 7.95e-03, grad_scale: 32.0
2024-09-17 21:06:06,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0
2024-09-17 21:06:28,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0
2024-09-17 21:06:32,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.66 vs. limit=15.0
2024-09-17 21:06:48,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=270242.0, ans=0.1
2024-09-17 21:06:53,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=270242.0, ans=0.125
2024-09-17 21:06:59,926 INFO [train.py:1198] (1/2) Epoch 15, batch 3650, loss[loss=0.252, simple_loss=0.2995, pruned_loss=0.07762, ctc_loss=0.1588, cr_loss=0.4382, over 34439.00 frames. ], tot_loss[loss=0.2411, simple_loss=0.2848, pruned_loss=0.075, ctc_loss=0.1516, cr_loss=0.4247, over 6768632.25 frames. ], batch size: 110, lr: 7.94e-03, grad_scale: 32.0
2024-09-17 21:07:03,158 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.613e+02 3.520e+02 4.192e+02 8.387e+02, threshold=7.041e+02, percent-clipped=7.0
2024-09-17 21:07:04,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=270288.6666666667, ans=0.125
2024-09-17 21:07:08,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=270288.6666666667, ans=0.125
2024-09-17 21:07:58,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=270428.6666666667, ans=0.0
2024-09-17 21:08:16,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=270475.3333333333, ans=0.125
2024-09-17 21:08:20,689 INFO [train.py:1198] (1/2) Epoch 15, batch 3700, loss[loss=0.2358, simple_loss=0.2807, pruned_loss=0.07251, ctc_loss=0.1459, cr_loss=0.4187, over 34613.00 frames. ], tot_loss[loss=0.2405, simple_loss=0.2847, pruned_loss=0.07461, ctc_loss=0.151, cr_loss=0.4239, over 6784396.46 frames. ], batch size: 102, lr: 7.94e-03, grad_scale: 32.0
2024-09-17 21:08:29,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0
2024-09-17 21:09:20,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=270662.0, ans=0.025
2024-09-17 21:09:26,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=270708.6666666667, ans=0.0
2024-09-17 21:09:27,074 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-17 21:09:43,044 INFO [train.py:1198] (1/2) Epoch 15, batch 3750, loss[loss=0.2597, simple_loss=0.304, pruned_loss=0.08247, ctc_loss=0.1635, cr_loss=0.4438, over 34375.00 frames. ], tot_loss[loss=0.2439, simple_loss=0.2878, pruned_loss=0.07601, ctc_loss=0.1536, cr_loss=0.4292, over 6786176.45 frames. ], batch size: 113, lr: 7.93e-03, grad_scale: 32.0
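Note on the widely varying "batch size" values (78 up to 245, with the 245-cut batch covering only 23290 frames): batches are capped by total duration rather than by count, so buckets of short utterances yield many more cuts per batch. A hedged sketch using lhotse's DynamicBucketingSampler, which the datamodule logging above indicates is in use; the parameter values here are illustrative.

    from lhotse import CutSet
    from lhotse.dataset import DynamicBucketingSampler

    def make_sampler(cuts: CutSet) -> DynamicBucketingSampler:
        # Groups cuts of similar duration into buckets, then fills each batch
        # up to a total-duration budget, so per-batch cut counts vary widely.
        return DynamicBucketingSampler(
            cuts,
            max_duration=1400.0,  # seconds of audio per batch (illustrative)
            num_buckets=30,       # illustrative
            shuffle=True,
            drop_last=True,
        )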
2024-09-17 21:09:46,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.143e+02 2.401e+02 2.653e+02 3.107e+02 5.880e+02, threshold=5.306e+02, percent-clipped=0.0
2024-09-17 21:09:48,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=270755.3333333333, ans=0.0
2024-09-17 21:09:48,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0
2024-09-17 21:09:49,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=270755.3333333333, ans=0.125
2024-09-17 21:09:57,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=270802.0, ans=0.0
2024-09-17 21:10:09,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=270802.0, ans=0.09899494936611666
2024-09-17 21:10:22,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=270848.6666666667, ans=0.125
2024-09-17 21:10:25,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=270848.6666666667, ans=0.95
2024-09-17 21:10:45,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=270895.3333333333, ans=0.0
2024-09-17 21:11:04,285 INFO [train.py:1198] (1/2) Epoch 15, batch 3800, loss[loss=0.275, simple_loss=0.3049, pruned_loss=0.09392, ctc_loss=0.1865, cr_loss=0.4981, over 29971.00 frames. ], tot_loss[loss=0.2478, simple_loss=0.291, pruned_loss=0.07794, ctc_loss=0.1572, cr_loss=0.4341, over 6676789.15 frames. ], batch size: 175, lr: 7.93e-03, grad_scale: 32.0
2024-09-17 21:11:10,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.97 vs. limit=22.5
2024-09-17 21:11:21,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=271035.3333333333, ans=0.125
2024-09-17 21:11:38,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=271082.0, ans=0.2
2024-09-17 21:11:53,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=271128.6666666667, ans=0.125
2024-09-17 21:12:27,956 INFO [train.py:1198] (1/2) Epoch 15, batch 3850, loss[loss=0.2847, simple_loss=0.3155, pruned_loss=0.09855, ctc_loss=0.1951, cr_loss=0.4446, over 23189.00 frames. ], tot_loss[loss=0.2537, simple_loss=0.2946, pruned_loss=0.08124, ctc_loss=0.1637, cr_loss=0.4385, over 6249335.28 frames. ], batch size: 244, lr: 7.93e-03, grad_scale: 32.0
2024-09-17 21:12:31,226 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.141e+02 2.501e+02 2.759e+02 2.989e+02 5.188e+02, threshold=5.517e+02, percent-clipped=0.0
2024-09-17 21:12:35,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=271222.0, ans=0.0
2024-09-17 21:12:51,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0
2024-09-17 21:13:56,085 INFO [train.py:1198] (1/2) Epoch 16, batch 0, loss[loss=0.2221, simple_loss=0.2702, pruned_loss=0.06548, ctc_loss=0.1358, cr_loss=0.3961, over 34480.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2702, pruned_loss=0.06548, ctc_loss=0.1358, cr_loss=0.3961, over 34480.00 frames. ], batch size: 85, lr: 7.67e-03, grad_scale: 32.0
2024-09-17 21:13:56,085 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-17 21:14:03,452 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.4.self_attn_weights, attn_weights_entropy = tensor([3.3102, 2.4519, 2.7226, 3.1272, 2.1661, 2.7277, 2.8551, 3.0196], device='cuda:1')
2024-09-17 21:14:13,607 INFO [train.py:1230] (1/2) Epoch 16, validation: loss=0.1521, simple_loss=0.2506, pruned_loss=0.02237, ctc_loss=0.04404, cr_loss=1.715e-14, over 944034.00 frames.
2024-09-17 21:14:13,607 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-17 21:14:23,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=271348.0, ans=0.09899494936611666
2024-09-17 21:14:33,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=271394.6666666667, ans=0.125
2024-09-17 21:15:05,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=271488.0, ans=0.1
2024-09-17 21:15:24,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.75 vs. limit=22.5
2024-09-17 21:15:28,607 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-17 21:15:35,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=271534.6666666667, ans=0.125
2024-09-17 21:15:38,115 INFO [train.py:1198] (1/2) Epoch 16, batch 50, loss[loss=0.2179, simple_loss=0.2594, pruned_loss=0.06658, ctc_loss=0.1368, cr_loss=0.3993, over 34479.00 frames. ], tot_loss[loss=0.244, simple_loss=0.2871, pruned_loss=0.07636, ctc_loss=0.1545, cr_loss=0.4304, over 1482086.35 frames. ], batch size: 82, lr: 7.67e-03, grad_scale: 32.0
2024-09-17 21:15:59,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.77 vs. limit=15.0
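Note on the zipformer.py:1858 record above, which prints an attn_weights_entropy tensor for one attention module at validation time: this is a per-head diagnostic of how diffuse the attention distributions are (high values mean spread-out attention, values near 0 mean peaky attention). A minimal sketch of how such a statistic can be computed, under the assumption that it is the mean per-query entropy of each head:

    import torch

    def attn_weights_entropy(attn_weights, eps=1e-20):
        # attn_weights: (num_heads, num_queries, num_keys), rows summing to 1.
        p = attn_weights.clamp(min=eps)
        entropy = -(p * p.log()).sum(dim=-1)   # (num_heads, num_queries)
        return entropy.mean(dim=-1)            # one scalar per head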
2024-09-17 21:16:19,559 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.649e+02 3.115e+02 3.911e+02 9.097e+02, threshold=6.230e+02, percent-clipped=6.0
2024-09-17 21:16:19,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=271674.6666666667, ans=0.2
2024-09-17 21:16:26,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=271721.3333333333, ans=0.125
2024-09-17 21:16:29,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=271721.3333333333, ans=0.1
2024-09-17 21:17:00,674 INFO [train.py:1198] (1/2) Epoch 16, batch 100, loss[loss=0.2317, simple_loss=0.2726, pruned_loss=0.07278, ctc_loss=0.1464, cr_loss=0.3974, over 34594.00 frames. ], tot_loss[loss=0.2443, simple_loss=0.2879, pruned_loss=0.07637, ctc_loss=0.1542, cr_loss=0.4301, over 2630657.99 frames. ], batch size: 89, lr: 7.67e-03, grad_scale: 32.0
2024-09-17 21:17:10,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=271814.6666666667, ans=0.2
2024-09-17 21:17:54,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.97 vs. limit=22.5
2024-09-17 21:18:23,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=272048.0, ans=0.125
2024-09-17 21:18:24,257 INFO [train.py:1198] (1/2) Epoch 16, batch 150, loss[loss=0.205, simple_loss=0.2503, pruned_loss=0.05982, ctc_loss=0.1245, cr_loss=0.3792, over 34452.00 frames. ], tot_loss[loss=0.2413, simple_loss=0.2856, pruned_loss=0.07485, ctc_loss=0.1514, cr_loss=0.4255, over 3558313.32 frames. ], batch size: 82, lr: 7.66e-03, grad_scale: 64.0
2024-09-17 21:18:31,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=272048.0, ans=22.5
2024-09-17 21:18:39,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=272094.6666666667, ans=0.04949747468305833
2024-09-17 21:18:41,021 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-17 21:18:46,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.31 vs. limit=15.0
2024-09-17 21:18:49,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=272094.6666666667, ans=0.1
2024-09-17 21:18:51,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=272094.6666666667, ans=0.1
2024-09-17 21:18:54,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=272094.6666666667, ans=0.125
2024-09-17 21:19:07,081 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.478e+02 3.127e+02 3.762e+02 6.933e+02, threshold=6.254e+02, percent-clipped=1.0
2024-09-17 21:19:09,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=272141.3333333333, ans=0.2
2024-09-17 21:19:09,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.11 vs. limit=22.5
2024-09-17 21:19:15,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=272188.0, ans=0.1
2024-09-17 21:19:25,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=272188.0, ans=0.2
2024-09-17 21:19:29,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=272188.0, ans=0.025
2024-09-17 21:19:30,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=272234.6666666667, ans=0.125
2024-09-17 21:19:48,257 INFO [train.py:1198] (1/2) Epoch 16, batch 200, loss[loss=0.2656, simple_loss=0.3066, pruned_loss=0.08602, ctc_loss=0.1712, cr_loss=0.4583, over 31972.00 frames. ], tot_loss[loss=0.2396, simple_loss=0.2839, pruned_loss=0.07418, ctc_loss=0.15, cr_loss=0.4222, over 4273006.07 frames. ], batch size: 146, lr: 7.66e-03, grad_scale: 64.0
2024-09-17 21:19:51,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=15.0
2024-09-17 21:20:26,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=272374.6666666667, ans=0.0
2024-09-17 21:20:34,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=272374.6666666667, ans=0.125
2024-09-17 21:21:11,083 INFO [train.py:1198] (1/2) Epoch 16, batch 250, loss[loss=0.2493, simple_loss=0.2948, pruned_loss=0.07745, ctc_loss=0.155, cr_loss=0.4495, over 34214.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2836, pruned_loss=0.07376, ctc_loss=0.1492, cr_loss=0.4219, over 4834929.59 frames. ], batch size: 117, lr: 7.66e-03, grad_scale: 64.0
], batch size: 117, lr: 7.66e-03, grad_scale: 64.0 2024-09-17 21:21:30,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=272561.3333333333, ans=0.0 2024-09-17 21:21:42,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=272561.3333333333, ans=0.0 2024-09-17 21:21:55,102 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.510e+02 3.122e+02 4.503e+02 6.680e+02, threshold=6.243e+02, percent-clipped=3.0 2024-09-17 21:22:00,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=272608.0, ans=0.0 2024-09-17 21:22:08,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=272654.6666666667, ans=0.125 2024-09-17 21:22:28,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=272701.3333333333, ans=0.0 2024-09-17 21:22:38,406 INFO [train.py:1198] (1/2) Epoch 16, batch 300, loss[loss=0.2425, simple_loss=0.2899, pruned_loss=0.07427, ctc_loss=0.1482, cr_loss=0.4231, over 34356.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.2837, pruned_loss=0.07406, ctc_loss=0.1499, cr_loss=0.4228, over 5263563.14 frames. ], batch size: 107, lr: 7.65e-03, grad_scale: 64.0 2024-09-17 21:24:00,499 INFO [train.py:1198] (1/2) Epoch 16, batch 350, loss[loss=0.2337, simple_loss=0.2762, pruned_loss=0.0725, ctc_loss=0.146, cr_loss=0.424, over 34289.00 frames. ], tot_loss[loss=0.2398, simple_loss=0.2843, pruned_loss=0.07417, ctc_loss=0.1502, cr_loss=0.4235, over 5598530.70 frames. ], batch size: 83, lr: 7.65e-03, grad_scale: 32.0 2024-09-17 21:24:09,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=272981.3333333333, ans=0.025 2024-09-17 21:24:09,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=272981.3333333333, ans=0.125 2024-09-17 21:24:40,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=273074.6666666667, ans=0.0 2024-09-17 21:24:43,287 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.927e+02 2.432e+02 2.728e+02 3.450e+02 6.880e+02, threshold=5.455e+02, percent-clipped=1.0 2024-09-17 21:24:56,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=273121.3333333333, ans=0.2 2024-09-17 21:25:10,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0 2024-09-17 21:25:14,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=5.92 vs. limit=15.0 2024-09-17 21:25:25,324 INFO [train.py:1198] (1/2) Epoch 16, batch 400, loss[loss=0.2428, simple_loss=0.2875, pruned_loss=0.07509, ctc_loss=0.1534, cr_loss=0.4336, over 34404.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.284, pruned_loss=0.07401, ctc_loss=0.1498, cr_loss=0.4236, over 5864919.01 frames. 
], batch size: 95, lr: 7.65e-03, grad_scale: 32.0 2024-09-17 21:25:37,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=273214.6666666667, ans=0.0 2024-09-17 21:25:51,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=273261.3333333333, ans=0.0 2024-09-17 21:25:55,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.01 vs. limit=15.0 2024-09-17 21:25:59,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.49 vs. limit=15.0 2024-09-17 21:26:02,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=273308.0, ans=0.125 2024-09-17 21:26:11,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.24 vs. limit=15.0 2024-09-17 21:26:31,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=273354.6666666667, ans=0.2 2024-09-17 21:26:34,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=273401.3333333333, ans=0.125 2024-09-17 21:26:49,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=273448.0, ans=0.5 2024-09-17 21:26:50,686 INFO [train.py:1198] (1/2) Epoch 16, batch 450, loss[loss=0.2386, simple_loss=0.2888, pruned_loss=0.07128, ctc_loss=0.1456, cr_loss=0.4197, over 34698.00 frames. ], tot_loss[loss=0.24, simple_loss=0.2845, pruned_loss=0.0742, ctc_loss=0.1503, cr_loss=0.4241, over 6055725.55 frames. ], batch size: 97, lr: 7.64e-03, grad_scale: 16.0 2024-09-17 21:26:59,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=273448.0, ans=0.0 2024-09-17 21:27:11,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=273494.6666666667, ans=0.025 2024-09-17 21:27:27,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=273541.3333333333, ans=0.0 2024-09-17 21:27:35,770 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.069e+02 2.430e+02 2.737e+02 3.576e+02 6.205e+02, threshold=5.473e+02, percent-clipped=5.0 2024-09-17 21:27:52,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=273588.0, ans=0.0 2024-09-17 21:28:13,916 INFO [train.py:1198] (1/2) Epoch 16, batch 500, loss[loss=0.2529, simple_loss=0.2973, pruned_loss=0.07931, ctc_loss=0.1594, cr_loss=0.4495, over 34406.00 frames. ], tot_loss[loss=0.2391, simple_loss=0.2835, pruned_loss=0.0739, ctc_loss=0.1496, cr_loss=0.4228, over 6222587.58 frames. 
], batch size: 110, lr: 7.64e-03, grad_scale: 16.0 2024-09-17 21:28:15,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=273681.3333333333, ans=0.125 2024-09-17 21:28:15,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=273681.3333333333, ans=0.125 2024-09-17 21:28:32,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.14 vs. limit=10.0 2024-09-17 21:28:37,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0 2024-09-17 21:28:58,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=273774.6666666667, ans=0.1 2024-09-17 21:29:00,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=273774.6666666667, ans=0.125 2024-09-17 21:29:02,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=273821.3333333333, ans=0.125 2024-09-17 21:29:14,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=12.0 2024-09-17 21:29:24,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.21 vs. limit=22.5 2024-09-17 21:29:27,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=273868.0, ans=0.2 2024-09-17 21:29:31,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0 2024-09-17 21:29:38,823 INFO [train.py:1198] (1/2) Epoch 16, batch 550, loss[loss=0.2493, simple_loss=0.295, pruned_loss=0.0775, ctc_loss=0.159, cr_loss=0.4227, over 33860.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.2838, pruned_loss=0.07416, ctc_loss=0.1501, cr_loss=0.4234, over 6332392.72 frames. ], batch size: 122, lr: 7.64e-03, grad_scale: 16.0 2024-09-17 21:29:58,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=273961.3333333333, ans=0.0 2024-09-17 21:30:15,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=274008.0, ans=0.0 2024-09-17 21:30:20,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=274008.0, ans=0.125 2024-09-17 21:30:25,134 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.418e+02 2.778e+02 3.727e+02 7.141e+02, threshold=5.556e+02, percent-clipped=4.0 2024-09-17 21:31:03,196 INFO [train.py:1198] (1/2) Epoch 16, batch 600, loss[loss=0.2599, simple_loss=0.304, pruned_loss=0.08227, ctc_loss=0.1659, cr_loss=0.4502, over 34261.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.284, pruned_loss=0.07402, ctc_loss=0.15, cr_loss=0.4232, over 6431946.98 frames. 
], batch size: 117, lr: 7.63e-03, grad_scale: 16.0 2024-09-17 21:31:05,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.31 vs. limit=10.0 2024-09-17 21:31:23,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=274194.6666666667, ans=0.125 2024-09-17 21:31:44,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=274241.3333333333, ans=0.125 2024-09-17 21:31:46,523 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.85 vs. limit=22.5 2024-09-17 21:31:53,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=274288.0, ans=0.2 2024-09-17 21:31:55,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=274288.0, ans=0.0 2024-09-17 21:32:20,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=274334.6666666667, ans=0.0 2024-09-17 21:32:24,955 INFO [train.py:1198] (1/2) Epoch 16, batch 650, loss[loss=0.257, simple_loss=0.2977, pruned_loss=0.08253, ctc_loss=0.1649, cr_loss=0.457, over 34521.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2836, pruned_loss=0.07366, ctc_loss=0.1496, cr_loss=0.4225, over 6523647.58 frames. ], batch size: 94, lr: 7.63e-03, grad_scale: 16.0 2024-09-17 21:32:27,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0 2024-09-17 21:32:33,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=274381.3333333333, ans=0.2 2024-09-17 21:33:05,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=274474.6666666667, ans=0.1 2024-09-17 21:33:11,979 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.569e+02 3.085e+02 4.303e+02 8.109e+02, threshold=6.170e+02, percent-clipped=11.0 2024-09-17 21:33:14,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=274474.6666666667, ans=0.0 2024-09-17 21:33:49,817 INFO [train.py:1198] (1/2) Epoch 16, batch 700, loss[loss=0.2231, simple_loss=0.2668, pruned_loss=0.06829, ctc_loss=0.136, cr_loss=0.392, over 34580.00 frames. ], tot_loss[loss=0.2392, simple_loss=0.2839, pruned_loss=0.07383, ctc_loss=0.1498, cr_loss=0.4231, over 6580356.25 frames. ], batch size: 89, lr: 7.63e-03, grad_scale: 16.0 2024-09-17 21:34:01,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=274614.6666666667, ans=0.0 2024-09-17 21:34:22,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.83 vs. 
limit=15.0 2024-09-17 21:34:26,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=274708.0, ans=0.125 2024-09-17 21:34:31,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=274708.0, ans=0.0 2024-09-17 21:34:39,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=274754.6666666667, ans=0.0 2024-09-17 21:34:58,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=274801.3333333333, ans=0.1 2024-09-17 21:35:12,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=274848.0, ans=0.125 2024-09-17 21:35:14,097 INFO [train.py:1198] (1/2) Epoch 16, batch 750, loss[loss=0.2515, simple_loss=0.2966, pruned_loss=0.07833, ctc_loss=0.1586, cr_loss=0.453, over 34394.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2833, pruned_loss=0.07357, ctc_loss=0.1492, cr_loss=0.4224, over 6624501.16 frames. ], batch size: 95, lr: 7.62e-03, grad_scale: 16.0 2024-09-17 21:35:34,209 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:35:58,632 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.866e+02 2.506e+02 2.789e+02 3.866e+02 7.156e+02, threshold=5.578e+02, percent-clipped=4.0 2024-09-17 21:35:59,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.93 vs. limit=15.0 2024-09-17 21:36:01,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2024-09-17 21:36:01,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=15.0 2024-09-17 21:36:04,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=274988.0, ans=0.05 2024-09-17 21:36:19,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=275034.6666666667, ans=0.125 2024-09-17 21:36:23,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=275034.6666666667, ans=0.125 2024-09-17 21:36:30,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=275034.6666666667, ans=0.125 2024-09-17 21:36:33,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=275034.6666666667, ans=0.125 2024-09-17 21:36:36,725 INFO [train.py:1198] (1/2) Epoch 16, batch 800, loss[loss=0.2059, simple_loss=0.2513, pruned_loss=0.06049, ctc_loss=0.1233, cr_loss=0.3742, over 34487.00 frames. ], tot_loss[loss=0.2385, simple_loss=0.2832, pruned_loss=0.07358, ctc_loss=0.1491, cr_loss=0.4223, over 6660138.80 frames. 
], batch size: 85, lr: 7.62e-03, grad_scale: 32.0 2024-09-17 21:37:20,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=275174.6666666667, ans=0.0 2024-09-17 21:37:25,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=275174.6666666667, ans=0.0 2024-09-17 21:37:55,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=275268.0, ans=0.0 2024-09-17 21:38:02,699 INFO [train.py:1198] (1/2) Epoch 16, batch 850, loss[loss=0.2383, simple_loss=0.287, pruned_loss=0.07189, ctc_loss=0.1468, cr_loss=0.4099, over 34408.00 frames. ], tot_loss[loss=0.2385, simple_loss=0.2833, pruned_loss=0.07355, ctc_loss=0.1491, cr_loss=0.4224, over 6692899.28 frames. ], batch size: 103, lr: 7.62e-03, grad_scale: 32.0 2024-09-17 21:38:13,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=275314.6666666667, ans=0.125 2024-09-17 21:38:13,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=275314.6666666667, ans=0.2 2024-09-17 21:38:42,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.00 vs. limit=22.5 2024-09-17 21:38:47,772 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.393e+02 2.939e+02 3.668e+02 6.166e+02, threshold=5.878e+02, percent-clipped=2.0 2024-09-17 21:39:04,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=275454.6666666667, ans=0.125 2024-09-17 21:39:06,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=275454.6666666667, ans=0.025 2024-09-17 21:39:11,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=275501.3333333333, ans=0.2 2024-09-17 21:39:26,195 INFO [train.py:1198] (1/2) Epoch 16, batch 900, loss[loss=0.2215, simple_loss=0.2665, pruned_loss=0.06644, ctc_loss=0.1377, cr_loss=0.3983, over 34517.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2833, pruned_loss=0.0737, ctc_loss=0.1494, cr_loss=0.4223, over 6698684.40 frames. ], batch size: 85, lr: 7.61e-03, grad_scale: 32.0 2024-09-17 21:39:44,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=275594.6666666667, ans=0.0 2024-09-17 21:39:59,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=275641.3333333333, ans=0.025 2024-09-17 21:40:14,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=275688.0, ans=0.125 2024-09-17 21:40:22,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=275688.0, ans=0.025 2024-09-17 21:40:22,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=275688.0, ans=0.1 2024-09-17 21:40:29,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.16 vs. 
limit=22.5 2024-09-17 21:40:46,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=275781.3333333333, ans=0.125 2024-09-17 21:40:50,435 INFO [train.py:1198] (1/2) Epoch 16, batch 950, loss[loss=0.227, simple_loss=0.2716, pruned_loss=0.06846, ctc_loss=0.1427, cr_loss=0.422, over 34709.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.2834, pruned_loss=0.07368, ctc_loss=0.1494, cr_loss=0.4225, over 6701904.78 frames. ], batch size: 87, lr: 7.61e-03, grad_scale: 32.0 2024-09-17 21:40:57,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=275781.3333333333, ans=0.125 2024-09-17 21:41:24,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=275874.6666666667, ans=0.025 2024-09-17 21:41:38,534 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.485e+02 3.045e+02 3.620e+02 8.045e+02, threshold=6.090e+02, percent-clipped=2.0 2024-09-17 21:41:40,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=275921.3333333333, ans=0.1 2024-09-17 21:41:49,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=275921.3333333333, ans=0.0 2024-09-17 21:42:14,762 INFO [train.py:1198] (1/2) Epoch 16, batch 1000, loss[loss=0.2387, simple_loss=0.2797, pruned_loss=0.0756, ctc_loss=0.1481, cr_loss=0.4217, over 34506.00 frames. ], tot_loss[loss=0.2395, simple_loss=0.284, pruned_loss=0.07404, ctc_loss=0.15, cr_loss=0.4236, over 6694537.12 frames. ], batch size: 90, lr: 7.61e-03, grad_scale: 16.0 2024-09-17 21:42:26,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=276014.6666666667, ans=0.0 2024-09-17 21:43:37,590 INFO [train.py:1198] (1/2) Epoch 16, batch 1050, loss[loss=0.2479, simple_loss=0.2935, pruned_loss=0.07675, ctc_loss=0.1549, cr_loss=0.4435, over 34575.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2831, pruned_loss=0.07371, ctc_loss=0.1494, cr_loss=0.4223, over 6702956.72 frames. ], batch size: 99, lr: 7.60e-03, grad_scale: 16.0 2024-09-17 21:43:58,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2024-09-17 21:44:04,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=276294.6666666667, ans=0.5 2024-09-17 21:44:24,039 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.348e+02 2.534e+02 3.099e+02 1.280e+03, threshold=5.068e+02, percent-clipped=1.0 2024-09-17 21:44:29,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=276388.0, ans=0.125 2024-09-17 21:45:04,049 INFO [train.py:1198] (1/2) Epoch 16, batch 1100, loss[loss=0.2331, simple_loss=0.2777, pruned_loss=0.07159, ctc_loss=0.1422, cr_loss=0.4213, over 34354.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.2832, pruned_loss=0.07379, ctc_loss=0.1495, cr_loss=0.4225, over 6715751.37 frames. 
], batch size: 91, lr: 7.60e-03, grad_scale: 16.0 2024-09-17 21:45:27,500 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:45:52,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=276621.3333333333, ans=0.0 2024-09-17 21:46:26,532 INFO [train.py:1198] (1/2) Epoch 16, batch 1150, loss[loss=0.2363, simple_loss=0.2784, pruned_loss=0.07391, ctc_loss=0.1474, cr_loss=0.4208, over 34356.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2832, pruned_loss=0.07377, ctc_loss=0.1493, cr_loss=0.4217, over 6713426.97 frames. ], batch size: 91, lr: 7.60e-03, grad_scale: 16.0 2024-09-17 21:46:41,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=276761.3333333333, ans=0.015 2024-09-17 21:47:01,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=276808.0, ans=0.2 2024-09-17 21:47:10,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=276808.0, ans=0.2 2024-09-17 21:47:11,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=276808.0, ans=0.125 2024-09-17 21:47:13,311 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.406e+02 3.011e+02 3.912e+02 7.868e+02, threshold=6.021e+02, percent-clipped=9.0 2024-09-17 21:47:15,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=276854.6666666667, ans=0.2 2024-09-17 21:47:20,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=276854.6666666667, ans=0.125 2024-09-17 21:47:43,661 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.02 vs. limit=6.0 2024-09-17 21:47:48,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=276948.0, ans=0.125 2024-09-17 21:47:49,422 INFO [train.py:1198] (1/2) Epoch 16, batch 1200, loss[loss=0.2491, simple_loss=0.2966, pruned_loss=0.07683, ctc_loss=0.1527, cr_loss=0.4366, over 34551.00 frames. ], tot_loss[loss=0.2397, simple_loss=0.2843, pruned_loss=0.07411, ctc_loss=0.15, cr_loss=0.423, over 6705948.69 frames. ], batch size: 99, lr: 7.60e-03, grad_scale: 32.0 2024-09-17 21:48:14,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=276994.6666666667, ans=0.125 2024-09-17 21:48:18,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=22.5 2024-09-17 21:48:23,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=277041.3333333333, ans=0.0 2024-09-17 21:48:37,202 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:49:16,549 INFO [train.py:1198] (1/2) Epoch 16, batch 1250, loss[loss=0.2604, simple_loss=0.3044, pruned_loss=0.08257, ctc_loss=0.162, cr_loss=0.4721, over 34342.00 frames. 
], tot_loss[loss=0.2399, simple_loss=0.2846, pruned_loss=0.07413, ctc_loss=0.1502, cr_loss=0.4244, over 6739127.75 frames. ], batch size: 107, lr: 7.59e-03, grad_scale: 32.0 2024-09-17 21:49:16,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=277181.3333333333, ans=0.0 2024-09-17 21:49:34,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.11 vs. limit=6.0 2024-09-17 21:49:40,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=277228.0, ans=0.025 2024-09-17 21:49:58,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=277274.6666666667, ans=0.125 2024-09-17 21:50:04,371 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.510e+02 2.880e+02 3.295e+02 6.595e+02, threshold=5.760e+02, percent-clipped=2.0 2024-09-17 21:50:17,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.89 vs. limit=15.0 2024-09-17 21:50:18,939 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:50:21,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.53 vs. limit=15.0 2024-09-17 21:50:26,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=277368.0, ans=0.125 2024-09-17 21:50:28,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=277368.0, ans=0.05 2024-09-17 21:50:36,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=277368.0, ans=0.0 2024-09-17 21:50:39,481 INFO [train.py:1198] (1/2) Epoch 16, batch 1300, loss[loss=0.2492, simple_loss=0.2957, pruned_loss=0.07731, ctc_loss=0.1547, cr_loss=0.4295, over 33281.00 frames. ], tot_loss[loss=0.239, simple_loss=0.2836, pruned_loss=0.07373, ctc_loss=0.1495, cr_loss=0.4231, over 6744756.78 frames. 
], batch size: 130, lr: 7.59e-03, grad_scale: 16.0 2024-09-17 21:50:53,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=277414.6666666667, ans=0.125 2024-09-17 21:50:57,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=277461.3333333333, ans=0.125 2024-09-17 21:51:04,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=277461.3333333333, ans=0.2 2024-09-17 21:51:11,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277508.0, ans=0.1 2024-09-17 21:51:29,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=277554.6666666667, ans=0.2 2024-09-17 21:51:52,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=277601.3333333333, ans=0.125 2024-09-17 21:52:02,007 INFO [train.py:1198] (1/2) Epoch 16, batch 1350, loss[loss=0.2402, simple_loss=0.2841, pruned_loss=0.07493, ctc_loss=0.15, cr_loss=0.4121, over 34548.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2829, pruned_loss=0.07336, ctc_loss=0.1488, cr_loss=0.4218, over 6765889.89 frames. ], batch size: 94, lr: 7.59e-03, grad_scale: 16.0 2024-09-17 21:52:02,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277648.0, ans=0.1 2024-09-17 21:52:06,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=11.19 vs. limit=15.0 2024-09-17 21:52:24,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=277694.6666666667, ans=0.025 2024-09-17 21:52:30,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=277694.6666666667, ans=0.0 2024-09-17 21:52:40,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=277741.3333333333, ans=0.035 2024-09-17 21:52:42,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=277741.3333333333, ans=0.0 2024-09-17 21:52:53,177 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.405e+02 2.734e+02 3.471e+02 5.712e+02, threshold=5.467e+02, percent-clipped=0.0 2024-09-17 21:53:27,878 INFO [train.py:1198] (1/2) Epoch 16, batch 1400, loss[loss=0.2126, simple_loss=0.2593, pruned_loss=0.06254, ctc_loss=0.1281, cr_loss=0.3795, over 34275.00 frames. ], tot_loss[loss=0.2381, simple_loss=0.283, pruned_loss=0.0733, ctc_loss=0.1488, cr_loss=0.4217, over 6777780.94 frames. 
], batch size: 80, lr: 7.58e-03, grad_scale: 16.0 2024-09-17 21:53:33,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=277881.3333333333, ans=0.0 2024-09-17 21:53:39,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=277881.3333333333, ans=0.0 2024-09-17 21:53:44,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=277928.0, ans=0.0 2024-09-17 21:54:17,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=278021.3333333333, ans=0.07 2024-09-17 21:54:50,690 INFO [train.py:1198] (1/2) Epoch 16, batch 1450, loss[loss=0.2597, simple_loss=0.3045, pruned_loss=0.08191, ctc_loss=0.1653, cr_loss=0.4493, over 34454.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2837, pruned_loss=0.07352, ctc_loss=0.1491, cr_loss=0.4227, over 6773673.23 frames. ], batch size: 110, lr: 7.58e-03, grad_scale: 16.0 2024-09-17 21:54:52,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=278114.6666666667, ans=0.125 2024-09-17 21:54:52,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=278114.6666666667, ans=0.125 2024-09-17 21:54:59,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=278114.6666666667, ans=0.125 2024-09-17 21:55:38,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.399e+02 2.680e+02 3.553e+02 6.874e+02, threshold=5.360e+02, percent-clipped=3.0 2024-09-17 21:55:44,238 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 21:56:15,596 INFO [train.py:1198] (1/2) Epoch 16, batch 1500, loss[loss=0.2584, simple_loss=0.3007, pruned_loss=0.08229, ctc_loss=0.1645, cr_loss=0.4626, over 34450.00 frames. ], tot_loss[loss=0.2391, simple_loss=0.2841, pruned_loss=0.07363, ctc_loss=0.1494, cr_loss=0.4235, over 6773333.10 frames. ], batch size: 100, lr: 7.58e-03, grad_scale: 16.0 2024-09-17 21:56:37,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=278394.6666666667, ans=0.015 2024-09-17 21:56:53,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.41 vs. limit=15.0 2024-09-17 21:57:04,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=278441.3333333333, ans=0.125 2024-09-17 21:57:10,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.85 vs. limit=15.0 2024-09-17 21:57:26,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.16 vs. limit=15.0 2024-09-17 21:57:32,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=278534.6666666667, ans=0.0 2024-09-17 21:57:40,002 INFO [train.py:1198] (1/2) Epoch 16, batch 1550, loss[loss=0.246, simple_loss=0.291, pruned_loss=0.07692, ctc_loss=0.1533, cr_loss=0.4099, over 34392.00 frames. 
], tot_loss[loss=0.2398, simple_loss=0.2844, pruned_loss=0.07408, ctc_loss=0.1501, cr_loss=0.4239, over 6744611.06 frames. ], batch size: 105, lr: 7.57e-03, grad_scale: 16.0 2024-09-17 21:57:53,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=278581.3333333333, ans=0.04949747468305833 2024-09-17 21:57:56,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278628.0, ans=0.1 2024-09-17 21:58:11,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2024-09-17 21:58:27,182 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.458e+02 3.051e+02 4.165e+02 7.892e+02, threshold=6.102e+02, percent-clipped=8.0 2024-09-17 21:58:32,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278721.3333333333, ans=0.1 2024-09-17 21:58:39,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=278721.3333333333, ans=0.125 2024-09-17 21:58:51,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=278768.0, ans=0.05 2024-09-17 21:58:59,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.64 vs. limit=12.0 2024-09-17 21:58:59,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.68 vs. limit=22.5 2024-09-17 21:59:02,236 INFO [train.py:1198] (1/2) Epoch 16, batch 1600, loss[loss=0.2453, simple_loss=0.2892, pruned_loss=0.07676, ctc_loss=0.1522, cr_loss=0.4398, over 34586.00 frames. ], tot_loss[loss=0.2399, simple_loss=0.2842, pruned_loss=0.07421, ctc_loss=0.1504, cr_loss=0.4249, over 6725126.12 frames. ], batch size: 99, lr: 7.57e-03, grad_scale: 32.0 2024-09-17 21:59:15,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=278814.6666666667, ans=0.125 2024-09-17 21:59:22,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=278861.3333333333, ans=0.025 2024-09-17 21:59:25,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=278861.3333333333, ans=0.0 2024-09-17 21:59:37,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=278908.0, ans=0.125 2024-09-17 21:59:42,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=278908.0, ans=0.125 2024-09-17 21:59:56,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=278954.6666666667, ans=0.0 2024-09-17 21:59:56,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.53 vs. limit=22.5 2024-09-17 22:00:28,139 INFO [train.py:1198] (1/2) Epoch 16, batch 1650, loss[loss=0.2521, simple_loss=0.3024, pruned_loss=0.07649, ctc_loss=0.1552, cr_loss=0.4444, over 34389.00 frames. 
], tot_loss[loss=0.2394, simple_loss=0.2839, pruned_loss=0.07399, ctc_loss=0.15, cr_loss=0.4239, over 6716899.35 frames. ], batch size: 103, lr: 7.57e-03, grad_scale: 32.0 2024-09-17 22:00:35,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0 2024-09-17 22:00:48,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=279094.6666666667, ans=0.07 2024-09-17 22:00:50,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=279094.6666666667, ans=0.125 2024-09-17 22:01:00,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.46 vs. limit=15.0 2024-09-17 22:01:15,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.468e+02 3.182e+02 3.939e+02 6.895e+02, threshold=6.365e+02, percent-clipped=2.0 2024-09-17 22:01:44,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=279234.6666666667, ans=0.125 2024-09-17 22:01:50,628 INFO [train.py:1198] (1/2) Epoch 16, batch 1700, loss[loss=0.2033, simple_loss=0.2466, pruned_loss=0.06061, ctc_loss=0.1199, cr_loss=0.3693, over 34284.00 frames. ], tot_loss[loss=0.2391, simple_loss=0.2838, pruned_loss=0.07377, ctc_loss=0.1495, cr_loss=0.4227, over 6742345.55 frames. ], batch size: 80, lr: 7.56e-03, grad_scale: 32.0 2024-09-17 22:01:52,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=279281.3333333333, ans=0.125 2024-09-17 22:02:09,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.80 vs. limit=22.5 2024-09-17 22:02:44,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=279421.3333333333, ans=0.125 2024-09-17 22:02:59,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279468.0, ans=0.1 2024-09-17 22:03:13,701 INFO [train.py:1198] (1/2) Epoch 16, batch 1750, loss[loss=0.2069, simple_loss=0.2533, pruned_loss=0.06012, ctc_loss=0.1252, cr_loss=0.381, over 34162.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2834, pruned_loss=0.07362, ctc_loss=0.1492, cr_loss=0.4228, over 6751295.32 frames. 
], batch size: 78, lr: 7.56e-03, grad_scale: 32.0 2024-09-17 22:03:17,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=279514.6666666667, ans=0.1 2024-09-17 22:03:25,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=279514.6666666667, ans=0.125 2024-09-17 22:04:05,369 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.884e+02 2.537e+02 3.160e+02 4.329e+02 6.407e+02, threshold=6.320e+02, percent-clipped=1.0 2024-09-17 22:04:27,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=279701.3333333333, ans=0.125 2024-09-17 22:04:39,914 INFO [train.py:1198] (1/2) Epoch 16, batch 1800, loss[loss=0.2463, simple_loss=0.2922, pruned_loss=0.07621, ctc_loss=0.1532, cr_loss=0.4343, over 34687.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.2836, pruned_loss=0.07358, ctc_loss=0.1491, cr_loss=0.4228, over 6754342.24 frames. ], batch size: 97, lr: 7.56e-03, grad_scale: 32.0 2024-09-17 22:04:42,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=279748.0, ans=0.0 2024-09-17 22:05:25,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.91 vs. limit=15.0 2024-09-17 22:05:31,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=279888.0, ans=0.2 2024-09-17 22:05:33,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.92 vs. limit=10.0 2024-09-17 22:05:34,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=279888.0, ans=0.0 2024-09-17 22:05:40,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.75 vs. limit=15.0 2024-09-17 22:05:54,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=279934.6666666667, ans=0.1 2024-09-17 22:06:02,423 INFO [train.py:1198] (1/2) Epoch 16, batch 1850, loss[loss=0.2445, simple_loss=0.2925, pruned_loss=0.0744, ctc_loss=0.1521, cr_loss=0.4317, over 34451.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2832, pruned_loss=0.07328, ctc_loss=0.1487, cr_loss=0.4225, over 6762250.14 frames. ], batch size: 100, lr: 7.55e-03, grad_scale: 32.0 2024-09-17 22:06:35,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=280028.0, ans=0.125 2024-09-17 22:06:37,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=280028.0, ans=0.125 2024-09-17 22:06:39,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.47 vs. 
limit=15.0 2024-09-17 22:06:56,674 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.154e+02 2.778e+02 3.361e+02 5.070e+02 7.291e+02, threshold=6.723e+02, percent-clipped=10.0 2024-09-17 22:06:57,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=280121.3333333333, ans=0.0 2024-09-17 22:07:01,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=280121.3333333333, ans=0.04949747468305833 2024-09-17 22:07:06,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=280121.3333333333, ans=0.125 2024-09-17 22:07:20,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.71 vs. limit=15.0 2024-09-17 22:07:32,608 INFO [train.py:1198] (1/2) Epoch 16, batch 1900, loss[loss=0.2573, simple_loss=0.3053, pruned_loss=0.07943, ctc_loss=0.1623, cr_loss=0.448, over 34381.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2839, pruned_loss=0.0735, ctc_loss=0.1491, cr_loss=0.4239, over 6771695.42 frames. ], batch size: 103, lr: 7.55e-03, grad_scale: 32.0 2024-09-17 22:08:16,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=280308.0, ans=0.0 2024-09-17 22:08:32,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=280354.6666666667, ans=0.2 2024-09-17 22:08:56,976 INFO [train.py:1198] (1/2) Epoch 16, batch 1950, loss[loss=0.2429, simple_loss=0.286, pruned_loss=0.07655, ctc_loss=0.1509, cr_loss=0.4125, over 34370.00 frames. ], tot_loss[loss=0.2401, simple_loss=0.2852, pruned_loss=0.07401, ctc_loss=0.1501, cr_loss=0.4261, over 6788451.40 frames. ], batch size: 91, lr: 7.55e-03, grad_scale: 32.0 2024-09-17 22:09:13,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=280494.6666666667, ans=0.035 2024-09-17 22:09:14,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.57 vs. limit=22.5 2024-09-17 22:09:15,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=280494.6666666667, ans=0.0 2024-09-17 22:09:46,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.479e+02 2.945e+02 3.890e+02 6.410e+02, threshold=5.889e+02, percent-clipped=0.0 2024-09-17 22:09:52,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.95 vs. limit=12.0 2024-09-17 22:10:16,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=280634.6666666667, ans=10.0 2024-09-17 22:10:19,434 INFO [train.py:1198] (1/2) Epoch 16, batch 2000, loss[loss=0.2165, simple_loss=0.2582, pruned_loss=0.06606, ctc_loss=0.1356, cr_loss=0.391, over 34167.00 frames. ], tot_loss[loss=0.2409, simple_loss=0.2858, pruned_loss=0.07434, ctc_loss=0.1508, cr_loss=0.4269, over 6765385.41 frames. 
], batch size: 78, lr: 7.55e-03, grad_scale: 32.0 2024-09-17 22:10:44,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=280728.0, ans=0.125 2024-09-17 22:10:46,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=280728.0, ans=0.1 2024-09-17 22:11:00,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=280774.6666666667, ans=0.125 2024-09-17 22:11:04,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280774.6666666667, ans=0.1 2024-09-17 22:11:45,943 INFO [train.py:1198] (1/2) Epoch 16, batch 2050, loss[loss=0.2199, simple_loss=0.2648, pruned_loss=0.06617, ctc_loss=0.1355, cr_loss=0.3912, over 34493.00 frames. ], tot_loss[loss=0.2398, simple_loss=0.2848, pruned_loss=0.07394, ctc_loss=0.1501, cr_loss=0.4249, over 6754164.05 frames. ], batch size: 82, lr: 7.54e-03, grad_scale: 32.0 2024-09-17 22:12:00,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2024-09-17 22:12:23,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=281008.0, ans=0.125 2024-09-17 22:12:26,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=281008.0, ans=0.125 2024-09-17 22:12:32,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.79 vs. limit=15.0 2024-09-17 22:12:32,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=281008.0, ans=0.1 2024-09-17 22:12:35,851 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.475e+02 2.834e+02 3.968e+02 8.355e+02, threshold=5.669e+02, percent-clipped=3.0 2024-09-17 22:12:36,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=281054.6666666667, ans=0.0 2024-09-17 22:12:51,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=15.0 2024-09-17 22:12:52,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=281101.3333333333, ans=0.2 2024-09-17 22:13:08,611 INFO [train.py:1198] (1/2) Epoch 16, batch 2100, loss[loss=0.2313, simple_loss=0.2784, pruned_loss=0.06992, ctc_loss=0.1423, cr_loss=0.3974, over 34534.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.2839, pruned_loss=0.07345, ctc_loss=0.1491, cr_loss=0.4231, over 6768764.02 frames. 
], batch size: 94, lr: 7.54e-03, grad_scale: 32.0 2024-09-17 22:13:12,241 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:13:21,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=281148.0, ans=0.025 2024-09-17 22:13:25,231 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:13:46,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=281241.3333333333, ans=0.125 2024-09-17 22:13:48,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=281241.3333333333, ans=0.07 2024-09-17 22:13:57,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=281288.0, ans=0.125 2024-09-17 22:14:30,037 INFO [train.py:1198] (1/2) Epoch 16, batch 2150, loss[loss=0.2212, simple_loss=0.2619, pruned_loss=0.06849, ctc_loss=0.1371, cr_loss=0.4022, over 34362.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.2828, pruned_loss=0.07299, ctc_loss=0.1483, cr_loss=0.4209, over 6788236.75 frames. ], batch size: 91, lr: 7.54e-03, grad_scale: 32.0 2024-09-17 22:14:33,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=281381.3333333333, ans=0.0 2024-09-17 22:14:47,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.73 vs. limit=22.5 2024-09-17 22:15:00,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=281428.0, ans=0.125 2024-09-17 22:15:05,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=281474.6666666667, ans=0.0 2024-09-17 22:15:23,529 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.514e+02 3.031e+02 4.069e+02 8.162e+02, threshold=6.062e+02, percent-clipped=4.0 2024-09-17 22:15:30,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=281521.3333333333, ans=0.125 2024-09-17 22:15:33,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=281521.3333333333, ans=0.125 2024-09-17 22:15:56,109 INFO [train.py:1198] (1/2) Epoch 16, batch 2200, loss[loss=0.2502, simple_loss=0.2944, pruned_loss=0.0782, ctc_loss=0.1607, cr_loss=0.4393, over 34437.00 frames. ], tot_loss[loss=0.2376, simple_loss=0.2826, pruned_loss=0.07306, ctc_loss=0.1483, cr_loss=0.4209, over 6784994.73 frames. 
], batch size: 100, lr: 7.53e-03, grad_scale: 32.0 2024-09-17 22:15:56,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=281614.6666666667, ans=0.07 2024-09-17 22:16:03,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=281614.6666666667, ans=0.0 2024-09-17 22:16:11,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=281661.3333333333, ans=0.1 2024-09-17 22:16:44,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=281754.6666666667, ans=0.5 2024-09-17 22:17:08,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.32 vs. limit=15.0 2024-09-17 22:17:09,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=281801.3333333333, ans=0.125 2024-09-17 22:17:18,952 INFO [train.py:1198] (1/2) Epoch 16, batch 2250, loss[loss=0.226, simple_loss=0.2791, pruned_loss=0.06523, ctc_loss=0.1352, cr_loss=0.3858, over 34407.00 frames. ], tot_loss[loss=0.2375, simple_loss=0.2825, pruned_loss=0.07301, ctc_loss=0.1482, cr_loss=0.4207, over 6781453.90 frames. ], batch size: 95, lr: 7.53e-03, grad_scale: 16.0 2024-09-17 22:17:21,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=281848.0, ans=0.125 2024-09-17 22:17:31,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=281848.0, ans=0.04949747468305833 2024-09-17 22:17:36,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=281894.6666666667, ans=0.07 2024-09-17 22:17:45,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.75 vs. limit=15.0 2024-09-17 22:18:02,602 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:18:10,606 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.743e+02 3.771e+02 4.942e+02 8.819e+02, threshold=7.543e+02, percent-clipped=11.0 2024-09-17 22:18:22,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=281988.0, ans=0.125 2024-09-17 22:18:27,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=282034.6666666667, ans=0.0 2024-09-17 22:18:43,890 INFO [train.py:1198] (1/2) Epoch 16, batch 2300, loss[loss=0.208, simple_loss=0.2566, pruned_loss=0.05956, ctc_loss=0.1269, cr_loss=0.3736, over 34221.00 frames. ], tot_loss[loss=0.2366, simple_loss=0.2816, pruned_loss=0.07266, ctc_loss=0.1476, cr_loss=0.4194, over 6766336.24 frames. 
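Each per-batch record decomposes the training objective: simple_loss and pruned_loss are the two stages of a pruned-transducer (RNN-T) computation, ctc_loss is an auxiliary CTC objective, and cr_loss looks like a CR-CTC-style consistency-regularization term; the headline loss is a weighted combination of these, and tot_loss is the same statistic averaged over recent frames. A schematic of the combination, with the scale factors left as explicit hyperparameters since their values are not recoverable from these records:

    import torch

    def combine_losses(
        simple_loss: torch.Tensor,
        pruned_loss: torch.Tensor,
        ctc_loss: torch.Tensor,
        cr_loss: torch.Tensor,
        *,
        simple_scale: float,
        ctc_scale: float,
        cr_scale: float,
    ) -> torch.Tensor:
        # Sketch only: the simple (non-pruned) transducer term typically
        # stabilises early training and is down-weighted, the pruned term is
        # the main objective, and the auxiliary terms are scaled down.
        return (
            simple_scale * simple_loss
            + pruned_loss
            + ctc_scale * ctc_loss
            + cr_scale * cr_loss
        )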
], batch size: 83, lr: 7.53e-03, grad_scale: 16.0 2024-09-17 22:19:00,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=282128.0, ans=0.2 2024-09-17 22:19:12,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=282128.0, ans=0.125 2024-09-17 22:19:27,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=282174.6666666667, ans=0.125 2024-09-17 22:19:39,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=282221.3333333333, ans=0.125 2024-09-17 22:20:08,628 INFO [train.py:1198] (1/2) Epoch 16, batch 2350, loss[loss=0.2397, simple_loss=0.2915, pruned_loss=0.07103, ctc_loss=0.145, cr_loss=0.4221, over 34684.00 frames. ], tot_loss[loss=0.2369, simple_loss=0.2818, pruned_loss=0.07282, ctc_loss=0.1479, cr_loss=0.4204, over 6773024.03 frames. ], batch size: 97, lr: 7.52e-03, grad_scale: 16.0 2024-09-17 22:20:25,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.72 vs. limit=22.5 2024-09-17 22:20:56,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=282454.6666666667, ans=10.0 2024-09-17 22:20:59,515 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.112e+02 2.494e+02 2.908e+02 3.728e+02 5.298e+02, threshold=5.817e+02, percent-clipped=0.0 2024-09-17 22:21:06,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=282454.6666666667, ans=0.1 2024-09-17 22:21:31,157 INFO [train.py:1198] (1/2) Epoch 16, batch 2400, loss[loss=0.2373, simple_loss=0.2827, pruned_loss=0.07259, ctc_loss=0.1481, cr_loss=0.4296, over 34595.00 frames. ], tot_loss[loss=0.2375, simple_loss=0.2824, pruned_loss=0.07305, ctc_loss=0.1483, cr_loss=0.4217, over 6776388.72 frames. ], batch size: 89, lr: 7.52e-03, grad_scale: 32.0 2024-09-17 22:21:39,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=282548.0, ans=0.1 2024-09-17 22:22:41,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=282734.6666666667, ans=0.0 2024-09-17 22:22:55,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=282734.6666666667, ans=0.0 2024-09-17 22:22:58,257 INFO [train.py:1198] (1/2) Epoch 16, batch 2450, loss[loss=0.2467, simple_loss=0.2935, pruned_loss=0.07595, ctc_loss=0.1525, cr_loss=0.4409, over 34414.00 frames. ], tot_loss[loss=0.2391, simple_loss=0.2838, pruned_loss=0.07371, ctc_loss=0.1497, cr_loss=0.4235, over 6749734.16 frames. 
], batch size: 95, lr: 7.52e-03, grad_scale: 32.0 2024-09-17 22:23:36,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=282874.6666666667, ans=0.125 2024-09-17 22:23:48,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=282921.3333333333, ans=0.0 2024-09-17 22:23:49,517 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.492e+02 2.917e+02 3.752e+02 9.534e+02, threshold=5.835e+02, percent-clipped=4.0 2024-09-17 22:23:56,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=282921.3333333333, ans=0.0 2024-09-17 22:24:11,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=282968.0, ans=0.125 2024-09-17 22:24:21,045 INFO [train.py:1198] (1/2) Epoch 16, batch 2500, loss[loss=0.2339, simple_loss=0.2849, pruned_loss=0.06891, ctc_loss=0.1445, cr_loss=0.4035, over 34433.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2835, pruned_loss=0.0737, ctc_loss=0.1496, cr_loss=0.4236, over 6762338.92 frames. ], batch size: 100, lr: 7.51e-03, grad_scale: 32.0 2024-09-17 22:24:21,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=283014.6666666667, ans=0.125 2024-09-17 22:24:23,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.79 vs. limit=6.0 2024-09-17 22:24:53,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.92 vs. limit=15.0 2024-09-17 22:25:40,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=283201.3333333333, ans=0.0 2024-09-17 22:25:42,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=283248.0, ans=0.125 2024-09-17 22:25:43,955 INFO [train.py:1198] (1/2) Epoch 16, batch 2550, loss[loss=0.2048, simple_loss=0.2503, pruned_loss=0.05972, ctc_loss=0.1232, cr_loss=0.3802, over 34189.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2836, pruned_loss=0.07367, ctc_loss=0.1495, cr_loss=0.4232, over 6766397.31 frames. ], batch size: 78, lr: 7.51e-03, grad_scale: 32.0 2024-09-17 22:25:56,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=283248.0, ans=10.0 2024-09-17 22:25:58,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.14 vs. limit=15.0 2024-09-17 22:26:18,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2024-09-17 22:26:20,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283341.3333333333, ans=0.1 2024-09-17 22:26:21,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.07 vs. 
limit=15.0 2024-09-17 22:26:36,839 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.416e+02 2.726e+02 3.809e+02 7.008e+02, threshold=5.453e+02, percent-clipped=3.0 2024-09-17 22:27:10,690 INFO [train.py:1198] (1/2) Epoch 16, batch 2600, loss[loss=0.2332, simple_loss=0.2797, pruned_loss=0.07099, ctc_loss=0.1431, cr_loss=0.401, over 34363.00 frames. ], tot_loss[loss=0.2392, simple_loss=0.284, pruned_loss=0.07377, ctc_loss=0.1496, cr_loss=0.4233, over 6761742.70 frames. ], batch size: 91, lr: 7.51e-03, grad_scale: 32.0 2024-09-17 22:27:25,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=283528.0, ans=0.2 2024-09-17 22:27:27,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.67 vs. limit=10.0 2024-09-17 22:27:38,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=283528.0, ans=0.0 2024-09-17 22:27:50,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-17 22:28:10,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=283621.3333333333, ans=0.125 2024-09-17 22:28:23,303 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:28:32,513 INFO [train.py:1198] (1/2) Epoch 16, batch 2650, loss[loss=0.2492, simple_loss=0.299, pruned_loss=0.07483, ctc_loss=0.1548, cr_loss=0.4675, over 34162.00 frames. ], tot_loss[loss=0.239, simple_loss=0.284, pruned_loss=0.07364, ctc_loss=0.1494, cr_loss=0.4235, over 6769001.57 frames. 
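The recurring "Whitening: name=..., metric=X vs. limit=Y" lines are activation diagnostics: per named module, a scalar measures how far the (optionally group-wise) feature covariance is from a scaled identity, and it is logged against that module's limit. A metric of this kind is 1.0 for perfectly white features and approaches the per-group channel count when variance concentrates in one direction, which fits the logged ranges (e.g. metric=3.01 vs. limit=6.0 for whiten_keys, metric=21.73 vs. limit=22.5 for a self-attention module). One way to compute such a spread measure, as a sketch of the idea rather than the exact scaling.py implementation:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        # x: (num_frames, num_channels); num_channels must divide evenly
        # into num_groups. Returns a value in [1, channels_per_group].
        num_frames, num_channels = x.shape
        d = num_channels // num_groups
        x = x.reshape(num_frames, num_groups, d).transpose(0, 1)  # (groups, frames, d)
        x = x - x.mean(dim=1, keepdim=True)
        cov = x.transpose(1, 2) @ x / num_frames                  # (groups, d, d)
        # ratio of mean squared eigenvalue to squared mean eigenvalue, via traces:
        num = (cov * cov).sum(dim=(1, 2)) / d                     # mean(eig**2)
        den = (cov.diagonal(dim1=1, dim2=2).sum(dim=1) / d) ** 2  # mean(eig)**2
        return (num / den).mean().item()

When the metric drifts above its limit, the training code can apply a small corrective gradient that nudges activations back toward a whiter covariance; these entries appear to be emitted when such checks fire.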
], batch size: 117, lr: 7.51e-03, grad_scale: 32.0 2024-09-17 22:28:35,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=283714.6666666667, ans=15.0 2024-09-17 22:28:37,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=283714.6666666667, ans=0.0 2024-09-17 22:28:43,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=283714.6666666667, ans=0.0 2024-09-17 22:28:57,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=283761.3333333333, ans=0.125 2024-09-17 22:29:04,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=283808.0, ans=0.125 2024-09-17 22:29:23,291 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.557e+02 2.963e+02 3.774e+02 7.160e+02, threshold=5.927e+02, percent-clipped=8.0 2024-09-17 22:29:28,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=283854.6666666667, ans=0.125 2024-09-17 22:29:33,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=283854.6666666667, ans=0.125 2024-09-17 22:29:43,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=283901.3333333333, ans=0.125 2024-09-17 22:29:55,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=283948.0, ans=0.07 2024-09-17 22:29:56,633 INFO [train.py:1198] (1/2) Epoch 16, batch 2700, loss[loss=0.2336, simple_loss=0.2865, pruned_loss=0.0681, ctc_loss=0.143, cr_loss=0.3968, over 34599.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.2843, pruned_loss=0.0738, ctc_loss=0.1497, cr_loss=0.4242, over 6764035.29 frames. ], batch size: 102, lr: 7.50e-03, grad_scale: 32.0 2024-09-17 22:30:05,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=283948.0, ans=0.0 2024-09-17 22:30:05,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=283948.0, ans=0.125 2024-09-17 22:30:15,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=283994.6666666667, ans=0.125 2024-09-17 22:30:17,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283994.6666666667, ans=0.1 2024-09-17 22:30:25,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. 
limit=6.0 2024-09-17 22:30:47,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=284088.0, ans=0.125 2024-09-17 22:30:47,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=284088.0, ans=0.2 2024-09-17 22:30:49,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=284088.0, ans=0.025 2024-09-17 22:30:50,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=284088.0, ans=0.0 2024-09-17 22:30:57,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=284088.0, ans=0.125 2024-09-17 22:30:57,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=284088.0, ans=0.125 2024-09-17 22:30:59,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=12.0 2024-09-17 22:31:21,704 INFO [train.py:1198] (1/2) Epoch 16, batch 2750, loss[loss=0.2294, simple_loss=0.2703, pruned_loss=0.07147, ctc_loss=0.1439, cr_loss=0.4192, over 34626.00 frames. ], tot_loss[loss=0.2379, simple_loss=0.2829, pruned_loss=0.07317, ctc_loss=0.1487, cr_loss=0.4223, over 6762804.07 frames. ], batch size: 88, lr: 7.50e-03, grad_scale: 32.0 2024-09-17 22:31:23,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=284181.3333333333, ans=0.1 2024-09-17 22:31:23,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=284181.3333333333, ans=0.0 2024-09-17 22:31:27,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=284181.3333333333, ans=0.0 2024-09-17 22:31:34,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2024-09-17 22:31:35,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284181.3333333333, ans=0.1 2024-09-17 22:31:40,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.92 vs. limit=15.0 2024-09-17 22:31:41,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.58 vs. 
limit=15.0 2024-09-17 22:32:12,335 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.448e+02 2.825e+02 3.645e+02 6.147e+02, threshold=5.650e+02, percent-clipped=2.0 2024-09-17 22:32:17,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=284321.3333333333, ans=0.0 2024-09-17 22:32:27,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=284368.0, ans=0.025 2024-09-17 22:32:31,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=284368.0, ans=0.1 2024-09-17 22:32:43,937 INFO [train.py:1198] (1/2) Epoch 16, batch 2800, loss[loss=0.2776, simple_loss=0.3121, pruned_loss=0.09327, ctc_loss=0.192, cr_loss=0.4546, over 23997.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.283, pruned_loss=0.07339, ctc_loss=0.1492, cr_loss=0.4234, over 6741811.71 frames. ], batch size: 245, lr: 7.50e-03, grad_scale: 32.0 2024-09-17 22:32:50,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=284414.6666666667, ans=0.125 2024-09-17 22:33:05,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284461.3333333333, ans=0.1 2024-09-17 22:33:15,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=284508.0, ans=10.0 2024-09-17 22:33:24,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.23 vs. limit=15.0 2024-09-17 22:33:50,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284601.3333333333, ans=0.1 2024-09-17 22:34:10,209 INFO [train.py:1198] (1/2) Epoch 16, batch 2850, loss[loss=0.2385, simple_loss=0.2837, pruned_loss=0.07409, ctc_loss=0.1465, cr_loss=0.3942, over 34487.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2835, pruned_loss=0.07372, ctc_loss=0.1497, cr_loss=0.4231, over 6725118.50 frames. 
], batch size: 90, lr: 7.49e-03, grad_scale: 16.0 2024-09-17 22:34:39,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=284694.6666666667, ans=0.1 2024-09-17 22:34:42,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=284741.3333333333, ans=0.125 2024-09-17 22:34:52,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=284741.3333333333, ans=0.125 2024-09-17 22:34:59,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=284788.0, ans=0.125 2024-09-17 22:35:03,658 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.505e+02 2.981e+02 4.095e+02 8.233e+02, threshold=5.963e+02, percent-clipped=7.0 2024-09-17 22:35:17,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=284834.6666666667, ans=0.025 2024-09-17 22:35:30,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=284834.6666666667, ans=0.125 2024-09-17 22:35:30,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=284834.6666666667, ans=0.125 2024-09-17 22:35:33,228 INFO [train.py:1198] (1/2) Epoch 16, batch 2900, loss[loss=0.2313, simple_loss=0.2785, pruned_loss=0.06979, ctc_loss=0.143, cr_loss=0.4012, over 34533.00 frames. ], tot_loss[loss=0.2394, simple_loss=0.2843, pruned_loss=0.07383, ctc_loss=0.1499, cr_loss=0.4246, over 6755477.47 frames. ], batch size: 94, lr: 7.49e-03, grad_scale: 16.0 2024-09-17 22:35:35,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=284881.3333333333, ans=0.125 2024-09-17 22:35:46,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=284881.3333333333, ans=0.0 2024-09-17 22:35:54,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=284928.0, ans=0.125 2024-09-17 22:36:22,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285021.3333333333, ans=0.1 2024-09-17 22:36:32,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=285021.3333333333, ans=0.125 2024-09-17 22:36:34,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=285021.3333333333, ans=0.125 2024-09-17 22:36:55,763 INFO [train.py:1198] (1/2) Epoch 16, batch 2950, loss[loss=0.229, simple_loss=0.2701, pruned_loss=0.07084, ctc_loss=0.1462, cr_loss=0.4246, over 34646.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2829, pruned_loss=0.07335, ctc_loss=0.149, cr_loss=0.4226, over 6748474.95 frames. 
], batch size: 88, lr: 7.49e-03, grad_scale: 16.0 2024-09-17 22:36:56,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=285114.6666666667, ans=0.125 2024-09-17 22:37:16,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=285161.3333333333, ans=0.1 2024-09-17 22:37:24,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=285161.3333333333, ans=0.025 2024-09-17 22:37:30,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=22.5 2024-09-17 22:37:50,407 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.503e+02 2.963e+02 3.844e+02 1.308e+03, threshold=5.927e+02, percent-clipped=7.0 2024-09-17 22:37:58,072 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 22:38:22,718 INFO [train.py:1198] (1/2) Epoch 16, batch 3000, loss[loss=0.2476, simple_loss=0.2875, pruned_loss=0.07928, ctc_loss=0.1575, cr_loss=0.4418, over 34526.00 frames. ], tot_loss[loss=0.238, simple_loss=0.2828, pruned_loss=0.07324, ctc_loss=0.1488, cr_loss=0.4228, over 6750090.84 frames. ], batch size: 94, lr: 7.48e-03, grad_scale: 16.0 2024-09-17 22:38:22,718 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 22:38:39,728 INFO [train.py:1230] (1/2) Epoch 16, validation: loss=0.15, simple_loss=0.2477, pruned_loss=0.02191, ctc_loss=0.0425, cr_loss=1.679e-14, over 944034.00 frames. 2024-09-17 22:38:39,728 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 22:38:52,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=285348.0, ans=0.125 2024-09-17 22:39:00,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.07 vs. limit=22.5 2024-09-17 22:39:29,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=285488.0, ans=0.125 2024-09-17 22:39:53,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=285534.6666666667, ans=0.0 2024-09-17 22:40:00,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.53 vs. limit=10.0 2024-09-17 22:40:01,280 INFO [train.py:1198] (1/2) Epoch 16, batch 3050, loss[loss=0.2234, simple_loss=0.2661, pruned_loss=0.06874, ctc_loss=0.1365, cr_loss=0.397, over 34576.00 frames. ], tot_loss[loss=0.2392, simple_loss=0.2838, pruned_loss=0.07386, ctc_loss=0.1498, cr_loss=0.4245, over 6742608.36 frames. ], batch size: 89, lr: 7.48e-03, grad_scale: 16.0 2024-09-17 22:40:06,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=285581.3333333333, ans=0.2 2024-09-17 22:40:39,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.65 vs. 
limit=15.0 2024-09-17 22:40:53,620 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.387e+02 2.701e+02 3.354e+02 4.967e+02, threshold=5.402e+02, percent-clipped=0.0 2024-09-17 22:40:57,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=285721.3333333333, ans=0.125 2024-09-17 22:41:00,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=285721.3333333333, ans=0.125 2024-09-17 22:41:14,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=285768.0, ans=0.0 2024-09-17 22:41:19,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.21 vs. limit=10.0 2024-09-17 22:41:22,710 INFO [train.py:1198] (1/2) Epoch 16, batch 3100, loss[loss=0.2513, simple_loss=0.3026, pruned_loss=0.07567, ctc_loss=0.1581, cr_loss=0.4239, over 34217.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2835, pruned_loss=0.07368, ctc_loss=0.1495, cr_loss=0.4242, over 6742489.43 frames. ], batch size: 117, lr: 7.48e-03, grad_scale: 16.0 2024-09-17 22:41:34,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.84 vs. limit=15.0 2024-09-17 22:41:42,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=285861.3333333333, ans=0.0 2024-09-17 22:42:01,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=285908.0, ans=0.5 2024-09-17 22:42:06,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=285908.0, ans=0.125 2024-09-17 22:42:16,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=12.0 2024-09-17 22:42:23,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=285954.6666666667, ans=0.125 2024-09-17 22:42:26,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=285954.6666666667, ans=0.015 2024-09-17 22:42:34,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.75 vs. limit=15.0 2024-09-17 22:42:35,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=286001.3333333333, ans=0.0 2024-09-17 22:42:45,364 INFO [train.py:1198] (1/2) Epoch 16, batch 3150, loss[loss=0.2511, simple_loss=0.2974, pruned_loss=0.07755, ctc_loss=0.1618, cr_loss=0.4316, over 33772.00 frames. ], tot_loss[loss=0.2392, simple_loss=0.284, pruned_loss=0.07377, ctc_loss=0.1499, cr_loss=0.4249, over 6747105.83 frames. 
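The validation pass at batch 3000 (a few records back) is worth noting: validation loss=0.15 with cr_loss=1.679e-14, i.e. numerically zero, while cr_loss sits around 0.42 throughout training. That is consistent with cr_loss being a consistency term between two forward passes over differently time-masked copies of the input: with augmentation off at eval time the two passes coincide and only floating-point noise remains. A sketch of such a term (the actual recipe may use a different divergence or stop-gradient placement):

    import torch
    import torch.nn.functional as F

    def consistency_loss(logp_a: torch.Tensor, logp_b: torch.Tensor) -> torch.Tensor:
        # Symmetric KL between frame-level log-posteriors of the two passes.
        # If both passes see identical (unmasked) inputs, logp_a == logp_b and
        # the result collapses to ~0, matching the ~1e-14 validation values.
        kl_ab = F.kl_div(logp_a, logp_b, log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(logp_b, logp_a, log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)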
], batch size: 122, lr: 7.48e-03, grad_scale: 16.0 2024-09-17 22:42:58,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=286048.0, ans=0.125 2024-09-17 22:43:00,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=286094.6666666667, ans=0.125 2024-09-17 22:43:02,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=286094.6666666667, ans=0.125 2024-09-17 22:43:31,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=286141.3333333333, ans=0.2 2024-09-17 22:43:38,966 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.525e+02 3.256e+02 3.976e+02 8.389e+02, threshold=6.512e+02, percent-clipped=5.0 2024-09-17 22:43:46,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0 2024-09-17 22:43:55,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=286234.6666666667, ans=0.125 2024-09-17 22:44:08,074 INFO [train.py:1198] (1/2) Epoch 16, batch 3200, loss[loss=0.2266, simple_loss=0.2776, pruned_loss=0.06639, ctc_loss=0.1358, cr_loss=0.3923, over 34530.00 frames. ], tot_loss[loss=0.2385, simple_loss=0.2831, pruned_loss=0.07349, ctc_loss=0.1492, cr_loss=0.4241, over 6761577.10 frames. ], batch size: 94, lr: 7.47e-03, grad_scale: 32.0 2024-09-17 22:44:51,234 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.19 vs. limit=10.0 2024-09-17 22:44:58,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=286421.3333333333, ans=0.125 2024-09-17 22:45:03,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=286421.3333333333, ans=0.125 2024-09-17 22:45:06,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=286421.3333333333, ans=0.2 2024-09-17 22:45:19,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=286468.0, ans=0.125 2024-09-17 22:45:24,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=286468.0, ans=0.125 2024-09-17 22:45:29,160 INFO [train.py:1198] (1/2) Epoch 16, batch 3250, loss[loss=0.2239, simple_loss=0.2759, pruned_loss=0.06458, ctc_loss=0.1352, cr_loss=0.3944, over 34648.00 frames. ], tot_loss[loss=0.239, simple_loss=0.2837, pruned_loss=0.07368, ctc_loss=0.1494, cr_loss=0.4242, over 6770950.09 frames. ], batch size: 98, lr: 7.47e-03, grad_scale: 32.0 2024-09-17 22:45:30,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.00 vs. limit=15.0 2024-09-17 22:45:46,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.50 vs. 
limit=15.0 2024-09-17 22:45:59,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2024-09-17 22:46:11,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=286608.0, ans=0.1 2024-09-17 22:46:14,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=286608.0, ans=0.0 2024-09-17 22:46:20,959 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.694e+02 3.494e+02 4.344e+02 7.048e+02, threshold=6.988e+02, percent-clipped=2.0 2024-09-17 22:46:49,962 INFO [train.py:1198] (1/2) Epoch 16, batch 3300, loss[loss=0.252, simple_loss=0.2961, pruned_loss=0.07931, ctc_loss=0.1607, cr_loss=0.4284, over 33000.00 frames. ], tot_loss[loss=0.2374, simple_loss=0.2822, pruned_loss=0.07301, ctc_loss=0.1483, cr_loss=0.4216, over 6768623.90 frames. ], batch size: 130, lr: 7.47e-03, grad_scale: 32.0 2024-09-17 22:46:55,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=286748.0, ans=0.0 2024-09-17 22:47:08,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.47 vs. limit=15.0 2024-09-17 22:47:35,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=286841.3333333333, ans=0.125 2024-09-17 22:48:11,715 INFO [train.py:1198] (1/2) Epoch 16, batch 3350, loss[loss=0.2564, simple_loss=0.2981, pruned_loss=0.08181, ctc_loss=0.1639, cr_loss=0.4563, over 33757.00 frames. ], tot_loss[loss=0.2386, simple_loss=0.2831, pruned_loss=0.07362, ctc_loss=0.1493, cr_loss=0.4232, over 6743875.17 frames. 
], batch size: 122, lr: 7.46e-03, grad_scale: 32.0 2024-09-17 22:48:25,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=286981.3333333333, ans=0.0 2024-09-17 22:48:57,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287074.6666666667, ans=0.1 2024-09-17 22:49:02,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=287121.3333333333, ans=0.125 2024-09-17 22:49:04,869 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.431e+02 2.762e+02 3.206e+02 5.943e+02, threshold=5.523e+02, percent-clipped=0.0 2024-09-17 22:49:07,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=287121.3333333333, ans=0.125 2024-09-17 22:49:10,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=287121.3333333333, ans=0.0 2024-09-17 22:49:10,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=287121.3333333333, ans=0.0 2024-09-17 22:49:18,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=287168.0, ans=0.2 2024-09-17 22:49:26,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=287168.0, ans=0.1 2024-09-17 22:49:34,024 INFO [train.py:1198] (1/2) Epoch 16, batch 3400, loss[loss=0.2084, simple_loss=0.2542, pruned_loss=0.06121, ctc_loss=0.1269, cr_loss=0.3681, over 34167.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2833, pruned_loss=0.0737, ctc_loss=0.1495, cr_loss=0.4229, over 6733525.99 frames. ], batch size: 78, lr: 7.46e-03, grad_scale: 32.0 2024-09-17 22:49:34,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=287214.6666666667, ans=0.0 2024-09-17 22:49:42,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=287214.6666666667, ans=0.125 2024-09-17 22:50:15,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=287308.0, ans=0.125 2024-09-17 22:50:36,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.05 vs. limit=22.5 2024-09-17 22:50:43,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=287401.3333333333, ans=0.0 2024-09-17 22:50:54,728 INFO [train.py:1198] (1/2) Epoch 16, batch 3450, loss[loss=0.2467, simple_loss=0.2929, pruned_loss=0.0763, ctc_loss=0.1566, cr_loss=0.4143, over 33059.00 frames. ], tot_loss[loss=0.2389, simple_loss=0.2836, pruned_loss=0.07363, ctc_loss=0.1495, cr_loss=0.4224, over 6744978.04 frames. 
], batch size: 130, lr: 7.46e-03, grad_scale: 32.0 2024-09-17 22:51:38,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287541.3333333333, ans=0.1 2024-09-17 22:51:46,765 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.027e+02 2.499e+02 2.884e+02 3.667e+02 8.037e+02, threshold=5.768e+02, percent-clipped=3.0 2024-09-17 22:52:05,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=287634.6666666667, ans=0.1 2024-09-17 22:52:13,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=287634.6666666667, ans=0.125 2024-09-17 22:52:16,734 INFO [train.py:1198] (1/2) Epoch 16, batch 3500, loss[loss=0.2113, simple_loss=0.2596, pruned_loss=0.06154, ctc_loss=0.1252, cr_loss=0.3736, over 34497.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2827, pruned_loss=0.07314, ctc_loss=0.1486, cr_loss=0.4207, over 6747954.80 frames. ], batch size: 85, lr: 7.45e-03, grad_scale: 32.0 2024-09-17 22:53:25,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.77 vs. limit=15.0 2024-09-17 22:53:26,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.01 vs. limit=6.0 2024-09-17 22:53:38,222 INFO [train.py:1198] (1/2) Epoch 16, batch 3550, loss[loss=0.2519, simple_loss=0.2986, pruned_loss=0.07849, ctc_loss=0.1554, cr_loss=0.426, over 34402.00 frames. ], tot_loss[loss=0.2379, simple_loss=0.2829, pruned_loss=0.07319, ctc_loss=0.1487, cr_loss=0.4213, over 6756608.54 frames. ], batch size: 103, lr: 7.45e-03, grad_scale: 32.0 2024-09-17 22:53:40,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2024-09-17 22:53:41,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=287914.6666666667, ans=0.0 2024-09-17 22:54:23,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=288008.0, ans=0.09899494936611666 2024-09-17 22:54:29,613 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.601e+02 3.156e+02 3.924e+02 6.556e+02, threshold=6.311e+02, percent-clipped=2.0 2024-09-17 22:54:33,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.59 vs. limit=22.5 2024-09-17 22:54:45,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=288101.3333333333, ans=0.2 2024-09-17 22:54:58,372 INFO [train.py:1198] (1/2) Epoch 16, batch 3600, loss[loss=0.2218, simple_loss=0.2691, pruned_loss=0.06577, ctc_loss=0.1366, cr_loss=0.3934, over 34483.00 frames. ], tot_loss[loss=0.2383, simple_loss=0.2832, pruned_loss=0.07335, ctc_loss=0.1489, cr_loss=0.4218, over 6765544.82 frames. 
], batch size: 90, lr: 7.45e-03, grad_scale: 32.0 2024-09-17 22:55:29,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=288241.3333333333, ans=0.125 2024-09-17 22:55:40,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=288241.3333333333, ans=0.0 2024-09-17 22:55:41,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.45 vs. limit=15.0 2024-09-17 22:56:18,762 INFO [train.py:1198] (1/2) Epoch 16, batch 3650, loss[loss=0.2435, simple_loss=0.2931, pruned_loss=0.07355, ctc_loss=0.1511, cr_loss=0.4154, over 34464.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2827, pruned_loss=0.07315, ctc_loss=0.1485, cr_loss=0.4216, over 6768703.97 frames. ], batch size: 110, lr: 7.45e-03, grad_scale: 16.0 2024-09-17 22:56:21,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-17 22:56:37,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=288428.0, ans=0.0 2024-09-17 22:56:40,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=288428.0, ans=0.125 2024-09-17 22:56:57,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=288474.6666666667, ans=0.125 2024-09-17 22:57:12,672 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.509e+02 3.042e+02 3.870e+02 7.991e+02, threshold=6.085e+02, percent-clipped=1.0 2024-09-17 22:57:39,849 INFO [train.py:1198] (1/2) Epoch 16, batch 3700, loss[loss=0.2403, simple_loss=0.2882, pruned_loss=0.07294, ctc_loss=0.1488, cr_loss=0.4208, over 34612.00 frames. ], tot_loss[loss=0.2373, simple_loss=0.2826, pruned_loss=0.07282, ctc_loss=0.148, cr_loss=0.421, over 6783493.15 frames. ], batch size: 102, lr: 7.44e-03, grad_scale: 8.0 2024-09-17 22:58:02,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=288661.3333333333, ans=0.125 2024-09-17 22:58:22,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=288708.0, ans=0.125 2024-09-17 22:58:37,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=288754.6666666667, ans=0.125 2024-09-17 22:58:42,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.44 vs. limit=6.0 2024-09-17 22:59:01,577 INFO [train.py:1198] (1/2) Epoch 16, batch 3750, loss[loss=0.2553, simple_loss=0.3017, pruned_loss=0.07942, ctc_loss=0.1579, cr_loss=0.4596, over 34332.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.286, pruned_loss=0.07413, ctc_loss=0.1504, cr_loss=0.4265, over 6785143.80 frames. 
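The grad_scale field in the batch summaries is AMP's dynamic loss scale; it moves between 32.0, 16.0 and 8.0 across this section, halving when a step produces inf/nan gradients and growing back after a run of clean steps. This is the standard torch.cuda.amp machinery; a minimal fp16 training step for reference (model, optimizer and loss_fn are placeholders):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)  # growth/backoff left at torch defaults

    def train_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model(batch))
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped, and scale halved, on inf/nan gradients
        scaler.update()
        return loss.detach(), scaler.get_scale()  # get_scale() is the logged grad_scale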
], batch size: 113, lr: 7.44e-03, grad_scale: 8.0 2024-09-17 22:59:07,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=288848.0, ans=15.0 2024-09-17 22:59:18,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=288894.6666666667, ans=0.04949747468305833 2024-09-17 22:59:32,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=288941.3333333333, ans=0.2 2024-09-17 22:59:34,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=288941.3333333333, ans=0.0 2024-09-17 22:59:38,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=288941.3333333333, ans=0.0 2024-09-17 22:59:41,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288941.3333333333, ans=0.1 2024-09-17 22:59:48,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=288988.0, ans=0.0 2024-09-17 22:59:57,021 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 2.345e+02 2.569e+02 2.922e+02 4.715e+02, threshold=5.137e+02, percent-clipped=0.0 2024-09-17 23:00:02,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=288988.0, ans=0.025 2024-09-17 23:00:07,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2024-09-17 23:00:18,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=289034.6666666667, ans=0.0 2024-09-17 23:00:18,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=289034.6666666667, ans=0.125 2024-09-17 23:00:20,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=289034.6666666667, ans=0.125 2024-09-17 23:00:23,116 INFO [train.py:1198] (1/2) Epoch 16, batch 3800, loss[loss=0.2612, simple_loss=0.299, pruned_loss=0.08552, ctc_loss=0.1706, cr_loss=0.4571, over 30190.00 frames. ], tot_loss[loss=0.2445, simple_loss=0.2889, pruned_loss=0.07604, ctc_loss=0.154, cr_loss=0.4319, over 6675922.18 frames. ], batch size: 175, lr: 7.44e-03, grad_scale: 8.0 2024-09-17 23:01:02,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=289174.6666666667, ans=0.125 2024-09-17 23:01:08,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.39 vs. limit=10.0 2024-09-17 23:01:18,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=289221.3333333333, ans=0.125 2024-09-17 23:01:47,317 INFO [train.py:1198] (1/2) Epoch 16, batch 3850, loss[loss=0.2829, simple_loss=0.3144, pruned_loss=0.09694, ctc_loss=0.1915, cr_loss=0.4811, over 24105.00 frames. ], tot_loss[loss=0.2498, simple_loss=0.2922, pruned_loss=0.07901, ctc_loss=0.1601, cr_loss=0.4351, over 6245063.91 frames. 
], batch size: 245, lr: 7.43e-03, grad_scale: 8.0 2024-09-17 23:03:24,877 INFO [train.py:1198] (1/2) Epoch 17, batch 0, loss[loss=0.2174, simple_loss=0.2659, pruned_loss=0.06394, ctc_loss=0.1307, cr_loss=0.3724, over 34456.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2659, pruned_loss=0.06394, ctc_loss=0.1307, cr_loss=0.3724, over 34456.00 frames. ], batch size: 85, lr: 7.21e-03, grad_scale: 16.0 2024-09-17 23:03:24,878 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-17 23:03:41,809 INFO [train.py:1230] (1/2) Epoch 17, validation: loss=0.1515, simple_loss=0.2499, pruned_loss=0.02219, ctc_loss=0.04343, cr_loss=1.78e-14, over 944034.00 frames. 2024-09-17 23:03:41,809 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-17 23:03:54,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.609e+02 2.767e+02 2.984e+02 4.966e+02, threshold=5.533e+02, percent-clipped=0.0 2024-09-17 23:04:01,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=289482.6666666667, ans=0.1 2024-09-17 23:04:03,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=289482.6666666667, ans=0.125 2024-09-17 23:04:46,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.32 vs. limit=15.0 2024-09-17 23:04:56,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=289622.6666666667, ans=0.2 2024-09-17 23:05:04,390 INFO [train.py:1198] (1/2) Epoch 17, batch 50, loss[loss=0.2245, simple_loss=0.2642, pruned_loss=0.07022, ctc_loss=0.1414, cr_loss=0.4002, over 34516.00 frames. ], tot_loss[loss=0.2388, simple_loss=0.2839, pruned_loss=0.07342, ctc_loss=0.1492, cr_loss=0.4226, over 1481806.52 frames. ], batch size: 82, lr: 7.20e-03, grad_scale: 16.0 2024-09-17 23:05:11,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=289669.3333333333, ans=0.125 2024-09-17 23:05:17,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=289669.3333333333, ans=0.125 2024-09-17 23:05:19,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=289716.0, ans=0.09899494936611666 2024-09-17 23:05:40,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=289762.6666666667, ans=0.0 2024-09-17 23:06:09,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=289809.3333333333, ans=0.125 2024-09-17 23:06:29,344 INFO [train.py:1198] (1/2) Epoch 17, batch 100, loss[loss=0.225, simple_loss=0.2692, pruned_loss=0.06809, ctc_loss=0.1404, cr_loss=0.4143, over 34573.00 frames. ], tot_loss[loss=0.2407, simple_loss=0.2857, pruned_loss=0.07422, ctc_loss=0.1507, cr_loss=0.427, over 2630477.70 frames. 
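The first record of epoch 17 ("batch 0") reports tot_loss identical to that batch's own loss over 34456 frames, and by batch 50 the frame total has grown to ~1.48e+06: tot_loss is evidently a frame-weighted running average that is reset at each epoch boundary and then re-accumulates (the steady ~6.7e+06-frame totals deep into epoch 16 suggest older batches are decayed rather than kept forever). Note also the learning rate stepping from 7.43e-03 at the end of epoch 16 to 7.21e-03 here, consistent with a schedule that decays in the epoch index as well as the batch index. A sketch of the running statistic; the decay constant is an assumption:

    class RunningLoss:
        # Frame-weighted moving average in the style of the logged tot_loss.

        def __init__(self, decay: float = 0.999):  # assumed smoothing constant
            self.decay = decay
            self.loss_sum = 0.0
            self.frame_sum = 0.0

        def update(self, batch_loss: float, num_frames: float) -> float:
            # Old batches fade geometrically; numerator and denominator shrink
            # together, so the ratio stays a per-frame average over a recent window.
            self.loss_sum = self.decay * self.loss_sum + batch_loss * num_frames
            self.frame_sum = self.decay * self.frame_sum + num_frames
            return self.loss_sum / self.frame_sum

        def reset(self) -> None:  # at epoch boundaries, per the batch-0 records
            self.loss_sum = 0.0
            self.frame_sum = 0.0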
], batch size: 89, lr: 7.20e-03, grad_scale: 16.0 2024-09-17 23:06:44,167 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.568e+02 3.231e+02 4.127e+02 6.999e+02, threshold=6.461e+02, percent-clipped=5.0 2024-09-17 23:07:22,161 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:07:45,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.80 vs. limit=22.5 2024-09-17 23:07:52,733 INFO [train.py:1198] (1/2) Epoch 17, batch 150, loss[loss=0.2101, simple_loss=0.2548, pruned_loss=0.06239, ctc_loss=0.1281, cr_loss=0.3758, over 34489.00 frames. ], tot_loss[loss=0.2378, simple_loss=0.2833, pruned_loss=0.0729, ctc_loss=0.1483, cr_loss=0.4224, over 3557145.42 frames. ], batch size: 82, lr: 7.20e-03, grad_scale: 16.0 2024-09-17 23:07:53,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=290136.0, ans=0.125 2024-09-17 23:07:53,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=290136.0, ans=0.1 2024-09-17 23:08:11,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0 2024-09-17 23:08:20,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.94 vs. limit=22.5 2024-09-17 23:08:58,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.31 vs. limit=12.0 2024-09-17 23:09:17,865 INFO [train.py:1198] (1/2) Epoch 17, batch 200, loss[loss=0.2365, simple_loss=0.2833, pruned_loss=0.07196, ctc_loss=0.1471, cr_loss=0.4117, over 32043.00 frames. ], tot_loss[loss=0.2368, simple_loss=0.2822, pruned_loss=0.07257, ctc_loss=0.1476, cr_loss=0.4213, over 4271825.16 frames. ], batch size: 145, lr: 7.20e-03, grad_scale: 16.0 2024-09-17 23:09:31,047 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.495e+02 3.025e+02 3.761e+02 6.164e+02, threshold=6.049e+02, percent-clipped=0.0 2024-09-17 23:09:37,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=290416.0, ans=0.1 2024-09-17 23:09:47,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=290416.0, ans=0.0 2024-09-17 23:10:07,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=290509.3333333333, ans=0.125 2024-09-17 23:10:22,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=290556.0, ans=0.125 2024-09-17 23:10:40,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=290602.6666666667, ans=0.125 2024-09-17 23:10:42,144 INFO [train.py:1198] (1/2) Epoch 17, batch 250, loss[loss=0.2526, simple_loss=0.2956, pruned_loss=0.07922, ctc_loss=0.1638, cr_loss=0.4567, over 34321.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2818, pruned_loss=0.0722, ctc_loss=0.1468, cr_loss=0.4205, over 4833957.20 frames. 
], batch size: 117, lr: 7.19e-03, grad_scale: 16.0 2024-09-17 23:10:47,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=290602.6666666667, ans=0.0 2024-09-17 23:10:53,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=290602.6666666667, ans=0.0 2024-09-17 23:11:02,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=290649.3333333333, ans=0.125 2024-09-17 23:11:02,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=290649.3333333333, ans=0.2 2024-09-17 23:11:07,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=290649.3333333333, ans=0.025 2024-09-17 23:11:20,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=290696.0, ans=0.0 2024-09-17 23:11:35,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=290742.6666666667, ans=0.125 2024-09-17 23:11:51,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=290789.3333333333, ans=0.0 2024-09-17 23:12:04,479 INFO [train.py:1198] (1/2) Epoch 17, batch 300, loss[loss=0.2718, simple_loss=0.309, pruned_loss=0.08998, ctc_loss=0.1748, cr_loss=0.49, over 34329.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2816, pruned_loss=0.07233, ctc_loss=0.147, cr_loss=0.4213, over 5263075.23 frames. ], batch size: 107, lr: 7.19e-03, grad_scale: 16.0 2024-09-17 23:12:17,780 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.448e+02 2.804e+02 3.644e+02 5.732e+02, threshold=5.608e+02, percent-clipped=0.0 2024-09-17 23:12:34,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.54 vs. limit=15.0 2024-09-17 23:12:37,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=290929.3333333333, ans=0.0 2024-09-17 23:12:39,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290929.3333333333, ans=0.1 2024-09-17 23:12:47,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=290929.3333333333, ans=0.125 2024-09-17 23:12:53,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=290976.0, ans=0.0 2024-09-17 23:13:12,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. limit=6.0 2024-09-17 23:13:13,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=291022.6666666667, ans=0.1 2024-09-17 23:13:23,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=291022.6666666667, ans=0.125 2024-09-17 23:13:28,567 INFO [train.py:1198] (1/2) Epoch 17, batch 350, loss[loss=0.2118, simple_loss=0.2556, pruned_loss=0.06355, ctc_loss=0.1294, cr_loss=0.3737, over 34277.00 frames. 
], tot_loss[loss=0.2368, simple_loss=0.2824, pruned_loss=0.07243, ctc_loss=0.1474, cr_loss=0.4225, over 5596849.64 frames. ], batch size: 83, lr: 7.19e-03, grad_scale: 16.0 2024-09-17 23:13:28,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=291069.3333333333, ans=0.125 2024-09-17 23:13:33,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=291069.3333333333, ans=0.0 2024-09-17 23:13:36,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.39 vs. limit=15.0 2024-09-17 23:13:36,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2024-09-17 23:13:45,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=291116.0, ans=0.0 2024-09-17 23:14:25,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.83 vs. limit=15.0 2024-09-17 23:14:26,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=291209.3333333333, ans=0.0 2024-09-17 23:14:38,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=291256.0, ans=0.0 2024-09-17 23:14:40,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=291256.0, ans=0.125 2024-09-17 23:14:53,088 INFO [train.py:1198] (1/2) Epoch 17, batch 400, loss[loss=0.2468, simple_loss=0.2937, pruned_loss=0.0763, ctc_loss=0.15, cr_loss=0.436, over 34446.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2817, pruned_loss=0.07221, ctc_loss=0.147, cr_loss=0.4216, over 5863384.84 frames. ], batch size: 95, lr: 7.18e-03, grad_scale: 32.0 2024-09-17 23:14:55,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=291302.6666666667, ans=0.125 2024-09-17 23:15:06,273 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.028e+02 2.497e+02 3.017e+02 3.759e+02 7.611e+02, threshold=6.033e+02, percent-clipped=7.0 2024-09-17 23:15:29,767 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:16:14,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=291536.0, ans=0.125 2024-09-17 23:16:15,644 INFO [train.py:1198] (1/2) Epoch 17, batch 450, loss[loss=0.2493, simple_loss=0.2927, pruned_loss=0.07831, ctc_loss=0.16, cr_loss=0.4334, over 34688.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2818, pruned_loss=0.07216, ctc_loss=0.1469, cr_loss=0.4214, over 6053169.28 frames. ], batch size: 97, lr: 7.18e-03, grad_scale: 32.0 2024-09-17 23:16:24,300 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:16:24,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.53 vs. 
limit=22.5 2024-09-17 23:16:30,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=291582.6666666667, ans=0.0 2024-09-17 23:16:34,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=291582.6666666667, ans=0.125 2024-09-17 23:16:40,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-09-17 23:16:53,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=291629.3333333333, ans=0.05 2024-09-17 23:16:57,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=291629.3333333333, ans=0.125 2024-09-17 23:17:40,190 INFO [train.py:1198] (1/2) Epoch 17, batch 500, loss[loss=0.26, simple_loss=0.3045, pruned_loss=0.08144, ctc_loss=0.1683, cr_loss=0.4742, over 34450.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2811, pruned_loss=0.07188, ctc_loss=0.1464, cr_loss=0.4207, over 6220861.38 frames. ], batch size: 110, lr: 7.18e-03, grad_scale: 32.0 2024-09-17 23:17:53,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.390e+02 2.908e+02 3.999e+02 6.654e+02, threshold=5.816e+02, percent-clipped=5.0 2024-09-17 23:18:02,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.18 vs. limit=10.0 2024-09-17 23:18:39,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=291909.3333333333, ans=0.125 2024-09-17 23:18:44,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=291909.3333333333, ans=0.125 2024-09-17 23:19:03,886 INFO [train.py:1198] (1/2) Epoch 17, batch 550, loss[loss=0.2448, simple_loss=0.2941, pruned_loss=0.07364, ctc_loss=0.1546, cr_loss=0.4331, over 33921.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2812, pruned_loss=0.07207, ctc_loss=0.1469, cr_loss=0.4211, over 6329587.38 frames. 
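The optim.py warnings above list quartiles of recent gradient norms together with a threshold and the percentage of batches clipped. A toy sketch of how such a threshold could be derived, assuming it is a fixed multiple (the logged Clipping_scale) of the recent median norm; this illustrates the idea only, not the optimizer's actual logic:

from collections import deque
import torch

class QuartileClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent global grad norms

    def clip_(self, params) -> float:
        grads = [p.grad.flatten() for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms.append(norm)
        qs = torch.quantile(torch.tensor(list(self.norms), dtype=torch.float32),
                            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * qs[2].item()  # multiple of the median
        if norm > threshold:  # such batches count toward "percent-clipped"
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        return norm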
], batch size: 122, lr: 7.18e-03, grad_scale: 32.0 2024-09-17 23:19:26,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=292049.3333333333, ans=0.0 2024-09-17 23:19:32,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=292049.3333333333, ans=0.1 2024-09-17 23:19:51,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=292096.0, ans=0.2 2024-09-17 23:19:52,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=292142.6666666667, ans=0.0 2024-09-17 23:20:07,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=292142.6666666667, ans=0.025 2024-09-17 23:20:13,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=292189.3333333333, ans=0.09899494936611666 2024-09-17 23:20:25,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=292236.0, ans=0.0 2024-09-17 23:20:26,805 INFO [train.py:1198] (1/2) Epoch 17, batch 600, loss[loss=0.2615, simple_loss=0.3062, pruned_loss=0.08236, ctc_loss=0.1673, cr_loss=0.4679, over 34220.00 frames. ], tot_loss[loss=0.236, simple_loss=0.2816, pruned_loss=0.07208, ctc_loss=0.1468, cr_loss=0.4208, over 6431933.91 frames. ], batch size: 117, lr: 7.17e-03, grad_scale: 32.0 2024-09-17 23:20:30,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=292236.0, ans=0.125 2024-09-17 23:20:38,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=292236.0, ans=0.125 2024-09-17 23:20:39,953 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.050e+02 2.474e+02 2.911e+02 3.976e+02 5.843e+02, threshold=5.823e+02, percent-clipped=1.0 2024-09-17 23:20:41,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=292282.6666666667, ans=0.015 2024-09-17 23:20:43,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=292282.6666666667, ans=0.025 2024-09-17 23:20:50,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=292282.6666666667, ans=0.125 2024-09-17 23:20:57,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=292282.6666666667, ans=0.0 2024-09-17 23:21:18,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=292376.0, ans=0.1 2024-09-17 23:21:40,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=292422.6666666667, ans=0.09899494936611666 2024-09-17 23:21:53,140 INFO [train.py:1198] (1/2) Epoch 17, batch 650, loss[loss=0.2226, simple_loss=0.2752, pruned_loss=0.06393, ctc_loss=0.1325, cr_loss=0.3883, over 34547.00 frames. ], tot_loss[loss=0.2352, simple_loss=0.281, pruned_loss=0.07165, ctc_loss=0.146, cr_loss=0.4191, over 6522903.11 frames. 
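Between batch 350 and batch 400 the logged grad_scale doubled from 16.0 to 32.0, the signature of a dynamic loss scaler under fp16 mixed precision: the scale grows once a long enough run of steps completes without inf/nan gradients. A generic torch.cuda.amp sketch (model, optimizer and loader are placeholders):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=1.0, growth_factor=2.0,
                                   growth_interval=2000)

for batch in loader:                          # placeholder dataloader
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)                   # placeholder scalar loss
    scaler.scale(loss).backward()             # scale up so fp16 grads stay finite
    scaler.step(optimizer)                    # unscales; skips the step on inf/nan
    scaler.update()                           # doubles the scale after a clean run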
], batch size: 94, lr: 7.17e-03, grad_scale: 32.0 2024-09-17 23:21:53,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=292469.3333333333, ans=0.125 2024-09-17 23:22:01,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=292469.3333333333, ans=0.125 2024-09-17 23:22:02,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=292469.3333333333, ans=15.0 2024-09-17 23:22:03,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=292469.3333333333, ans=0.125 2024-09-17 23:22:12,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.95 vs. limit=10.0 2024-09-17 23:22:25,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=292562.6666666667, ans=0.0 2024-09-17 23:22:25,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=292562.6666666667, ans=0.1 2024-09-17 23:22:31,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=292562.6666666667, ans=0.125 2024-09-17 23:22:49,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=292609.3333333333, ans=0.125 2024-09-17 23:22:53,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.43 vs. limit=15.0 2024-09-17 23:23:15,953 INFO [train.py:1198] (1/2) Epoch 17, batch 700, loss[loss=0.2383, simple_loss=0.2791, pruned_loss=0.0752, ctc_loss=0.15, cr_loss=0.4301, over 34621.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2813, pruned_loss=0.07173, ctc_loss=0.146, cr_loss=0.4195, over 6578843.46 frames. ], batch size: 89, lr: 7.17e-03, grad_scale: 32.0 2024-09-17 23:23:18,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=292702.6666666667, ans=0.125 2024-09-17 23:23:29,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.442e+02 2.972e+02 4.101e+02 8.072e+02, threshold=5.944e+02, percent-clipped=6.0 2024-09-17 23:23:37,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292749.3333333333, ans=0.1 2024-09-17 23:23:47,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=292796.0, ans=0.2 2024-09-17 23:24:17,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=292842.6666666667, ans=0.125 2024-09-17 23:24:35,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=292889.3333333333, ans=0.125 2024-09-17 23:24:38,613 INFO [train.py:1198] (1/2) Epoch 17, batch 750, loss[loss=0.2289, simple_loss=0.2842, pruned_loss=0.06536, ctc_loss=0.1352, cr_loss=0.3946, over 34381.00 frames. ], tot_loss[loss=0.235, simple_loss=0.2808, pruned_loss=0.07158, ctc_loss=0.1458, cr_loss=0.4191, over 6620046.45 frames. 
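The Whitening entries compare a metric against a limit (here e.g. metric=10.30 vs. limit=15.0), i.e. a statistic of the activations that training pushes to stay below a bound. One plausible formulation, assumed for illustration: measure how far the channel covariance is from a multiple of the identity, so perfectly "white" features score exactly 1.0:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (frames, channels). Returns 1.0 for white features and grows as
    the covariance eigenvalues spread out. An assumed formulation, not the
    exact scaling.py computation."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

x = torch.randn(1000, 256) @ torch.randn(256, 256)  # correlated channels
print(whitening_metric(x))  # well above 1.0; a limit such as 15.0 bounds it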
], batch size: 95, lr: 7.17e-03, grad_scale: 32.0 2024-09-17 23:24:52,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0 2024-09-17 23:25:02,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.77 vs. limit=15.0 2024-09-17 23:25:07,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-09-17 23:25:10,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=292982.6666666667, ans=0.125 2024-09-17 23:25:23,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=293029.3333333333, ans=0.125 2024-09-17 23:25:25,561 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.533e-02 2024-09-17 23:25:37,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=293076.0, ans=0.025 2024-09-17 23:26:05,466 INFO [train.py:1198] (1/2) Epoch 17, batch 800, loss[loss=0.2009, simple_loss=0.2506, pruned_loss=0.05674, ctc_loss=0.1166, cr_loss=0.3598, over 34487.00 frames. ], tot_loss[loss=0.2348, simple_loss=0.2808, pruned_loss=0.07144, ctc_loss=0.1455, cr_loss=0.419, over 6656592.29 frames. ], batch size: 85, lr: 7.16e-03, grad_scale: 32.0 2024-09-17 23:26:18,579 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.028e+02 2.404e+02 2.684e+02 3.423e+02 6.968e+02, threshold=5.368e+02, percent-clipped=1.0 2024-09-17 23:26:32,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=293216.0, ans=0.05 2024-09-17 23:26:56,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=293309.3333333333, ans=0.0 2024-09-17 23:27:18,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=15.0 2024-09-17 23:27:27,626 INFO [train.py:1198] (1/2) Epoch 17, batch 850, loss[loss=0.2394, simple_loss=0.2878, pruned_loss=0.07202, ctc_loss=0.1486, cr_loss=0.4298, over 34385.00 frames. ], tot_loss[loss=0.2344, simple_loss=0.2805, pruned_loss=0.07128, ctc_loss=0.1452, cr_loss=0.4182, over 6690425.90 frames. ], batch size: 103, lr: 7.16e-03, grad_scale: 32.0 2024-09-17 23:27:42,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=293449.3333333333, ans=0.09899494936611666 2024-09-17 23:27:54,277 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:28:34,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.28 vs. 
limit=22.5 2024-09-17 23:28:46,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=293589.3333333333, ans=0.0 2024-09-17 23:28:52,488 INFO [train.py:1198] (1/2) Epoch 17, batch 900, loss[loss=0.2028, simple_loss=0.2514, pruned_loss=0.05742, ctc_loss=0.1207, cr_loss=0.3812, over 34503.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.281, pruned_loss=0.07163, ctc_loss=0.1459, cr_loss=0.4187, over 6696521.95 frames. ], batch size: 85, lr: 7.16e-03, grad_scale: 32.0 2024-09-17 23:29:05,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.380e+02 2.867e+02 3.594e+02 5.933e+02, threshold=5.733e+02, percent-clipped=2.0 2024-09-17 23:29:14,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2024-09-17 23:29:27,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=293729.3333333333, ans=0.125 2024-09-17 23:29:27,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.69 vs. limit=12.0 2024-09-17 23:29:48,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.78 vs. limit=10.0 2024-09-17 23:30:16,120 INFO [train.py:1198] (1/2) Epoch 17, batch 950, loss[loss=0.2091, simple_loss=0.2595, pruned_loss=0.05933, ctc_loss=0.1245, cr_loss=0.3799, over 34707.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.281, pruned_loss=0.07162, ctc_loss=0.1459, cr_loss=0.4188, over 6702781.25 frames. ], batch size: 87, lr: 7.15e-03, grad_scale: 32.0 2024-09-17 23:30:29,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=293869.3333333333, ans=0.125 2024-09-17 23:30:30,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2024-09-17 23:30:44,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2024-09-17 23:30:50,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=293962.6666666667, ans=0.125 2024-09-17 23:30:55,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=293962.6666666667, ans=0.125 2024-09-17 23:30:57,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=293962.6666666667, ans=0.0 2024-09-17 23:31:21,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0 2024-09-17 23:31:34,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=294056.0, ans=0.025 2024-09-17 23:31:38,502 INFO [train.py:1198] (1/2) Epoch 17, batch 1000, loss[loss=0.2316, simple_loss=0.2792, pruned_loss=0.0698, ctc_loss=0.1386, cr_loss=0.4154, over 34481.00 frames. ], tot_loss[loss=0.236, simple_loss=0.2819, pruned_loss=0.07206, ctc_loss=0.1465, cr_loss=0.4199, over 6695982.77 frames. 
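The lr field shrinks steadily through the epoch (7.20e-03 at batch 150 down to 7.16e-03 here), consistent with a schedule that decays in both batch count and epoch. A sketch in the shape of icefall's Eden schedule; treat the exact exponents and constants below as assumptions:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Smooth power-law decay in both the batch and the epoch dimension.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(eden_lr(0.045, batch=50_000, epoch=17.0))  # same order as the lr logged here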
], batch size: 90, lr: 7.15e-03, grad_scale: 32.0 2024-09-17 23:31:47,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=294102.6666666667, ans=0.2 2024-09-17 23:31:51,633 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 2.388e+02 2.935e+02 3.906e+02 6.515e+02, threshold=5.869e+02, percent-clipped=2.0 2024-09-17 23:32:00,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff3.min_abs, batch_count=294149.3333333333, ans=0.2 2024-09-17 23:32:13,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=294196.0, ans=0.125 2024-09-17 23:32:13,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=294196.0, ans=0.125 2024-09-17 23:32:32,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=294242.6666666667, ans=0.125 2024-09-17 23:32:42,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=294242.6666666667, ans=0.2 2024-09-17 23:33:05,141 INFO [train.py:1198] (1/2) Epoch 17, batch 1050, loss[loss=0.2374, simple_loss=0.2864, pruned_loss=0.07101, ctc_loss=0.1478, cr_loss=0.4208, over 34567.00 frames. ], tot_loss[loss=0.2357, simple_loss=0.2812, pruned_loss=0.07201, ctc_loss=0.1465, cr_loss=0.4202, over 6705358.20 frames. ], batch size: 99, lr: 7.15e-03, grad_scale: 32.0 2024-09-17 23:33:21,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=294382.6666666667, ans=0.125 2024-09-17 23:33:26,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=294382.6666666667, ans=0.2 2024-09-17 23:33:30,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=294382.6666666667, ans=0.1 2024-09-17 23:34:20,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=294522.6666666667, ans=0.125 2024-09-17 23:34:27,892 INFO [train.py:1198] (1/2) Epoch 17, batch 1100, loss[loss=0.213, simple_loss=0.2609, pruned_loss=0.06212, ctc_loss=0.1299, cr_loss=0.3745, over 34354.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2811, pruned_loss=0.07192, ctc_loss=0.1464, cr_loss=0.4199, over 6717977.28 frames. 
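Each tot_loss above is reported "over N frames" with N accumulating through the epoch (about 6.70M frames by batch 1000, 6.71M by batch 1050), so it is a frame-weighted running average rather than a single-batch number. A small sketch of such an accumulator (the class name is hypothetical):

class RunningFrameAverage:
    """Frame-weighted running average of named loss components."""

    def __init__(self):
        self.frames = 0.0
        self.sums = {}

    def update(self, losses: dict, num_frames: float) -> None:
        self.frames += num_frames
        for name, value in losses.items():
            # keep frame-weighted sums so the reported average is exact
            self.sums[name] = self.sums.get(name, 0.0) + value * num_frames

    def averages(self) -> dict:
        return {name: s / self.frames for name, s in self.sums.items()}

tracker = RunningFrameAverage()
tracker.update({"loss": 0.2316, "ctc_loss": 0.1386}, num_frames=34481)  # batch 1000
tracker.update({"loss": 0.2374, "ctc_loss": 0.1478}, num_frames=34567)  # batch 1050
print(tracker.averages())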
], batch size: 91, lr: 7.15e-03, grad_scale: 32.0 2024-09-17 23:34:40,954 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.370e+02 2.632e+02 3.463e+02 5.965e+02, threshold=5.264e+02, percent-clipped=1.0 2024-09-17 23:34:41,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=294569.3333333333, ans=0.125 2024-09-17 23:34:47,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=294616.0, ans=0.1 2024-09-17 23:34:56,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=294616.0, ans=0.0 2024-09-17 23:35:22,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=294709.3333333333, ans=0.1 2024-09-17 23:35:37,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=294756.0, ans=0.0 2024-09-17 23:35:37,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=294756.0, ans=0.025 2024-09-17 23:35:42,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=294756.0, ans=0.125 2024-09-17 23:35:50,395 INFO [train.py:1198] (1/2) Epoch 17, batch 1150, loss[loss=0.2395, simple_loss=0.2898, pruned_loss=0.07146, ctc_loss=0.1454, cr_loss=0.4296, over 34380.00 frames. ], tot_loss[loss=0.2359, simple_loss=0.2814, pruned_loss=0.07211, ctc_loss=0.1467, cr_loss=0.4203, over 6715739.21 frames. ], batch size: 91, lr: 7.14e-03, grad_scale: 32.0 2024-09-17 23:36:02,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=294802.6666666667, ans=0.2 2024-09-17 23:36:14,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=294849.3333333333, ans=0.025 2024-09-17 23:36:15,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2024-09-17 23:36:18,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294849.3333333333, ans=0.1 2024-09-17 23:36:36,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=294896.0, ans=0.125 2024-09-17 23:36:58,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=294942.6666666667, ans=0.125 2024-09-17 23:37:17,833 INFO [train.py:1198] (1/2) Epoch 17, batch 1200, loss[loss=0.258, simple_loss=0.3038, pruned_loss=0.08114, ctc_loss=0.1627, cr_loss=0.4359, over 34546.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.2828, pruned_loss=0.07257, ctc_loss=0.1478, cr_loss=0.4222, over 6709810.27 frames. 
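The cr_loss component present in every loss entry is a consistency-regularization term: the same batch is encoded under two different time-maskings and the two CTC posterior sequences are pulled toward each other. A sketch using a symmetric KL divergence between the two log-probability tensors; the exact divergence and masking details are assumptions:

import torch
import torch.nn.functional as F

def consistency_loss(log_probs_a: torch.Tensor,
                     log_probs_b: torch.Tensor) -> torch.Tensor:
    """log_probs_*: (batch, time, vocab) CTC log-posteriors from two
    differently masked views of the same utterances."""
    # Each direction treats the other view as a detached "teacher".
    kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                     reduction="batchmean", log_target=True)
    kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                     reduction="batchmean", log_target=True)
    return 0.5 * (kl_ab + kl_ba)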
], batch size: 99, lr: 7.14e-03, grad_scale: 32.0 2024-09-17 23:37:32,630 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.569e+02 2.980e+02 3.731e+02 6.240e+02, threshold=5.960e+02, percent-clipped=5.0 2024-09-17 23:38:07,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=295176.0, ans=0.0 2024-09-17 23:38:40,624 INFO [train.py:1198] (1/2) Epoch 17, batch 1250, loss[loss=0.2644, simple_loss=0.3087, pruned_loss=0.0837, ctc_loss=0.1667, cr_loss=0.4847, over 34353.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.2829, pruned_loss=0.0725, ctc_loss=0.1477, cr_loss=0.423, over 6742857.65 frames. ], batch size: 107, lr: 7.14e-03, grad_scale: 32.0 2024-09-17 23:39:02,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=295316.0, ans=0.125 2024-09-17 23:39:03,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295316.0, ans=0.1 2024-09-17 23:39:07,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=295316.0, ans=0.0 2024-09-17 23:39:33,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=295409.3333333333, ans=0.0 2024-09-17 23:39:43,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=295409.3333333333, ans=0.0 2024-09-17 23:39:48,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=295456.0, ans=0.125 2024-09-17 23:40:04,971 INFO [train.py:1198] (1/2) Epoch 17, batch 1300, loss[loss=0.2437, simple_loss=0.2901, pruned_loss=0.07527, ctc_loss=0.1507, cr_loss=0.4158, over 32941.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2822, pruned_loss=0.07217, ctc_loss=0.1471, cr_loss=0.4221, over 6746568.89 frames. ], batch size: 130, lr: 7.13e-03, grad_scale: 32.0 2024-09-17 23:40:15,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2024-09-17 23:40:19,774 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.432e+02 2.796e+02 3.678e+02 7.197e+02, threshold=5.592e+02, percent-clipped=2.0 2024-09-17 23:40:51,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=295596.0, ans=0.07 2024-09-17 23:41:01,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.06 vs. limit=15.0 2024-09-17 23:41:19,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=295689.3333333333, ans=0.125 2024-09-17 23:41:19,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=295689.3333333333, ans=0.125 2024-09-17 23:41:22,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. 
limit=6.0 2024-09-17 23:41:29,065 INFO [train.py:1198] (1/2) Epoch 17, batch 1350, loss[loss=0.2372, simple_loss=0.2836, pruned_loss=0.07204, ctc_loss=0.1452, cr_loss=0.4418, over 34525.00 frames. ], tot_loss[loss=0.2361, simple_loss=0.2819, pruned_loss=0.07202, ctc_loss=0.1467, cr_loss=0.4218, over 6765343.53 frames. ], batch size: 94, lr: 7.13e-03, grad_scale: 32.0 2024-09-17 23:41:29,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=295736.0, ans=0.0 2024-09-17 23:41:43,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=295782.6666666667, ans=0.125 2024-09-17 23:41:48,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.70 vs. limit=6.0 2024-09-17 23:41:48,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=295782.6666666667, ans=0.125 2024-09-17 23:41:48,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=295782.6666666667, ans=0.1 2024-09-17 23:42:08,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.52 vs. limit=15.0 2024-09-17 23:42:18,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=295876.0, ans=0.0 2024-09-17 23:42:21,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=295876.0, ans=0.0 2024-09-17 23:42:27,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2024-09-17 23:42:51,276 INFO [train.py:1198] (1/2) Epoch 17, batch 1400, loss[loss=0.205, simple_loss=0.2475, pruned_loss=0.06132, ctc_loss=0.1242, cr_loss=0.3757, over 34287.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2815, pruned_loss=0.0718, ctc_loss=0.1463, cr_loss=0.4208, over 6777378.57 frames. ], batch size: 80, lr: 7.13e-03, grad_scale: 32.0 2024-09-17 23:42:55,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0 2024-09-17 23:43:01,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=295969.3333333333, ans=0.0 2024-09-17 23:43:05,942 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.978e+02 2.441e+02 2.766e+02 3.517e+02 5.709e+02, threshold=5.532e+02, percent-clipped=1.0 2024-09-17 23:43:09,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296016.0, ans=0.1 2024-09-17 23:43:11,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=15.0 2024-09-17 23:43:12,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=296016.0, ans=0.125 2024-09-17 23:43:21,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.29 vs. 
limit=6.0 2024-09-17 23:43:28,051 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.55 vs. limit=22.5 2024-09-17 23:43:41,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.20 vs. limit=15.0 2024-09-17 23:44:15,597 INFO [train.py:1198] (1/2) Epoch 17, batch 1450, loss[loss=0.2566, simple_loss=0.3038, pruned_loss=0.07909, ctc_loss=0.165, cr_loss=0.4581, over 34429.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2819, pruned_loss=0.07181, ctc_loss=0.1464, cr_loss=0.421, over 6773139.87 frames. ], batch size: 110, lr: 7.13e-03, grad_scale: 32.0 2024-09-17 23:44:19,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=296202.6666666667, ans=0.125 2024-09-17 23:44:27,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=296202.6666666667, ans=0.025 2024-09-17 23:45:03,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=296296.0, ans=0.0 2024-09-17 23:45:24,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=296389.3333333333, ans=0.035 2024-09-17 23:45:29,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=296389.3333333333, ans=0.125 2024-09-17 23:45:39,388 INFO [train.py:1198] (1/2) Epoch 17, batch 1500, loss[loss=0.2502, simple_loss=0.2983, pruned_loss=0.07672, ctc_loss=0.1552, cr_loss=0.4432, over 34479.00 frames. ], tot_loss[loss=0.2369, simple_loss=0.2827, pruned_loss=0.07231, ctc_loss=0.1473, cr_loss=0.4228, over 6773131.74 frames. ], batch size: 100, lr: 7.12e-03, grad_scale: 32.0 2024-09-17 23:45:54,354 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.551e+02 2.925e+02 3.370e+02 5.254e+02, threshold=5.850e+02, percent-clipped=0.0 2024-09-17 23:45:55,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.73 vs. limit=12.0 2024-09-17 23:46:04,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=296482.6666666667, ans=0.2 2024-09-17 23:46:47,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=296622.6666666667, ans=0.0 2024-09-17 23:46:55,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=15.0 2024-09-17 23:47:02,296 INFO [train.py:1198] (1/2) Epoch 17, batch 1550, loss[loss=0.258, simple_loss=0.299, pruned_loss=0.08312, ctc_loss=0.1636, cr_loss=0.4502, over 34385.00 frames. ], tot_loss[loss=0.237, simple_loss=0.2827, pruned_loss=0.07247, ctc_loss=0.1477, cr_loss=0.423, over 6745083.24 frames. ], batch size: 105, lr: 7.12e-03, grad_scale: 32.0 2024-09-17 23:47:07,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.81 vs. 
limit=6.0 2024-09-17 23:47:09,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296669.3333333333, ans=0.1 2024-09-17 23:47:30,590 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:48:16,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296856.0, ans=0.1 2024-09-17 23:48:19,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=296856.0, ans=0.125 2024-09-17 23:48:19,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=296856.0, ans=0.125 2024-09-17 23:48:29,311 INFO [train.py:1198] (1/2) Epoch 17, batch 1600, loss[loss=0.2361, simple_loss=0.2866, pruned_loss=0.06989, ctc_loss=0.1432, cr_loss=0.4286, over 34561.00 frames. ], tot_loss[loss=0.2369, simple_loss=0.2826, pruned_loss=0.07242, ctc_loss=0.1476, cr_loss=0.4226, over 6723873.85 frames. ], batch size: 99, lr: 7.12e-03, grad_scale: 32.0 2024-09-17 23:48:36,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2024-09-17 23:48:44,084 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.478e+02 2.975e+02 4.007e+02 6.954e+02, threshold=5.950e+02, percent-clipped=4.0 2024-09-17 23:48:47,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=296949.3333333333, ans=0.0 2024-09-17 23:48:49,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=296949.3333333333, ans=0.1 2024-09-17 23:49:06,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=296996.0, ans=0.1 2024-09-17 23:49:06,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2024-09-17 23:49:09,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=296996.0, ans=0.2 2024-09-17 23:49:24,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=297042.6666666667, ans=0.125 2024-09-17 23:49:35,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=297089.3333333333, ans=0.02 2024-09-17 23:49:52,013 INFO [train.py:1198] (1/2) Epoch 17, batch 1650, loss[loss=0.2491, simple_loss=0.297, pruned_loss=0.07651, ctc_loss=0.1528, cr_loss=0.4423, over 34390.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.282, pruned_loss=0.07226, ctc_loss=0.1473, cr_loss=0.4219, over 6717096.06 frames. ], batch size: 103, lr: 7.11e-03, grad_scale: 32.0 2024-09-17 23:50:04,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=297136.0, ans=0.125 2024-09-17 23:50:06,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs. 
limit=10.0 2024-09-17 23:50:08,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.58 vs. limit=15.0 2024-09-17 23:50:14,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=297182.6666666667, ans=0.125 2024-09-17 23:50:30,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=297229.3333333333, ans=0.025 2024-09-17 23:50:41,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=297276.0, ans=0.2 2024-09-17 23:50:43,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=297276.0, ans=0.125 2024-09-17 23:51:01,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=297322.6666666667, ans=0.125 2024-09-17 23:51:06,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=297322.6666666667, ans=0.5 2024-09-17 23:51:10,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2024-09-17 23:51:11,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=297322.6666666667, ans=0.0 2024-09-17 23:51:14,340 INFO [train.py:1198] (1/2) Epoch 17, batch 1700, loss[loss=0.2046, simple_loss=0.2498, pruned_loss=0.05957, ctc_loss=0.124, cr_loss=0.3848, over 34291.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2815, pruned_loss=0.07171, ctc_loss=0.1463, cr_loss=0.4204, over 6743660.58 frames. ], batch size: 80, lr: 7.11e-03, grad_scale: 32.0 2024-09-17 23:51:18,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.55 vs. limit=22.5 2024-09-17 23:51:21,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=297369.3333333333, ans=0.0 2024-09-17 23:51:31,318 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 2.520e+02 2.901e+02 3.830e+02 6.945e+02, threshold=5.802e+02, percent-clipped=2.0 2024-09-17 23:51:38,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=297416.0, ans=0.0 2024-09-17 23:51:53,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=297462.6666666667, ans=0.0 2024-09-17 23:52:05,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.23 vs. 
limit=12.0 2024-09-17 23:52:06,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=297509.3333333333, ans=0.125 2024-09-17 23:52:11,232 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-17 23:52:12,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=297509.3333333333, ans=0.0 2024-09-17 23:52:17,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=297509.3333333333, ans=0.2 2024-09-17 23:52:26,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=297556.0, ans=0.0 2024-09-17 23:52:34,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297556.0, ans=0.1 2024-09-17 23:52:40,661 INFO [train.py:1198] (1/2) Epoch 17, batch 1750, loss[loss=0.1976, simple_loss=0.2474, pruned_loss=0.05532, ctc_loss=0.1131, cr_loss=0.3639, over 34174.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.2812, pruned_loss=0.0717, ctc_loss=0.1461, cr_loss=0.4199, over 6752899.36 frames. ], batch size: 78, lr: 7.11e-03, grad_scale: 32.0 2024-09-17 23:52:44,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=297602.6666666667, ans=0.125 2024-09-17 23:52:50,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=297602.6666666667, ans=0.125 2024-09-17 23:53:40,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=297742.6666666667, ans=0.125 2024-09-17 23:53:48,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297789.3333333333, ans=0.1 2024-09-17 23:54:03,198 INFO [train.py:1198] (1/2) Epoch 17, batch 1800, loss[loss=0.2465, simple_loss=0.2931, pruned_loss=0.07582, ctc_loss=0.1529, cr_loss=0.443, over 34696.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2815, pruned_loss=0.07192, ctc_loss=0.1465, cr_loss=0.421, over 6757135.45 frames. 
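The many balancer entries (min_positive, min_abs, max_abs, prob, with batch-dependent ans values) point at per-channel constraints on activation statistics, checked stochastically with the logged probability. A toy diagnostic under that assumption, reporting how far a tensor's channel statistics fall outside the configured ranges; the real module steers the statistics through gradient manipulation, which this sketch does not attempt:

import torch

def balancer_violation(x: torch.Tensor, min_positive: float = 0.05,
                       min_abs: float = 0.2, max_abs: float = 100.0) -> torch.Tensor:
    """x: (frames, channels). Returns a non-negative score; 0 means every
    channel satisfies the assumed constraints."""
    frac_pos = (x > 0).float().mean(dim=0)   # per-channel positive fraction
    mean_abs = x.abs().mean(dim=0)           # per-channel magnitude
    v = torch.relu(min_positive - frac_pos).sum()
    v = v + torch.relu(min_abs - mean_abs).sum()
    v = v + torch.relu(mean_abs - max_abs).sum()
    return v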
], batch size: 97, lr: 7.11e-03, grad_scale: 32.0 2024-09-17 23:54:18,060 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 2.590e+02 3.129e+02 3.865e+02 7.530e+02, threshold=6.258e+02, percent-clipped=2.0 2024-09-17 23:54:18,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297882.6666666667, ans=0.1 2024-09-17 23:54:24,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297882.6666666667, ans=0.1 2024-09-17 23:54:30,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=297882.6666666667, ans=0.025 2024-09-17 23:54:31,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=297882.6666666667, ans=0.07 2024-09-17 23:54:39,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=297929.3333333333, ans=0.0 2024-09-17 23:54:43,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=297929.3333333333, ans=0.125 2024-09-17 23:54:51,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=297976.0, ans=0.0 2024-09-17 23:54:59,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=297976.0, ans=0.125 2024-09-17 23:55:11,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=298022.6666666667, ans=0.125 2024-09-17 23:55:24,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.13 vs. limit=10.0 2024-09-17 23:55:26,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.78 vs. limit=15.0 2024-09-17 23:55:29,805 INFO [train.py:1198] (1/2) Epoch 17, batch 1850, loss[loss=0.246, simple_loss=0.2902, pruned_loss=0.07605, ctc_loss=0.158, cr_loss=0.451, over 34452.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2815, pruned_loss=0.07204, ctc_loss=0.1466, cr_loss=0.4209, over 6763258.57 frames. ], batch size: 100, lr: 7.10e-03, grad_scale: 16.0 2024-09-17 23:56:08,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=298162.6666666667, ans=0.125 2024-09-17 23:56:26,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=298209.3333333333, ans=0.0 2024-09-17 23:56:40,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=298256.0, ans=0.125 2024-09-17 23:56:51,974 INFO [train.py:1198] (1/2) Epoch 17, batch 1900, loss[loss=0.2448, simple_loss=0.2939, pruned_loss=0.0743, ctc_loss=0.1497, cr_loss=0.4286, over 34362.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.282, pruned_loss=0.07222, ctc_loss=0.147, cr_loss=0.4214, over 6771426.03 frames. 
], batch size: 103, lr: 7.10e-03, grad_scale: 16.0 2024-09-17 23:57:05,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=298302.6666666667, ans=0.125 2024-09-17 23:57:07,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=298349.3333333333, ans=0.0 2024-09-17 23:57:08,489 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.652e+02 3.437e+02 4.428e+02 1.627e+03, threshold=6.875e+02, percent-clipped=8.0 2024-09-17 23:57:32,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=298396.0, ans=0.125 2024-09-17 23:57:53,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=298442.6666666667, ans=0.015 2024-09-17 23:58:13,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=298536.0, ans=0.125 2024-09-17 23:58:14,567 INFO [train.py:1198] (1/2) Epoch 17, batch 1950, loss[loss=0.2366, simple_loss=0.2822, pruned_loss=0.07187, ctc_loss=0.1498, cr_loss=0.4325, over 34359.00 frames. ], tot_loss[loss=0.2373, simple_loss=0.2832, pruned_loss=0.07251, ctc_loss=0.1475, cr_loss=0.4225, over 6788777.22 frames. ], batch size: 91, lr: 7.10e-03, grad_scale: 16.0 2024-09-17 23:58:18,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=298536.0, ans=0.02 2024-09-17 23:58:26,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=298536.0, ans=0.0 2024-09-17 23:58:59,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=298629.3333333333, ans=10.0 2024-09-17 23:59:34,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298722.6666666667, ans=0.1 2024-09-17 23:59:38,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.95 vs. limit=10.0 2024-09-17 23:59:47,653 INFO [train.py:1198] (1/2) Epoch 17, batch 2000, loss[loss=0.1944, simple_loss=0.2427, pruned_loss=0.05445, ctc_loss=0.1157, cr_loss=0.3513, over 34179.00 frames. ], tot_loss[loss=0.2381, simple_loss=0.2838, pruned_loss=0.07288, ctc_loss=0.1482, cr_loss=0.4235, over 6764472.20 frames. 
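The bypass.scale_min and bypass.skip_rate entries suggest learnable residual bypasses around whole layers: the output is interpolated between a layer's input and its output with a per-channel scale kept at or above scale_min. A minimal sketch (module shape and initial value are assumptions):

import torch
import torch.nn as nn

class Bypass(nn.Module):
    """Blend a layer's output with its input via a learnable per-channel
    scale clamped to [scale_min, 1.0]."""

    def __init__(self, num_channels: int, scale_min: float = 0.2):
        super().__init__()
        self.scale_min = scale_min
        self.scale = nn.Parameter(torch.full((num_channels,), 0.5))

    def forward(self, x: torch.Tensor, layer_out: torch.Tensor) -> torch.Tensor:
        s = self.scale.clamp(min=self.scale_min, max=1.0)
        return x + s * (layer_out - x)   # s near scale_min: mostly skip the layer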
], batch size: 78, lr: 7.10e-03, grad_scale: 32.0 2024-09-18 00:00:04,524 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.413e+02 2.775e+02 3.429e+02 7.593e+02, threshold=5.550e+02, percent-clipped=1.0 2024-09-18 00:00:06,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=298816.0, ans=0.125 2024-09-18 00:00:17,956 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:00:18,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=298816.0, ans=0.125 2024-09-18 00:00:19,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=298862.6666666667, ans=0.0 2024-09-18 00:00:41,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=298909.3333333333, ans=10.0 2024-09-18 00:00:42,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=298909.3333333333, ans=0.1 2024-09-18 00:00:47,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=298909.3333333333, ans=0.1 2024-09-18 00:00:54,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.54 vs. limit=15.0 2024-09-18 00:00:57,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=298956.0, ans=0.1 2024-09-18 00:01:01,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.19 vs. limit=6.0 2024-09-18 00:01:10,574 INFO [train.py:1198] (1/2) Epoch 17, batch 2050, loss[loss=0.2229, simple_loss=0.2628, pruned_loss=0.06945, ctc_loss=0.1409, cr_loss=0.395, over 34478.00 frames. ], tot_loss[loss=0.237, simple_loss=0.2826, pruned_loss=0.07251, ctc_loss=0.1476, cr_loss=0.4218, over 6756650.10 frames. ], batch size: 82, lr: 7.09e-03, grad_scale: 32.0 2024-09-18 00:01:12,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=299002.6666666667, ans=0.2 2024-09-18 00:02:13,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=299142.6666666667, ans=0.1 2024-09-18 00:02:32,795 INFO [train.py:1198] (1/2) Epoch 17, batch 2100, loss[loss=0.2317, simple_loss=0.2763, pruned_loss=0.07065, ctc_loss=0.1432, cr_loss=0.4298, over 34534.00 frames. ], tot_loss[loss=0.2357, simple_loss=0.2815, pruned_loss=0.07193, ctc_loss=0.1465, cr_loss=0.4197, over 6769575.44 frames. 
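Batch sizes in these entries swing between roughly 78 and 145 cuts while the per-batch frame counts stay near 34k, the signature of a sampler that caps the total audio duration of a batch rather than the number of utterances. A toy duration-capped batcher illustrating the idea (the function name and the 1400-second cap are assumptions):

def duration_batches(cuts, max_duration: float = 1400.0):
    """cuts: iterable of (cut_id, duration_seconds), roughly sorted by
    duration as a bucketing sampler would present them. Yields batches
    whose summed duration stays under max_duration."""
    batch, total = [], 0.0
    for cut_id, dur in cuts:
        if batch and total + dur > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(cut_id)
        total += dur
    if batch:
        yield batch

long_cuts = [("c%d" % i, 17.0) for i in range(200)]
print(len(next(duration_batches(long_cuts))))  # longer cuts -> smaller batches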
], batch size: 94, lr: 7.09e-03, grad_scale: 32.0 2024-09-18 00:02:48,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=299282.6666666667, ans=0.09899494936611666 2024-09-18 00:02:49,198 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.511e+02 3.325e+02 3.960e+02 6.784e+02, threshold=6.650e+02, percent-clipped=9.0 2024-09-18 00:03:21,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299329.3333333333, ans=0.1 2024-09-18 00:03:23,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=299329.3333333333, ans=0.125 2024-09-18 00:03:37,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299376.0, ans=0.1 2024-09-18 00:03:56,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=299422.6666666667, ans=0.025 2024-09-18 00:03:57,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=299469.3333333333, ans=0.2 2024-09-18 00:03:59,141 INFO [train.py:1198] (1/2) Epoch 17, batch 2150, loss[loss=0.2198, simple_loss=0.2684, pruned_loss=0.06433, ctc_loss=0.1329, cr_loss=0.4, over 34362.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.2806, pruned_loss=0.07129, ctc_loss=0.1454, cr_loss=0.4184, over 6788970.53 frames. ], batch size: 91, lr: 7.09e-03, grad_scale: 32.0 2024-09-18 00:04:02,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=299469.3333333333, ans=0.125 2024-09-18 00:04:20,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=12.0 2024-09-18 00:04:40,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.24 vs. limit=15.0 2024-09-18 00:04:44,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=299562.6666666667, ans=0.0 2024-09-18 00:05:22,529 INFO [train.py:1198] (1/2) Epoch 17, batch 2200, loss[loss=0.2448, simple_loss=0.2938, pruned_loss=0.07423, ctc_loss=0.1507, cr_loss=0.4317, over 34446.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2805, pruned_loss=0.07121, ctc_loss=0.1451, cr_loss=0.4178, over 6783676.46 frames. 
], batch size: 100, lr: 7.08e-03, grad_scale: 32.0 2024-09-18 00:05:39,063 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.537e+02 3.286e+02 4.556e+02 8.344e+02, threshold=6.572e+02, percent-clipped=4.0 2024-09-18 00:05:40,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=299749.3333333333, ans=0.125 2024-09-18 00:05:44,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=299749.3333333333, ans=0.025 2024-09-18 00:05:52,718 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:06:02,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=299796.0, ans=0.125 2024-09-18 00:06:09,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=299796.0, ans=0.0 2024-09-18 00:06:27,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=299889.3333333333, ans=0.125 2024-09-18 00:06:27,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=299889.3333333333, ans=0.125 2024-09-18 00:06:30,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=299889.3333333333, ans=0.2 2024-09-18 00:06:35,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=299889.3333333333, ans=0.0 2024-09-18 00:06:38,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=299889.3333333333, ans=0.025 2024-09-18 00:06:44,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=299889.3333333333, ans=0.125 2024-09-18 00:06:46,967 INFO [train.py:1198] (1/2) Epoch 17, batch 2250, loss[loss=0.2215, simple_loss=0.2722, pruned_loss=0.06385, ctc_loss=0.1352, cr_loss=0.3989, over 34402.00 frames. ], tot_loss[loss=0.2344, simple_loss=0.2806, pruned_loss=0.07124, ctc_loss=0.1452, cr_loss=0.4179, over 6780436.52 frames. ], batch size: 95, lr: 7.08e-03, grad_scale: 32.0 2024-09-18 00:06:47,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=299936.0, ans=0.2 2024-09-18 00:07:15,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=299982.6666666667, ans=0.0 2024-09-18 00:07:17,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=299982.6666666667, ans=0.125 2024-09-18 00:07:21,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.95 vs. limit=10.0 2024-09-18 00:08:11,330 INFO [train.py:1198] (1/2) Epoch 17, batch 2300, loss[loss=0.2185, simple_loss=0.2626, pruned_loss=0.06551, ctc_loss=0.1382, cr_loss=0.3918, over 34270.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2796, pruned_loss=0.07094, ctc_loss=0.1448, cr_loss=0.4171, over 6764856.87 frames. 
], batch size: 83, lr: 7.08e-03, grad_scale: 32.0 2024-09-18 00:08:26,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=300216.0, ans=0.0 2024-09-18 00:08:27,577 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.715e+02 3.440e+02 4.177e+02 9.365e+02, threshold=6.881e+02, percent-clipped=6.0 2024-09-18 00:08:49,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=300262.6666666667, ans=0.0 2024-09-18 00:09:06,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=300309.3333333333, ans=0.07 2024-09-18 00:09:21,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=300356.0, ans=0.025 2024-09-18 00:09:33,915 INFO [train.py:1198] (1/2) Epoch 17, batch 2350, loss[loss=0.2399, simple_loss=0.2866, pruned_loss=0.07292, ctc_loss=0.1503, cr_loss=0.4305, over 34721.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2798, pruned_loss=0.07116, ctc_loss=0.1452, cr_loss=0.4182, over 6771861.04 frames. ], batch size: 97, lr: 7.08e-03, grad_scale: 32.0 2024-09-18 00:09:52,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=300449.3333333333, ans=0.125 2024-09-18 00:10:06,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=300496.0, ans=15.0 2024-09-18 00:10:12,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=300496.0, ans=0.125 2024-09-18 00:10:40,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=300589.3333333333, ans=0.2 2024-09-18 00:10:44,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.82 vs. limit=15.0 2024-09-18 00:11:00,933 INFO [train.py:1198] (1/2) Epoch 17, batch 2400, loss[loss=0.2345, simple_loss=0.279, pruned_loss=0.0717, ctc_loss=0.1471, cr_loss=0.4277, over 34576.00 frames. ], tot_loss[loss=0.2346, simple_loss=0.2804, pruned_loss=0.07145, ctc_loss=0.1457, cr_loss=0.4187, over 6777427.80 frames. ], batch size: 89, lr: 7.07e-03, grad_scale: 32.0 2024-09-18 00:11:06,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=300636.0, ans=0.2 2024-09-18 00:11:13,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.35 vs. 
limit=15.0 2024-09-18 00:11:17,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.537e+02 2.903e+02 3.809e+02 6.296e+02, threshold=5.807e+02, percent-clipped=0.0 2024-09-18 00:11:17,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=300682.6666666667, ans=0.2 2024-09-18 00:11:21,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300682.6666666667, ans=0.1 2024-09-18 00:11:24,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=300682.6666666667, ans=0.025 2024-09-18 00:11:34,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=300729.3333333333, ans=0.09899494936611666 2024-09-18 00:12:23,859 INFO [train.py:1198] (1/2) Epoch 17, batch 2450, loss[loss=0.2384, simple_loss=0.286, pruned_loss=0.07222, ctc_loss=0.1496, cr_loss=0.4137, over 34398.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2815, pruned_loss=0.072, ctc_loss=0.1466, cr_loss=0.4205, over 6751481.94 frames. ], batch size: 95, lr: 7.07e-03, grad_scale: 32.0 2024-09-18 00:12:34,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.61 vs. limit=12.0 2024-09-18 00:13:33,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.43 vs. limit=15.0 2024-09-18 00:13:42,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2024-09-18 00:13:46,126 INFO [train.py:1198] (1/2) Epoch 17, batch 2500, loss[loss=0.2304, simple_loss=0.282, pruned_loss=0.06758, ctc_loss=0.1373, cr_loss=0.4035, over 34462.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2813, pruned_loss=0.07175, ctc_loss=0.1462, cr_loss=0.4199, over 6763391.02 frames. ], batch size: 100, lr: 7.07e-03, grad_scale: 32.0 2024-09-18 00:13:55,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.23 vs. 
limit=10.0 2024-09-18 00:13:59,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=301102.6666666667, ans=0.125 2024-09-18 00:14:01,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=301149.3333333333, ans=0.07 2024-09-18 00:14:02,661 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.424e+02 2.832e+02 3.515e+02 7.754e+02, threshold=5.663e+02, percent-clipped=5.0 2024-09-18 00:14:16,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=301149.3333333333, ans=0.2 2024-09-18 00:14:26,475 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:14:33,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=301196.0, ans=0.0 2024-09-18 00:14:50,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301242.6666666667, ans=0.1 2024-09-18 00:14:52,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=22.5 2024-09-18 00:14:58,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=301289.3333333333, ans=0.0 2024-09-18 00:15:13,109 INFO [train.py:1198] (1/2) Epoch 17, batch 2550, loss[loss=0.2059, simple_loss=0.25, pruned_loss=0.06079, ctc_loss=0.1272, cr_loss=0.3682, over 34177.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.2812, pruned_loss=0.07148, ctc_loss=0.1459, cr_loss=0.4193, over 6765880.04 frames. ], batch size: 78, lr: 7.07e-03, grad_scale: 32.0 2024-09-18 00:15:18,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=301336.0, ans=0.2 2024-09-18 00:15:19,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.79 vs. limit=6.0 2024-09-18 00:15:29,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=301382.6666666667, ans=0.025 2024-09-18 00:15:36,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301382.6666666667, ans=0.1 2024-09-18 00:15:38,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.82 vs. limit=15.0 2024-09-18 00:15:40,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.24 vs. limit=22.5 2024-09-18 00:16:00,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=301429.3333333333, ans=0.07 2024-09-18 00:16:09,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=301476.0, ans=0.025 2024-09-18 00:16:20,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.41 vs. 
limit=6.0 2024-09-18 00:16:23,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=301522.6666666667, ans=0.2 2024-09-18 00:16:25,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2024-09-18 00:16:33,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=301522.6666666667, ans=0.125 2024-09-18 00:16:36,099 INFO [train.py:1198] (1/2) Epoch 17, batch 2600, loss[loss=0.2247, simple_loss=0.2719, pruned_loss=0.06685, ctc_loss=0.1354, cr_loss=0.4198, over 34357.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.2814, pruned_loss=0.07155, ctc_loss=0.1461, cr_loss=0.4199, over 6761583.98 frames. ], batch size: 91, lr: 7.06e-03, grad_scale: 32.0 2024-09-18 00:16:41,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.71 vs. limit=15.0 2024-09-18 00:16:45,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.46 vs. limit=15.0 2024-09-18 00:16:52,298 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.079e+02 2.508e+02 3.330e+02 4.291e+02 8.681e+02, threshold=6.660e+02, percent-clipped=11.0 2024-09-18 00:16:57,500 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:17:52,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=301756.0, ans=0.125 2024-09-18 00:17:58,986 INFO [train.py:1198] (1/2) Epoch 17, batch 2650, loss[loss=0.2468, simple_loss=0.2962, pruned_loss=0.07403, ctc_loss=0.1574, cr_loss=0.4481, over 34296.00 frames. ], tot_loss[loss=0.2357, simple_loss=0.2819, pruned_loss=0.07176, ctc_loss=0.1462, cr_loss=0.4205, over 6767418.38 frames. ], batch size: 117, lr: 7.06e-03, grad_scale: 32.0 2024-09-18 00:18:04,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=301802.6666666667, ans=0.1 2024-09-18 00:18:09,533 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:18:10,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-09-18 00:18:14,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=301849.3333333333, ans=0.125 2024-09-18 00:18:14,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.18 vs. 
limit=12.0 2024-09-18 00:18:53,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=301942.6666666667, ans=0.0 2024-09-18 00:19:00,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=301942.6666666667, ans=0.125 2024-09-18 00:19:12,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=301989.3333333333, ans=0.0 2024-09-18 00:19:16,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.49 vs. limit=10.0 2024-09-18 00:19:18,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=301989.3333333333, ans=0.0 2024-09-18 00:19:23,390 INFO [train.py:1198] (1/2) Epoch 17, batch 2700, loss[loss=0.2353, simple_loss=0.2866, pruned_loss=0.06917, ctc_loss=0.1463, cr_loss=0.4113, over 34619.00 frames. ], tot_loss[loss=0.2364, simple_loss=0.2825, pruned_loss=0.07206, ctc_loss=0.1468, cr_loss=0.4213, over 6761873.17 frames. ], batch size: 102, lr: 7.06e-03, grad_scale: 32.0 2024-09-18 00:19:23,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=302036.0, ans=0.125 2024-09-18 00:19:39,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.616e+02 3.085e+02 3.711e+02 6.176e+02, threshold=6.171e+02, percent-clipped=0.0 2024-09-18 00:19:51,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=302082.6666666667, ans=0.125 2024-09-18 00:20:04,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.77 vs. limit=10.0 2024-09-18 00:20:04,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=12.0 2024-09-18 00:20:15,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=302176.0, ans=0.0 2024-09-18 00:20:21,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=302176.0, ans=0.125 2024-09-18 00:20:38,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=302222.6666666667, ans=0.125 2024-09-18 00:20:40,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=302222.6666666667, ans=0.125 2024-09-18 00:20:46,544 INFO [train.py:1198] (1/2) Epoch 17, batch 2750, loss[loss=0.2249, simple_loss=0.2713, pruned_loss=0.06724, ctc_loss=0.1389, cr_loss=0.4058, over 34639.00 frames. ], tot_loss[loss=0.235, simple_loss=0.2811, pruned_loss=0.07149, ctc_loss=0.1458, cr_loss=0.419, over 6759468.46 frames. 
], batch size: 88, lr: 7.06e-03, grad_scale: 32.0 2024-09-18 00:20:46,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=302269.3333333333, ans=0.0 2024-09-18 00:20:53,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=302269.3333333333, ans=0.0 2024-09-18 00:21:06,721 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:21:31,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=302362.6666666667, ans=0.125 2024-09-18 00:21:35,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=302362.6666666667, ans=0.125 2024-09-18 00:21:50,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=302409.3333333333, ans=0.125 2024-09-18 00:22:02,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=302456.0, ans=0.2 2024-09-18 00:22:03,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=302456.0, ans=10.0 2024-09-18 00:22:14,002 INFO [train.py:1198] (1/2) Epoch 17, batch 2800, loss[loss=0.2674, simple_loss=0.2995, pruned_loss=0.09027, ctc_loss=0.1844, cr_loss=0.4444, over 23493.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2813, pruned_loss=0.07174, ctc_loss=0.1463, cr_loss=0.4202, over 6738386.45 frames. ], batch size: 244, lr: 7.05e-03, grad_scale: 32.0 2024-09-18 00:22:25,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=302502.6666666667, ans=0.125 2024-09-18 00:22:25,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=302502.6666666667, ans=0.0 2024-09-18 00:22:30,579 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.060e+02 2.477e+02 2.777e+02 3.421e+02 6.506e+02, threshold=5.554e+02, percent-clipped=2.0 2024-09-18 00:22:35,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=12.0 2024-09-18 00:22:39,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=302549.3333333333, ans=0.0 2024-09-18 00:22:42,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=302549.3333333333, ans=0.2 2024-09-18 00:22:48,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.76 vs. 
limit=15.0 2024-09-18 00:22:49,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302596.0, ans=0.1 2024-09-18 00:23:05,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=302642.6666666667, ans=0.0 2024-09-18 00:23:13,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=302642.6666666667, ans=0.1 2024-09-18 00:23:13,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=302642.6666666667, ans=0.0 2024-09-18 00:23:23,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=302689.3333333333, ans=0.125 2024-09-18 00:23:35,005 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:23:36,169 INFO [train.py:1198] (1/2) Epoch 17, batch 2850, loss[loss=0.2299, simple_loss=0.2813, pruned_loss=0.06743, ctc_loss=0.137, cr_loss=0.4044, over 34490.00 frames. ], tot_loss[loss=0.2365, simple_loss=0.2822, pruned_loss=0.07223, ctc_loss=0.1472, cr_loss=0.4215, over 6721767.75 frames. ], batch size: 90, lr: 7.05e-03, grad_scale: 32.0 2024-09-18 00:24:04,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=302782.6666666667, ans=0.125 2024-09-18 00:24:43,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=302922.6666666667, ans=0.0 2024-09-18 00:24:48,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=12.0 2024-09-18 00:24:49,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.99 vs. limit=22.5 2024-09-18 00:24:58,374 INFO [train.py:1198] (1/2) Epoch 17, batch 2900, loss[loss=0.2399, simple_loss=0.2848, pruned_loss=0.07381, ctc_loss=0.1491, cr_loss=0.4376, over 34521.00 frames. ], tot_loss[loss=0.2369, simple_loss=0.283, pruned_loss=0.07224, ctc_loss=0.1471, cr_loss=0.4223, over 6752890.62 frames. ], batch size: 94, lr: 7.05e-03, grad_scale: 16.0 2024-09-18 00:24:58,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=302969.3333333333, ans=0.035 2024-09-18 00:25:00,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=302969.3333333333, ans=0.125 2024-09-18 00:25:12,157 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:25:15,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=303016.0, ans=0.0 2024-09-18 00:25:18,329 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.337e+02 2.690e+02 3.182e+02 7.481e+02, threshold=5.379e+02, percent-clipped=1.0 2024-09-18 00:25:24,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. 
limit=6.0 2024-09-18 00:25:25,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=303016.0, ans=0.125 2024-09-18 00:25:28,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=303016.0, ans=0.0 2024-09-18 00:25:30,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=303016.0, ans=0.025 2024-09-18 00:26:25,521 INFO [train.py:1198] (1/2) Epoch 17, batch 2950, loss[loss=0.2134, simple_loss=0.2623, pruned_loss=0.06186, ctc_loss=0.1273, cr_loss=0.3853, over 34635.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2814, pruned_loss=0.07166, ctc_loss=0.146, cr_loss=0.4201, over 6748298.49 frames. ], batch size: 88, lr: 7.04e-03, grad_scale: 16.0 2024-09-18 00:26:38,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=303202.6666666667, ans=0.125 2024-09-18 00:27:09,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=12.0 2024-09-18 00:27:14,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=303342.6666666667, ans=0.1 2024-09-18 00:27:38,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=303389.3333333333, ans=0.2 2024-09-18 00:27:48,598 INFO [train.py:1198] (1/2) Epoch 17, batch 3000, loss[loss=0.2352, simple_loss=0.2856, pruned_loss=0.06962, ctc_loss=0.1433, cr_loss=0.4252, over 34546.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2814, pruned_loss=0.07167, ctc_loss=0.146, cr_loss=0.4201, over 6749009.32 frames. ], batch size: 94, lr: 7.04e-03, grad_scale: 16.0 2024-09-18 00:27:48,598 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 00:27:52,083 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([3.7547, 3.5366, 3.4587, 3.3582], device='cuda:1') 2024-09-18 00:28:05,516 INFO [train.py:1230] (1/2) Epoch 17, validation: loss=0.1506, simple_loss=0.2476, pruned_loss=0.02247, ctc_loss=0.04316, cr_loss=1.691e-14, over 944034.00 frames. 
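The loss[...] and tot_loss[...] entries throughout this log report four components (simple_loss, pruned_loss, ctc_loss, cr_loss) alongside the combined loss. The logged totals are consistent with a fixed linear combination, loss = 0.5 * simple_loss + pruned_loss + 0.1 * ctc_loss + 0.02 * cr_loss. The sketch below illustrates that relationship; the function name and the scale values are inferred by fitting the logged numbers, not read from the actual training script.

def combined_loss(simple_loss, pruned_loss, ctc_loss, cr_loss,
                  simple_scale=0.5, ctc_scale=0.1, cr_scale=0.02):
    # Scales here are an assumption inferred from the logged values:
    # the simple (linear) transducer loss is down-weighted, the pruned
    # RNN-T loss enters at full weight, and the CTC and consistency-
    # regularization (CR) terms are small auxiliary contributions.
    return (simple_scale * simple_loss + pruned_loss
            + ctc_scale * ctc_loss + cr_scale * cr_loss)

# Check against the Epoch 17, batch 3000 validation entry above:
# combined_loss(0.2476, 0.02247, 0.04316, 1.691e-14) -> 0.15059,
# matching the logged validation loss=0.1506 to rounding.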
2024-09-18 00:28:05,516 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 00:28:10,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=303436.0, ans=0.125 2024-09-18 00:28:17,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=303436.0, ans=0.1 2024-09-18 00:28:17,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=303436.0, ans=0.1 2024-09-18 00:28:23,784 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.524e+02 3.126e+02 3.885e+02 7.198e+02, threshold=6.252e+02, percent-clipped=8.0 2024-09-18 00:28:25,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=303482.6666666667, ans=0.0 2024-09-18 00:28:55,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=303576.0, ans=0.2 2024-09-18 00:29:10,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.54 vs. limit=15.0 2024-09-18 00:29:25,208 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.27 vs. limit=15.0 2024-09-18 00:29:29,184 INFO [train.py:1198] (1/2) Epoch 17, batch 3050, loss[loss=0.2327, simple_loss=0.2766, pruned_loss=0.07168, ctc_loss=0.1423, cr_loss=0.4269, over 34571.00 frames. ], tot_loss[loss=0.2362, simple_loss=0.2822, pruned_loss=0.07204, ctc_loss=0.1465, cr_loss=0.4209, over 6741584.90 frames. ], batch size: 89, lr: 7.04e-03, grad_scale: 16.0 2024-09-18 00:29:34,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=303669.3333333333, ans=0.125 2024-09-18 00:29:53,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.70 vs. limit=22.5 2024-09-18 00:30:15,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0 2024-09-18 00:30:39,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=303856.0, ans=0.0 2024-09-18 00:30:47,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=303856.0, ans=0.125 2024-09-18 00:30:49,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=303856.0, ans=0.125 2024-09-18 00:30:52,179 INFO [train.py:1198] (1/2) Epoch 17, batch 3100, loss[loss=0.2474, simple_loss=0.2955, pruned_loss=0.07551, ctc_loss=0.1542, cr_loss=0.4382, over 34184.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2818, pruned_loss=0.07184, ctc_loss=0.1462, cr_loss=0.4204, over 6741236.24 frames. ], batch size: 117, lr: 7.04e-03, grad_scale: 16.0 2024-09-18 00:31:04,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.65 vs. 
limit=10.0 2024-09-18 00:31:09,978 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.364e+02 2.682e+02 3.215e+02 5.413e+02, threshold=5.365e+02, percent-clipped=0.0 2024-09-18 00:31:11,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=303949.3333333333, ans=10.0 2024-09-18 00:31:34,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=303996.0, ans=0.0 2024-09-18 00:31:49,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=304042.6666666667, ans=0.125 2024-09-18 00:32:02,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=304089.3333333333, ans=0.125 2024-09-18 00:32:07,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=304089.3333333333, ans=0.125 2024-09-18 00:32:13,403 INFO [train.py:1198] (1/2) Epoch 17, batch 3150, loss[loss=0.2412, simple_loss=0.2897, pruned_loss=0.07273, ctc_loss=0.1483, cr_loss=0.4391, over 33875.00 frames. ], tot_loss[loss=0.2357, simple_loss=0.2818, pruned_loss=0.07181, ctc_loss=0.1461, cr_loss=0.4201, over 6747953.85 frames. ], batch size: 122, lr: 7.03e-03, grad_scale: 16.0 2024-09-18 00:32:53,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.33 vs. limit=12.0 2024-09-18 00:33:06,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.78 vs. limit=22.5 2024-09-18 00:33:10,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0 2024-09-18 00:33:12,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=304276.0, ans=0.0 2024-09-18 00:33:14,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-18 00:33:34,658 INFO [train.py:1198] (1/2) Epoch 17, batch 3200, loss[loss=0.2184, simple_loss=0.2677, pruned_loss=0.06386, ctc_loss=0.13, cr_loss=0.383, over 34549.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2811, pruned_loss=0.07145, ctc_loss=0.1454, cr_loss=0.4193, over 6761154.86 frames. ], batch size: 94, lr: 7.03e-03, grad_scale: 32.0 2024-09-18 00:33:42,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2024-09-18 00:33:44,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=304369.3333333333, ans=0.0 2024-09-18 00:33:51,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.37 vs. 
limit=15.0 2024-09-18 00:33:52,645 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.860e+02 3.619e+02 4.609e+02 9.165e+02, threshold=7.239e+02, percent-clipped=10.0 2024-09-18 00:34:04,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=304416.0, ans=0.125 2024-09-18 00:34:09,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304462.6666666667, ans=0.1 2024-09-18 00:34:22,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=304509.3333333333, ans=0.0 2024-09-18 00:34:38,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=304556.0, ans=0.125 2024-09-18 00:34:43,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=304556.0, ans=0.05 2024-09-18 00:34:53,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=304556.0, ans=0.1 2024-09-18 00:34:54,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=304602.6666666667, ans=0.2 2024-09-18 00:34:55,908 INFO [train.py:1198] (1/2) Epoch 17, batch 3250, loss[loss=0.2394, simple_loss=0.2833, pruned_loss=0.0741, ctc_loss=0.1505, cr_loss=0.4303, over 34644.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.2817, pruned_loss=0.07155, ctc_loss=0.1456, cr_loss=0.4198, over 6770312.55 frames. ], batch size: 98, lr: 7.03e-03, grad_scale: 32.0 2024-09-18 00:34:56,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=304602.6666666667, ans=0.125 2024-09-18 00:34:56,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=304602.6666666667, ans=0.125 2024-09-18 00:35:15,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=304649.3333333333, ans=0.0 2024-09-18 00:35:30,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=304696.0, ans=0.125 2024-09-18 00:35:53,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=304742.6666666667, ans=0.125 2024-09-18 00:35:54,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=15.0 2024-09-18 00:36:03,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=304789.3333333333, ans=0.0 2024-09-18 00:36:17,435 INFO [train.py:1198] (1/2) Epoch 17, batch 3300, loss[loss=0.2341, simple_loss=0.2846, pruned_loss=0.06965, ctc_loss=0.1404, cr_loss=0.407, over 33116.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.2804, pruned_loss=0.07115, ctc_loss=0.145, cr_loss=0.4182, over 6768587.19 frames. 
], batch size: 130, lr: 7.03e-03, grad_scale: 32.0 2024-09-18 00:36:37,070 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.050e+02 2.448e+02 3.148e+02 3.851e+02 6.314e+02, threshold=6.296e+02, percent-clipped=0.0 2024-09-18 00:36:37,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=304882.6666666667, ans=0.0 2024-09-18 00:36:52,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=304929.3333333333, ans=0.0 2024-09-18 00:37:00,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=304929.3333333333, ans=0.2 2024-09-18 00:37:09,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=304976.0, ans=0.125 2024-09-18 00:37:40,339 INFO [train.py:1198] (1/2) Epoch 17, batch 3350, loss[loss=0.2451, simple_loss=0.2907, pruned_loss=0.07561, ctc_loss=0.1561, cr_loss=0.4266, over 33887.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2814, pruned_loss=0.07178, ctc_loss=0.1463, cr_loss=0.4201, over 6742132.63 frames. ], batch size: 122, lr: 7.02e-03, grad_scale: 32.0 2024-09-18 00:37:45,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=305069.3333333333, ans=0.04949747468305833 2024-09-18 00:37:50,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-09-18 00:38:00,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=305116.0, ans=0.125 2024-09-18 00:38:19,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0 2024-09-18 00:38:24,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=305162.6666666667, ans=0.125 2024-09-18 00:39:01,078 INFO [train.py:1198] (1/2) Epoch 17, batch 3400, loss[loss=0.2006, simple_loss=0.2486, pruned_loss=0.0575, ctc_loss=0.1186, cr_loss=0.3457, over 34171.00 frames. ], tot_loss[loss=0.2358, simple_loss=0.2815, pruned_loss=0.07194, ctc_loss=0.1464, cr_loss=0.4204, over 6732445.52 frames. 
], batch size: 78, lr: 7.02e-03, grad_scale: 32.0 2024-09-18 00:39:14,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=305302.6666666667, ans=0.125 2024-09-18 00:39:18,920 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.513e+02 2.818e+02 3.570e+02 5.849e+02, threshold=5.635e+02, percent-clipped=0.0 2024-09-18 00:39:28,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=305349.3333333333, ans=0.95 2024-09-18 00:39:43,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=305396.0, ans=0.125 2024-09-18 00:39:59,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=305442.6666666667, ans=0.2 2024-09-18 00:40:21,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=305536.0, ans=0.5 2024-09-18 00:40:22,838 INFO [train.py:1198] (1/2) Epoch 17, batch 3450, loss[loss=0.2482, simple_loss=0.2973, pruned_loss=0.07542, ctc_loss=0.1549, cr_loss=0.4345, over 32961.00 frames. ], tot_loss[loss=0.236, simple_loss=0.2819, pruned_loss=0.072, ctc_loss=0.1465, cr_loss=0.4208, over 6745625.53 frames. ], batch size: 130, lr: 7.02e-03, grad_scale: 32.0 2024-09-18 00:40:34,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=305536.0, ans=0.125 2024-09-18 00:40:46,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.81 vs. limit=12.0 2024-09-18 00:40:47,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=305582.6666666667, ans=0.025 2024-09-18 00:41:05,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=305629.3333333333, ans=0.125 2024-09-18 00:41:06,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=305629.3333333333, ans=0.0 2024-09-18 00:41:22,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=305676.0, ans=0.125 2024-09-18 00:41:44,988 INFO [train.py:1198] (1/2) Epoch 17, batch 3500, loss[loss=0.1974, simple_loss=0.2508, pruned_loss=0.05383, ctc_loss=0.1117, cr_loss=0.3477, over 34456.00 frames. ], tot_loss[loss=0.2355, simple_loss=0.2814, pruned_loss=0.07178, ctc_loss=0.146, cr_loss=0.4196, over 6747299.03 frames. 
], batch size: 85, lr: 7.02e-03, grad_scale: 32.0 2024-09-18 00:41:45,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=305769.3333333333, ans=0.125 2024-09-18 00:41:51,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305769.3333333333, ans=0.1 2024-09-18 00:41:56,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=305769.3333333333, ans=0.0 2024-09-18 00:41:56,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=305769.3333333333, ans=0.125 2024-09-18 00:42:02,993 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.550e+02 3.026e+02 3.773e+02 5.675e+02, threshold=6.052e+02, percent-clipped=1.0 2024-09-18 00:42:04,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2024-09-18 00:42:11,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=305816.0, ans=0.0 2024-09-18 00:42:35,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=305909.3333333333, ans=0.125 2024-09-18 00:42:48,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=305956.0, ans=0.0 2024-09-18 00:42:59,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305956.0, ans=0.1 2024-09-18 00:43:05,602 INFO [train.py:1198] (1/2) Epoch 17, batch 3550, loss[loss=0.2454, simple_loss=0.2916, pruned_loss=0.07519, ctc_loss=0.154, cr_loss=0.448, over 34383.00 frames. ], tot_loss[loss=0.2354, simple_loss=0.2814, pruned_loss=0.07172, ctc_loss=0.1459, cr_loss=0.4197, over 6756762.24 frames. ], batch size: 103, lr: 7.01e-03, grad_scale: 32.0 2024-09-18 00:43:07,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=306002.6666666667, ans=0.2 2024-09-18 00:43:30,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=306049.3333333333, ans=0.125 2024-09-18 00:44:00,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.58 vs. limit=10.0 2024-09-18 00:44:06,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=306142.6666666667, ans=0.125 2024-09-18 00:44:19,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306189.3333333333, ans=0.1 2024-09-18 00:44:27,324 INFO [train.py:1198] (1/2) Epoch 17, batch 3600, loss[loss=0.219, simple_loss=0.2647, pruned_loss=0.06536, ctc_loss=0.1333, cr_loss=0.3964, over 34489.00 frames. ], tot_loss[loss=0.2353, simple_loss=0.2814, pruned_loss=0.07159, ctc_loss=0.1459, cr_loss=0.4204, over 6766323.41 frames. 
], batch size: 90, lr: 7.01e-03, grad_scale: 32.0 2024-09-18 00:44:37,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=306236.0, ans=0.125 2024-09-18 00:44:44,782 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.531e+02 3.094e+02 4.073e+02 7.273e+02, threshold=6.187e+02, percent-clipped=5.0 2024-09-18 00:44:45,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=306282.6666666667, ans=0.1 2024-09-18 00:45:00,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=306329.3333333333, ans=0.05 2024-09-18 00:45:07,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=306329.3333333333, ans=0.025 2024-09-18 00:45:35,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=306422.6666666667, ans=0.025 2024-09-18 00:45:35,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=306422.6666666667, ans=0.2 2024-09-18 00:45:37,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=306422.6666666667, ans=0.0 2024-09-18 00:45:48,235 INFO [train.py:1198] (1/2) Epoch 17, batch 3650, loss[loss=0.2581, simple_loss=0.3013, pruned_loss=0.08168, ctc_loss=0.163, cr_loss=0.4725, over 34439.00 frames. ], tot_loss[loss=0.235, simple_loss=0.2812, pruned_loss=0.07149, ctc_loss=0.1457, cr_loss=0.4199, over 6769094.92 frames. ], batch size: 110, lr: 7.01e-03, grad_scale: 32.0 2024-09-18 00:45:53,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.48 vs. limit=6.0 2024-09-18 00:46:27,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=306562.6666666667, ans=0.125 2024-09-18 00:47:08,007 INFO [train.py:1198] (1/2) Epoch 17, batch 3700, loss[loss=0.2445, simple_loss=0.2944, pruned_loss=0.07415, ctc_loss=0.1493, cr_loss=0.413, over 34621.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2812, pruned_loss=0.07137, ctc_loss=0.1455, cr_loss=0.4194, over 6782997.64 frames. ], batch size: 102, lr: 7.00e-03, grad_scale: 32.0 2024-09-18 00:47:17,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=306702.6666666667, ans=0.0 2024-09-18 00:47:21,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.35 vs. limit=22.5 2024-09-18 00:47:26,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.500e+02 3.221e+02 4.208e+02 7.602e+02, threshold=6.442e+02, percent-clipped=5.0 2024-09-18 00:47:26,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=306749.3333333333, ans=0.125 2024-09-18 00:47:28,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.75 vs. 
limit=10.0 2024-09-18 00:47:37,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.68 vs. limit=15.0 2024-09-18 00:47:47,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=306796.0, ans=0.09899494936611666 2024-09-18 00:47:51,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=306796.0, ans=10.0 2024-09-18 00:48:00,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=306842.6666666667, ans=0.125 2024-09-18 00:48:02,530 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:48:17,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=306889.3333333333, ans=0.2 2024-09-18 00:48:23,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=306889.3333333333, ans=0.2 2024-09-18 00:48:30,466 INFO [train.py:1198] (1/2) Epoch 17, batch 3750, loss[loss=0.2425, simple_loss=0.2927, pruned_loss=0.07312, ctc_loss=0.146, cr_loss=0.4239, over 34319.00 frames. ], tot_loss[loss=0.2382, simple_loss=0.2845, pruned_loss=0.07268, ctc_loss=0.148, cr_loss=0.4249, over 6785364.04 frames. ], batch size: 113, lr: 7.00e-03, grad_scale: 32.0 2024-09-18 00:48:41,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2024-09-18 00:48:49,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=306982.6666666667, ans=0.2 2024-09-18 00:48:51,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=306982.6666666667, ans=0.125 2024-09-18 00:49:03,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.11 vs. limit=15.0 2024-09-18 00:49:16,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=307029.3333333333, ans=0.0 2024-09-18 00:49:19,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=307076.0, ans=0.015 2024-09-18 00:49:42,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=307122.6666666667, ans=0.2 2024-09-18 00:49:52,060 INFO [train.py:1198] (1/2) Epoch 17, batch 3800, loss[loss=0.2623, simple_loss=0.2989, pruned_loss=0.08674, ctc_loss=0.172, cr_loss=0.445, over 30248.00 frames. ], tot_loss[loss=0.2419, simple_loss=0.2873, pruned_loss=0.07455, ctc_loss=0.1513, cr_loss=0.4298, over 6676860.88 frames. 
], batch size: 175, lr: 7.00e-03, grad_scale: 32.0 2024-09-18 00:49:57,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=307169.3333333333, ans=0.0 2024-09-18 00:50:10,348 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.398e+02 2.672e+02 2.974e+02 4.656e+02, threshold=5.344e+02, percent-clipped=0.0 2024-09-18 00:50:20,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=307216.0, ans=0.5 2024-09-18 00:50:32,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=307262.6666666667, ans=0.1 2024-09-18 00:50:34,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=307262.6666666667, ans=0.09899494936611666 2024-09-18 00:50:51,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=307309.3333333333, ans=0.2 2024-09-18 00:50:51,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=307309.3333333333, ans=0.125 2024-09-18 00:51:08,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=307356.0, ans=0.125 2024-09-18 00:51:11,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=307356.0, ans=0.0 2024-09-18 00:51:15,845 INFO [train.py:1198] (1/2) Epoch 17, batch 3850, loss[loss=0.2834, simple_loss=0.313, pruned_loss=0.09805, ctc_loss=0.1938, cr_loss=0.474, over 23055.00 frames. ], tot_loss[loss=0.2468, simple_loss=0.2903, pruned_loss=0.07729, ctc_loss=0.1569, cr_loss=0.4333, over 6253466.22 frames. ], batch size: 245, lr: 7.00e-03, grad_scale: 16.0 2024-09-18 00:51:28,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=22.5 2024-09-18 00:51:32,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=307449.3333333333, ans=0.125 2024-09-18 00:52:44,419 INFO [train.py:1198] (1/2) Epoch 18, batch 0, loss[loss=0.2261, simple_loss=0.2713, pruned_loss=0.06859, ctc_loss=0.1377, cr_loss=0.4018, over 34461.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2713, pruned_loss=0.06859, ctc_loss=0.1377, cr_loss=0.4018, over 34461.00 frames. ], batch size: 85, lr: 6.80e-03, grad_scale: 32.0 2024-09-18 00:52:44,419 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 00:53:01,438 INFO [train.py:1230] (1/2) Epoch 18, validation: loss=0.1513, simple_loss=0.2493, pruned_loss=0.0223, ctc_loss=0.043, cr_loss=1.78e-14, over 944034.00 frames. 
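The Clipping_scale warnings from optim.py:487 scattered through this log summarize gradient clipping: the five numbers after "grad-norm quartiles" are the min/25%/median/75%/max of recent per-batch gradient norms, the threshold equals Clipping_scale times the logged median (e.g. 2.0 x 2.672e+02 = 5.344e+02 in the 00:50:10 warning above), and percent-clipped appears to be the share of recent batches whose norm exceeded that threshold. Below is a minimal sketch of how such a summary could be produced, with hypothetical names; it is an illustration of the logged relationship, not the actual optim.py code.

import numpy as np

def grad_norm_summary(recent_norms, clipping_scale=2.0):
    # Five-number summary of recent per-batch gradient norms, in the
    # same order as the "grad-norm quartiles" field in the warnings.
    quartiles = np.percentile(recent_norms, [0, 25, 50, 75, 100])
    # Threshold is clipping_scale times the median; this reproduces
    # the logged "threshold" value for every warning in this section.
    threshold = clipping_scale * quartiles[2]
    # Assumed meaning of "percent-clipped": fraction of recent batches
    # whose gradient norm exceeded the threshold, in percent.
    percent_clipped = 100.0 * float(np.mean(np.asarray(recent_norms) > threshold))
    return quartiles, threshold, percent_clipped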
2024-09-18 00:53:01,438 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 00:53:47,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=307622.0, ans=0.0 2024-09-18 00:53:48,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=307622.0, ans=0.125 2024-09-18 00:53:52,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=307668.6666666667, ans=0.125 2024-09-18 00:54:01,731 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.084e+02 2.538e+02 2.772e+02 2.996e+02 5.321e+02, threshold=5.544e+02, percent-clipped=0.0 2024-09-18 00:54:13,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=307715.3333333333, ans=0.2 2024-09-18 00:54:25,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=307762.0, ans=0.125 2024-09-18 00:54:26,607 INFO [train.py:1198] (1/2) Epoch 18, batch 50, loss[loss=0.2024, simple_loss=0.247, pruned_loss=0.0591, ctc_loss=0.1248, cr_loss=0.3637, over 34478.00 frames. ], tot_loss[loss=0.2366, simple_loss=0.2823, pruned_loss=0.07222, ctc_loss=0.1472, cr_loss=0.4236, over 1480401.10 frames. ], batch size: 82, lr: 6.79e-03, grad_scale: 32.0 2024-09-18 00:55:40,214 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 00:55:51,151 INFO [train.py:1198] (1/2) Epoch 18, batch 100, loss[loss=0.2202, simple_loss=0.2645, pruned_loss=0.06671, ctc_loss=0.1334, cr_loss=0.3959, over 34563.00 frames. ], tot_loss[loss=0.2384, simple_loss=0.2845, pruned_loss=0.07278, ctc_loss=0.1485, cr_loss=0.4257, over 2627508.04 frames. ], batch size: 89, lr: 6.79e-03, grad_scale: 32.0 2024-09-18 00:56:06,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=308042.0, ans=0.1 2024-09-18 00:56:09,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=308042.0, ans=0.0 2024-09-18 00:56:10,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=308042.0, ans=0.125 2024-09-18 00:56:16,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=308042.0, ans=0.125 2024-09-18 00:56:21,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=308042.0, ans=0.125 2024-09-18 00:56:38,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.66 vs. limit=15.0 2024-09-18 00:56:48,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.395e+02 2.846e+02 3.872e+02 7.641e+02, threshold=5.691e+02, percent-clipped=3.0 2024-09-18 00:57:13,355 INFO [train.py:1198] (1/2) Epoch 18, batch 150, loss[loss=0.2102, simple_loss=0.2558, pruned_loss=0.06158, ctc_loss=0.1267, cr_loss=0.4012, over 34470.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2821, pruned_loss=0.07155, ctc_loss=0.1461, cr_loss=0.4214, over 3554016.04 frames. 
], batch size: 82, lr: 6.79e-03, grad_scale: 32.0 2024-09-18 00:57:17,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=308228.6666666667, ans=0.2 2024-09-18 00:57:42,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=308275.3333333333, ans=0.2 2024-09-18 00:57:45,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=308275.3333333333, ans=0.0 2024-09-18 00:58:39,654 INFO [train.py:1198] (1/2) Epoch 18, batch 200, loss[loss=0.2456, simple_loss=0.2911, pruned_loss=0.07584, ctc_loss=0.1554, cr_loss=0.4307, over 31956.00 frames. ], tot_loss[loss=0.2339, simple_loss=0.2803, pruned_loss=0.07091, ctc_loss=0.1447, cr_loss=0.4187, over 4270598.24 frames. ], batch size: 145, lr: 6.79e-03, grad_scale: 32.0 2024-09-18 00:59:37,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.561e+02 3.138e+02 4.151e+02 6.684e+02, threshold=6.276e+02, percent-clipped=8.0 2024-09-18 00:59:43,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.22 vs. limit=15.0 2024-09-18 00:59:52,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=308648.6666666667, ans=0.125 2024-09-18 00:59:54,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=308648.6666666667, ans=0.125 2024-09-18 01:00:02,180 INFO [train.py:1198] (1/2) Epoch 18, batch 250, loss[loss=0.2625, simple_loss=0.3075, pruned_loss=0.08339, ctc_loss=0.1632, cr_loss=0.4538, over 34181.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.2809, pruned_loss=0.07113, ctc_loss=0.145, cr_loss=0.42, over 4833105.59 frames. ], batch size: 117, lr: 6.78e-03, grad_scale: 16.0 2024-09-18 01:00:07,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=308695.3333333333, ans=0.1 2024-09-18 01:00:09,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.35 vs. limit=15.0 2024-09-18 01:00:25,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=308742.0, ans=0.125 2024-09-18 01:00:37,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=22.5 2024-09-18 01:01:26,567 INFO [train.py:1198] (1/2) Epoch 18, batch 300, loss[loss=0.2656, simple_loss=0.3058, pruned_loss=0.08646, ctc_loss=0.1684, cr_loss=0.471, over 34372.00 frames. ], tot_loss[loss=0.2338, simple_loss=0.2803, pruned_loss=0.07082, ctc_loss=0.1445, cr_loss=0.419, over 5260939.52 frames. ], batch size: 107, lr: 6.78e-03, grad_scale: 16.0 2024-09-18 01:01:41,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.48 vs. 
2024-09-18 01:01:47,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=308975.3333333333, ans=0.0
2024-09-18 01:01:47,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=308975.3333333333, ans=0.125
2024-09-18 01:01:50,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308975.3333333333, ans=0.1
2024-09-18 01:02:17,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=309068.6666666667, ans=0.125
2024-09-18 01:02:28,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.028e+02 2.422e+02 2.715e+02 3.536e+02 5.586e+02, threshold=5.429e+02, percent-clipped=0.0
2024-09-18 01:02:43,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=309115.3333333333, ans=0.125
2024-09-18 01:02:44,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.00 vs. limit=22.5
2024-09-18 01:02:51,554 INFO [train.py:1198] (1/2) Epoch 18, batch 350, loss[loss=0.2089, simple_loss=0.2584, pruned_loss=0.05934, ctc_loss=0.1267, cr_loss=0.3859, over 34275.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.2811, pruned_loss=0.07123, ctc_loss=0.1455, cr_loss=0.4209, over 5597479.81 frames. ], batch size: 83, lr: 6.78e-03, grad_scale: 16.0
2024-09-18 01:03:02,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=14.30 vs. limit=15.0
2024-09-18 01:03:13,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=309208.6666666667, ans=0.025
2024-09-18 01:03:31,339 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:04:13,519 INFO [train.py:1198] (1/2) Epoch 18, batch 400, loss[loss=0.2295, simple_loss=0.2801, pruned_loss=0.06721, ctc_loss=0.14, cr_loss=0.4103, over 34419.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2805, pruned_loss=0.07096, ctc_loss=0.145, cr_loss=0.4198, over 5865039.88 frames. ], batch size: 95, lr: 6.77e-03, grad_scale: 32.0
2024-09-18 01:04:34,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=309442.0, ans=0.1
2024-09-18 01:04:50,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0
2024-09-18 01:05:05,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=309535.3333333333, ans=0.025
2024-09-18 01:05:14,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=309535.3333333333, ans=0.125
2024-09-18 01:05:15,437 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.375e+02 2.681e+02 3.371e+02 9.269e+02, threshold=5.362e+02, percent-clipped=3.0
2024-09-18 01:05:27,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=309582.0, ans=0.125
2024-09-18 01:05:30,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=309582.0, ans=0.0
2024-09-18 01:05:40,529 INFO [train.py:1198] (1/2) Epoch 18, batch 450, loss[loss=0.2441, simple_loss=0.2894, pruned_loss=0.07506, ctc_loss=0.1565, cr_loss=0.4325, over 34690.00 frames. ], tot_loss[loss=0.2346, simple_loss=0.2809, pruned_loss=0.07126, ctc_loss=0.1454, cr_loss=0.4198, over 6054705.42 frames. ], batch size: 97, lr: 6.77e-03, grad_scale: 32.0
2024-09-18 01:05:45,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=309628.6666666667, ans=0.125
2024-09-18 01:05:59,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=309675.3333333333, ans=0.125
2024-09-18 01:06:00,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=309675.3333333333, ans=0.125
2024-09-18 01:06:04,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=309675.3333333333, ans=10.0
2024-09-18 01:06:15,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.62 vs. limit=15.0
2024-09-18 01:06:58,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=309815.3333333333, ans=0.025
2024-09-18 01:07:01,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=309862.0, ans=0.125
2024-09-18 01:07:02,980 INFO [train.py:1198] (1/2) Epoch 18, batch 500, loss[loss=0.2605, simple_loss=0.3016, pruned_loss=0.08379, ctc_loss=0.1655, cr_loss=0.4652, over 34460.00 frames. ], tot_loss[loss=0.2338, simple_loss=0.28, pruned_loss=0.07093, ctc_loss=0.1447, cr_loss=0.4189, over 6221229.05 frames. ], batch size: 110, lr: 6.77e-03, grad_scale: 32.0
2024-09-18 01:07:13,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=309862.0, ans=0.2
2024-09-18 01:07:21,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=309908.6666666667, ans=0.125
2024-09-18 01:07:25,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=309908.6666666667, ans=0.125
2024-09-18 01:07:59,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=310002.0, ans=0.125
2024-09-18 01:08:02,511 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.585e+02 3.185e+02 4.126e+02 6.509e+02, threshold=6.370e+02, percent-clipped=7.0
2024-09-18 01:08:02,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=310002.0, ans=0.125
2024-09-18 01:08:25,623 INFO [train.py:1198] (1/2) Epoch 18, batch 550, loss[loss=0.2495, simple_loss=0.2963, pruned_loss=0.07609, ctc_loss=0.1586, cr_loss=0.4706, over 33889.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2799, pruned_loss=0.07084, ctc_loss=0.1447, cr_loss=0.4185, over 6331260.05 frames. ], batch size: 122, lr: 6.77e-03, grad_scale: 16.0
2024-09-18 01:08:47,620 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:09:15,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=310235.3333333333, ans=0.0
2024-09-18 01:09:17,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=310235.3333333333, ans=0.125
2024-09-18 01:09:52,671 INFO [train.py:1198] (1/2) Epoch 18, batch 600, loss[loss=0.2447, simple_loss=0.2915, pruned_loss=0.07539, ctc_loss=0.1514, cr_loss=0.4237, over 34190.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.2805, pruned_loss=0.07103, ctc_loss=0.1449, cr_loss=0.4195, over 6433202.46 frames. ], batch size: 117, lr: 6.76e-03, grad_scale: 16.0
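
In the optim.py warnings above, the five numbers after "grad-norm quartiles" are the min/25%/median/75%/max of recent gradient norms, and the logged threshold equals Clipping_scale times the median up to rounding (e.g. 2.0 x 2.772e+02 = 5.544e+02). A hedged sketch of that bookkeeping, with recent_grad_norms standing in for whatever window of per-step norms the optimizer actually tracks:

import torch

def clipping_report(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartile summary in the style of the warnings above (illustrative,
    # not the exact ScaledAdam bookkeeping).
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]                       # 2.0 x median
    percent_clipped = (recent_grad_norms > threshold).float().mean() * 100
    return q, threshold, percent_clipped
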
2024-09-18 01:10:04,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=310328.6666666667, ans=0.02
2024-09-18 01:10:06,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=310328.6666666667, ans=0.125
2024-09-18 01:10:25,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=310422.0, ans=0.125
2024-09-18 01:10:32,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=310422.0, ans=0.125
2024-09-18 01:10:53,225 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.594e+02 3.073e+02 4.173e+02 7.721e+02, threshold=6.147e+02, percent-clipped=4.0
2024-09-18 01:11:03,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=310515.3333333333, ans=0.0
2024-09-18 01:11:05,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=310515.3333333333, ans=0.1
2024-09-18 01:11:14,685 INFO [train.py:1198] (1/2) Epoch 18, batch 650, loss[loss=0.2358, simple_loss=0.2795, pruned_loss=0.07301, ctc_loss=0.1457, cr_loss=0.4255, over 34531.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.2796, pruned_loss=0.07038, ctc_loss=0.1437, cr_loss=0.4172, over 6524937.97 frames. ], batch size: 94, lr: 6.76e-03, grad_scale: 16.0
2024-09-18 01:11:23,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=310562.0, ans=0.025
2024-09-18 01:11:28,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=310562.0, ans=0.1
2024-09-18 01:11:36,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0
2024-09-18 01:11:38,293 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:11:46,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=310655.3333333333, ans=0.125
2024-09-18 01:11:58,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=310655.3333333333, ans=0.0
2024-09-18 01:12:11,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=310702.0, ans=0.125
2024-09-18 01:12:26,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.07 vs. limit=15.0
2024-09-18 01:12:29,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=310748.6666666667, ans=0.1
2024-09-18 01:12:37,465 INFO [train.py:1198] (1/2) Epoch 18, batch 700, loss[loss=0.2292, simple_loss=0.2723, pruned_loss=0.06996, ctc_loss=0.1421, cr_loss=0.4424, over 34596.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2797, pruned_loss=0.07024, ctc_loss=0.1436, cr_loss=0.4175, over 6580141.51 frames. ], batch size: 89, lr: 6.76e-03, grad_scale: 8.0
2024-09-18 01:12:51,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=310795.3333333333, ans=0.0
2024-09-18 01:12:54,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=310842.0, ans=0.2
2024-09-18 01:13:44,643 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.149e+02 2.590e+02 3.266e+02 4.174e+02 7.165e+02, threshold=6.533e+02, percent-clipped=4.0
2024-09-18 01:13:55,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=310982.0, ans=0.2
2024-09-18 01:13:56,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=310982.0, ans=0.1
2024-09-18 01:14:04,479 INFO [train.py:1198] (1/2) Epoch 18, batch 750, loss[loss=0.2277, simple_loss=0.2752, pruned_loss=0.06824, ctc_loss=0.1382, cr_loss=0.4022, over 34430.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2793, pruned_loss=0.07002, ctc_loss=0.1432, cr_loss=0.4169, over 6625191.12 frames. ], batch size: 95, lr: 6.76e-03, grad_scale: 8.0
2024-09-18 01:14:20,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.30 vs. limit=15.0
2024-09-18 01:14:27,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5
2024-09-18 01:14:32,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=311075.3333333333, ans=0.125
2024-09-18 01:14:41,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.55 vs. limit=15.0
2024-09-18 01:14:49,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.28 vs. limit=22.5
2024-09-18 01:15:04,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=311168.6666666667, ans=0.2
2024-09-18 01:15:22,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=311215.3333333333, ans=0.125
2024-09-18 01:15:26,569 INFO [train.py:1198] (1/2) Epoch 18, batch 800, loss[loss=0.2147, simple_loss=0.2615, pruned_loss=0.06315, ctc_loss=0.1293, cr_loss=0.3949, over 34471.00 frames. ], tot_loss[loss=0.2324, simple_loss=0.2792, pruned_loss=0.07014, ctc_loss=0.1433, cr_loss=0.4167, over 6660398.30 frames. ], batch size: 85, lr: 6.75e-03, grad_scale: 16.0
2024-09-18 01:15:38,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=311262.0, ans=0.2
2024-09-18 01:15:46,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=311308.6666666667, ans=0.125
2024-09-18 01:15:51,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=311308.6666666667, ans=0.0
2024-09-18 01:16:03,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0
2024-09-18 01:16:21,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=311402.0, ans=0.125
2024-09-18 01:16:28,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=311402.0, ans=0.1
2024-09-18 01:16:29,491 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.021e+02 2.394e+02 2.724e+02 3.347e+02 4.900e+02, threshold=5.449e+02, percent-clipped=0.0
2024-09-18 01:16:40,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=311448.6666666667, ans=22.5
2024-09-18 01:16:51,055 INFO [train.py:1198] (1/2) Epoch 18, batch 850, loss[loss=0.2514, simple_loss=0.2991, pruned_loss=0.07677, ctc_loss=0.1589, cr_loss=0.4625, over 34400.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2791, pruned_loss=0.07012, ctc_loss=0.1432, cr_loss=0.417, over 6693700.63 frames. ], batch size: 103, lr: 6.75e-03, grad_scale: 16.0
2024-09-18 01:16:52,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=311495.3333333333, ans=22.5
2024-09-18 01:16:53,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=311495.3333333333, ans=0.125
2024-09-18 01:17:27,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=311588.6666666667, ans=0.05
2024-09-18 01:17:40,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=311635.3333333333, ans=0.125
2024-09-18 01:17:52,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=311635.3333333333, ans=0.04949747468305833
2024-09-18 01:18:14,996 INFO [train.py:1198] (1/2) Epoch 18, batch 900, loss[loss=0.2087, simple_loss=0.2596, pruned_loss=0.05872, ctc_loss=0.1241, cr_loss=0.3909, over 34466.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2792, pruned_loss=0.07022, ctc_loss=0.1435, cr_loss=0.4169, over 6699606.95 frames. ], batch size: 85, lr: 6.75e-03, grad_scale: 8.0
2024-09-18 01:18:50,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=311822.0, ans=0.125
2024-09-18 01:19:09,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.35 vs. limit=10.0
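
The grad_scale value in the per-batch lines (8.0, 16.0, 32.0, ...) is the dynamic fp16 loss scale: it is halved when a step overflows and periodically doubled while steps keep succeeding, which is why it drifts up and down across the epoch. A generic PyTorch AMP loop showing the mechanism (model, optimizer, loader, and compute_loss are placeholders; the actual icefall loop differs in details):

import torch
from torch.cuda.amp import GradScaler, autocast

scaler = GradScaler(init_scale=1.0)  # this run's log starts at grad_scale 1.0

for batch in loader:
    optimizer.zero_grad()
    with autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)  # placeholder loss function
    scaler.scale(loss).backward()          # scale up before backward
    scaler.step(optimizer)                 # unscales grads, skips on overflow
    scaler.update()                        # grow or shrink the scale
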
2024-09-18 01:19:19,572 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.424e+02 2.772e+02 3.602e+02 6.827e+02, threshold=5.545e+02, percent-clipped=4.0
2024-09-18 01:19:26,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=311915.3333333333, ans=0.0
2024-09-18 01:19:36,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=311962.0, ans=0.1
2024-09-18 01:19:37,555 INFO [train.py:1198] (1/2) Epoch 18, batch 950, loss[loss=0.207, simple_loss=0.2587, pruned_loss=0.05831, ctc_loss=0.1199, cr_loss=0.3676, over 34667.00 frames. ], tot_loss[loss=0.2324, simple_loss=0.2792, pruned_loss=0.07012, ctc_loss=0.1435, cr_loss=0.4166, over 6701835.51 frames. ], batch size: 87, lr: 6.75e-03, grad_scale: 8.0
2024-09-18 01:19:40,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.24 vs. limit=15.0
2024-09-18 01:19:44,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0
2024-09-18 01:19:45,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.84 vs. limit=15.0
2024-09-18 01:19:54,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=312008.6666666667, ans=0.125
2024-09-18 01:19:54,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=312008.6666666667, ans=0.125
2024-09-18 01:19:58,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=312008.6666666667, ans=0.0
2024-09-18 01:20:20,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=312055.3333333333, ans=0.125
2024-09-18 01:20:20,592 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:20:26,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=312055.3333333333, ans=0.0
2024-09-18 01:20:27,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=312102.0, ans=0.1
2024-09-18 01:20:27,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=312102.0, ans=0.125
2024-09-18 01:20:44,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=312102.0, ans=0.1
2024-09-18 01:21:03,733 INFO [train.py:1198] (1/2) Epoch 18, batch 1000, loss[loss=0.2218, simple_loss=0.2661, pruned_loss=0.06721, ctc_loss=0.1377, cr_loss=0.3871, over 34516.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.28, pruned_loss=0.07061, ctc_loss=0.1443, cr_loss=0.4177, over 6695899.00 frames. ], batch size: 90, lr: 6.74e-03, grad_scale: 8.0
2024-09-18 01:21:17,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=312195.3333333333, ans=0.025
2024-09-18 01:21:17,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=312195.3333333333, ans=0.0
2024-09-18 01:21:27,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.14 vs. limit=15.0
2024-09-18 01:21:30,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=312242.0, ans=0.0
2024-09-18 01:21:42,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=312288.6666666667, ans=0.125
2024-09-18 01:21:45,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=312288.6666666667, ans=10.0
2024-09-18 01:21:47,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=312288.6666666667, ans=0.125
2024-09-18 01:21:55,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=312335.3333333333, ans=0.125
2024-09-18 01:22:01,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=312335.3333333333, ans=0.125
2024-09-18 01:22:07,863 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.353e+02 2.756e+02 3.235e+02 5.419e+02, threshold=5.512e+02, percent-clipped=0.0
2024-09-18 01:22:08,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.88 vs. limit=15.0
2024-09-18 01:22:25,807 INFO [train.py:1198] (1/2) Epoch 18, batch 1050, loss[loss=0.2432, simple_loss=0.2951, pruned_loss=0.07259, ctc_loss=0.1464, cr_loss=0.4199, over 34561.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2799, pruned_loss=0.07074, ctc_loss=0.1444, cr_loss=0.418, over 6704806.29 frames. ], batch size: 99, lr: 6.74e-03, grad_scale: 8.0
2024-09-18 01:22:27,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=312428.6666666667, ans=0.1
2024-09-18 01:22:47,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=312475.3333333333, ans=0.125
2024-09-18 01:23:02,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=312522.0, ans=0.125
2024-09-18 01:23:05,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=312522.0, ans=0.0
2024-09-18 01:23:09,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=312522.0, ans=0.2
2024-09-18 01:23:12,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=312522.0, ans=0.025
2024-09-18 01:23:43,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=312615.3333333333, ans=0.125
2024-09-18 01:23:48,193 INFO [train.py:1198] (1/2) Epoch 18, batch 1100, loss[loss=0.2348, simple_loss=0.2789, pruned_loss=0.072, ctc_loss=0.1457, cr_loss=0.4405, over 34369.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2798, pruned_loss=0.07059, ctc_loss=0.1441, cr_loss=0.4178, over 6715522.09 frames. ], batch size: 91, lr: 6.74e-03, grad_scale: 8.0
2024-09-18 01:24:35,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=312755.3333333333, ans=10.0
2024-09-18 01:24:37,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=312755.3333333333, ans=0.125
2024-09-18 01:24:43,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=312802.0, ans=0.125
2024-09-18 01:24:47,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=312802.0, ans=0.025
2024-09-18 01:24:56,837 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.395e+02 2.872e+02 3.693e+02 5.292e+02, threshold=5.745e+02, percent-clipped=0.0
2024-09-18 01:25:02,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=312848.6666666667, ans=0.125
2024-09-18 01:25:05,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=312848.6666666667, ans=0.2
2024-09-18 01:25:10,893 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:25:15,238 INFO [train.py:1198] (1/2) Epoch 18, batch 1150, loss[loss=0.2354, simple_loss=0.2777, pruned_loss=0.07309, ctc_loss=0.1492, cr_loss=0.4242, over 34349.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2799, pruned_loss=0.07074, ctc_loss=0.1443, cr_loss=0.4176, over 6712913.84 frames. ], batch size: 91, lr: 6.74e-03, grad_scale: 8.0
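
Each "ScheduledFloat: name=..., batch_count=..., ans=..." line prints the current value of a hyperparameter that follows a schedule keyed on batch count (dropout rates, skip rates, balancer limits, and so on). A piecewise-linear sketch of that idea; the breakpoints in the example are made up for illustration, and the real scaling.py implementation has more machinery:

def scheduled_float(batch_count: float, schedule: list) -> float:
    # `schedule` is a list of (batch_count, value) breakpoints,
    # linearly interpolated and clamped at both ends.
    (b0, v0) = schedule[0]
    if batch_count <= b0:
        return v0
    for (b1, v1) in schedule[1:]:
        if batch_count <= b1:
            t = (batch_count - b0) / (b1 - b0)
            return v0 + t * (v1 - v0)
        b0, v0 = b1, v1
    return v0

# e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches:
p = scheduled_float(312522.0, [(0.0, 0.3), (20000.0, 0.1)])  # -> 0.1
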
2024-09-18 01:25:30,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=312942.0, ans=0.125
2024-09-18 01:25:33,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=312942.0, ans=0.125
2024-09-18 01:26:02,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=312988.6666666667, ans=0.125
2024-09-18 01:26:10,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=313035.3333333333, ans=0.07
2024-09-18 01:26:15,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=313035.3333333333, ans=0.2
2024-09-18 01:26:19,269 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0
2024-09-18 01:26:21,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=313082.0, ans=0.2
2024-09-18 01:26:28,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=313082.0, ans=0.125
2024-09-18 01:26:29,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=313082.0, ans=0.1
2024-09-18 01:26:35,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.64 vs. limit=15.0
2024-09-18 01:26:36,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=313128.6666666667, ans=0.125
2024-09-18 01:26:37,766 INFO [train.py:1198] (1/2) Epoch 18, batch 1200, loss[loss=0.2448, simple_loss=0.292, pruned_loss=0.075, ctc_loss=0.1522, cr_loss=0.4303, over 34571.00 frames. ], tot_loss[loss=0.2342, simple_loss=0.2807, pruned_loss=0.07095, ctc_loss=0.1448, cr_loss=0.4185, over 6704751.93 frames. ], batch size: 99, lr: 6.73e-03, grad_scale: 16.0
2024-09-18 01:26:38,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=22.00 vs. limit=22.5
2024-09-18 01:26:43,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=313128.6666666667, ans=0.0
2024-09-18 01:27:29,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=313268.6666666667, ans=0.125
2024-09-18 01:27:42,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.970e+02 2.467e+02 2.865e+02 3.480e+02 7.666e+02, threshold=5.730e+02, percent-clipped=3.0
2024-09-18 01:27:52,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=313315.3333333333, ans=0.07
2024-09-18 01:28:02,696 INFO [train.py:1198] (1/2) Epoch 18, batch 1250, loss[loss=0.2535, simple_loss=0.3, pruned_loss=0.07854, ctc_loss=0.1568, cr_loss=0.4643, over 34346.00 frames. ], tot_loss[loss=0.2346, simple_loss=0.2811, pruned_loss=0.07113, ctc_loss=0.1452, cr_loss=0.4197, over 6738804.95 frames. ], batch size: 107, lr: 6.73e-03, grad_scale: 16.0
2024-09-18 01:28:12,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0
2024-09-18 01:28:16,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=313362.0, ans=0.2
2024-09-18 01:28:18,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.65 vs. limit=22.5
2024-09-18 01:28:21,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=313408.6666666667, ans=0.125
2024-09-18 01:28:41,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=313455.3333333333, ans=0.2
2024-09-18 01:28:41,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=313455.3333333333, ans=0.125
2024-09-18 01:29:06,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=313502.0, ans=0.125
2024-09-18 01:29:08,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=313502.0, ans=0.0
2024-09-18 01:29:14,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=313548.6666666667, ans=0.125
2024-09-18 01:29:17,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0
2024-09-18 01:29:27,191 INFO [train.py:1198] (1/2) Epoch 18, batch 1300, loss[loss=0.2433, simple_loss=0.2909, pruned_loss=0.0742, ctc_loss=0.1511, cr_loss=0.4289, over 33211.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2799, pruned_loss=0.07048, ctc_loss=0.144, cr_loss=0.4171, over 6742126.70 frames. ], batch size: 130, lr: 6.73e-03, grad_scale: 16.0
2024-09-18 01:29:30,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.15 vs. limit=22.5
2024-09-18 01:30:03,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.42 vs. limit=15.0
2024-09-18 01:30:20,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=313735.3333333333, ans=0.0
2024-09-18 01:30:22,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=313735.3333333333, ans=0.0
2024-09-18 01:30:28,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=313735.3333333333, ans=0.125
2024-09-18 01:30:31,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.473e+02 2.825e+02 3.761e+02 6.158e+02, threshold=5.650e+02, percent-clipped=3.0
2024-09-18 01:30:50,234 INFO [train.py:1198] (1/2) Epoch 18, batch 1350, loss[loss=0.2292, simple_loss=0.2732, pruned_loss=0.06983, ctc_loss=0.1449, cr_loss=0.4131, over 34549.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2795, pruned_loss=0.07029, ctc_loss=0.1437, cr_loss=0.4175, over 6760950.79 frames. ], batch size: 94, lr: 6.73e-03, grad_scale: 16.0
2024-09-18 01:31:26,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=313922.0, ans=0.1
2024-09-18 01:31:41,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=313968.6666666667, ans=0.125
2024-09-18 01:31:48,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=313968.6666666667, ans=0.0
2024-09-18 01:32:16,668 INFO [train.py:1198] (1/2) Epoch 18, batch 1400, loss[loss=0.1985, simple_loss=0.2463, pruned_loss=0.05629, ctc_loss=0.1192, cr_loss=0.3572, over 34332.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.279, pruned_loss=0.07011, ctc_loss=0.1433, cr_loss=0.4165, over 6774306.13 frames. ], batch size: 80, lr: 6.73e-03, grad_scale: 16.0
2024-09-18 01:32:18,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=314062.0, ans=0.0
2024-09-18 01:32:28,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=314062.0, ans=0.125
2024-09-18 01:32:33,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=314108.6666666667, ans=0.125
2024-09-18 01:32:45,487 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.80 vs. limit=6.0
2024-09-18 01:32:49,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=314155.3333333333, ans=0.0
2024-09-18 01:32:53,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=314155.3333333333, ans=0.125
2024-09-18 01:33:13,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=15.0
2024-09-18 01:33:15,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=314202.0, ans=0.125
2024-09-18 01:33:21,411 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.416e+02 2.850e+02 4.143e+02 6.857e+02, threshold=5.700e+02, percent-clipped=8.0
2024-09-18 01:33:23,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=12.0
2024-09-18 01:33:39,652 INFO [train.py:1198] (1/2) Epoch 18, batch 1450, loss[loss=0.2409, simple_loss=0.2909, pruned_loss=0.0722, ctc_loss=0.1454, cr_loss=0.4364, over 34450.00 frames. ], tot_loss[loss=0.233, simple_loss=0.2799, pruned_loss=0.0704, ctc_loss=0.1438, cr_loss=0.4174, over 6771701.15 frames. ], batch size: 110, lr: 6.72e-03, grad_scale: 16.0
2024-09-18 01:34:25,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=314388.6666666667, ans=0.125
2024-09-18 01:34:38,379 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:34:57,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=314482.0, ans=0.125
2024-09-18 01:35:00,957 INFO [train.py:1198] (1/2) Epoch 18, batch 1500, loss[loss=0.2473, simple_loss=0.2948, pruned_loss=0.07608, ctc_loss=0.1519, cr_loss=0.4337, over 34478.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2803, pruned_loss=0.07049, ctc_loss=0.144, cr_loss=0.4178, over 6771128.65 frames. ], batch size: 100, lr: 6.72e-03, grad_scale: 16.0
2024-09-18 01:35:06,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=314528.6666666667, ans=0.2
2024-09-18 01:35:09,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.98 vs. limit=15.0
2024-09-18 01:35:15,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=314528.6666666667, ans=0.1
2024-09-18 01:35:18,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=314575.3333333333, ans=0.0
2024-09-18 01:35:57,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=314668.6666666667, ans=0.125
2024-09-18 01:35:59,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=314668.6666666667, ans=0.1
2024-09-18 01:36:10,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.475e+02 2.817e+02 3.680e+02 6.411e+02, threshold=5.634e+02, percent-clipped=4.0
2024-09-18 01:36:19,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=314715.3333333333, ans=0.2
2024-09-18 01:36:28,540 INFO [train.py:1198] (1/2) Epoch 18, batch 1550, loss[loss=0.2335, simple_loss=0.2849, pruned_loss=0.06837, ctc_loss=0.1408, cr_loss=0.4313, over 34388.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2804, pruned_loss=0.0707, ctc_loss=0.1445, cr_loss=0.4178, over 6742490.31 frames. ], batch size: 105, lr: 6.72e-03, grad_scale: 16.0
2024-09-18 01:36:34,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=314762.0, ans=0.025
2024-09-18 01:36:58,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=314808.6666666667, ans=0.025
2024-09-18 01:37:27,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0
2024-09-18 01:37:35,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=314948.6666666667, ans=0.125
2024-09-18 01:37:37,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.27 vs. limit=22.5
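
The "Whitening: ... metric=M vs. limit=L" lines monitor how far a module's activations are from having a white (identity-like) covariance; a value of 1.0 would be perfectly white, and metrics above the limit trigger the module's corrective gradient. A rough sketch of one such metric, simplified from the grouped version in scaling.py (the exact normalization there may differ):

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (..., num_channels). Ratio of the mean squared covariance
    # eigenvalue to the squared mean eigenvalue; equals 1.0 when the
    # covariance is a multiple of the identity.
    x = x.reshape(-1, x.shape[-1])
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    d = cov.shape[0]
    return (torch.trace(cov @ cov) / d) / (torch.trace(cov) / d) ** 2
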
2024-09-18 01:37:43,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=314948.6666666667, ans=0.125
2024-09-18 01:37:51,157 INFO [train.py:1198] (1/2) Epoch 18, batch 1600, loss[loss=0.2434, simple_loss=0.293, pruned_loss=0.07278, ctc_loss=0.1531, cr_loss=0.4421, over 34565.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2803, pruned_loss=0.07076, ctc_loss=0.1445, cr_loss=0.4179, over 6722282.71 frames. ], batch size: 99, lr: 6.72e-03, grad_scale: 32.0
2024-09-18 01:38:14,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=315042.0, ans=0.025
2024-09-18 01:38:17,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=315042.0, ans=0.125
2024-09-18 01:38:26,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=315088.6666666667, ans=0.125
2024-09-18 01:38:31,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=315088.6666666667, ans=0.125
2024-09-18 01:38:33,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=315088.6666666667, ans=0.09899494936611666
2024-09-18 01:38:34,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=315088.6666666667, ans=0.0
2024-09-18 01:38:46,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=315135.3333333333, ans=0.125
2024-09-18 01:38:46,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.31 vs. limit=22.5
2024-09-18 01:38:55,678 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.426e+02 2.985e+02 3.851e+02 7.159e+02, threshold=5.970e+02, percent-clipped=6.0
2024-09-18 01:39:09,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=315182.0, ans=0.125
2024-09-18 01:39:15,436 INFO [train.py:1198] (1/2) Epoch 18, batch 1650, loss[loss=0.2318, simple_loss=0.2858, pruned_loss=0.06603, ctc_loss=0.1417, cr_loss=0.4334, over 34385.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2801, pruned_loss=0.07067, ctc_loss=0.1445, cr_loss=0.4181, over 6715303.43 frames. ], batch size: 103, lr: 6.71e-03, grad_scale: 32.0
2024-09-18 01:39:45,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=315275.3333333333, ans=0.035
2024-09-18 01:40:02,242 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:40:39,543 INFO [train.py:1198] (1/2) Epoch 18, batch 1700, loss[loss=0.2091, simple_loss=0.2528, pruned_loss=0.06248, ctc_loss=0.1253, cr_loss=0.3863, over 34307.00 frames. ], tot_loss[loss=0.234, simple_loss=0.2805, pruned_loss=0.07088, ctc_loss=0.1447, cr_loss=0.4189, over 6741099.01 frames. ], batch size: 80, lr: 6.71e-03, grad_scale: 32.0
2024-09-18 01:40:39,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=315462.0, ans=0.125
2024-09-18 01:40:45,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=315462.0, ans=0.1
2024-09-18 01:41:44,119 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.596e+02 3.015e+02 3.744e+02 8.110e+02, threshold=6.030e+02, percent-clipped=3.0
2024-09-18 01:41:51,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.15 vs. limit=15.0
2024-09-18 01:41:52,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=315648.6666666667, ans=0.125
2024-09-18 01:41:54,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=315648.6666666667, ans=0.0
2024-09-18 01:42:02,314 INFO [train.py:1198] (1/2) Epoch 18, batch 1750, loss[loss=0.2082, simple_loss=0.2563, pruned_loss=0.06021, ctc_loss=0.1236, cr_loss=0.3753, over 34120.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2797, pruned_loss=0.07048, ctc_loss=0.144, cr_loss=0.4179, over 6751681.00 frames. ], batch size: 78, lr: 6.71e-03, grad_scale: 32.0
2024-09-18 01:42:05,887 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:42:51,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=315835.3333333333, ans=0.04949747468305833
2024-09-18 01:43:20,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0
2024-09-18 01:43:28,210 INFO [train.py:1198] (1/2) Epoch 18, batch 1800, loss[loss=0.2357, simple_loss=0.2833, pruned_loss=0.07054, ctc_loss=0.1463, cr_loss=0.4411, over 34683.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2797, pruned_loss=0.07056, ctc_loss=0.1442, cr_loss=0.418, over 6755565.41 frames. ], batch size: 97, lr: 6.71e-03, grad_scale: 32.0
2024-09-18 01:43:48,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=315975.3333333333, ans=0.125
2024-09-18 01:44:16,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=316068.6666666667, ans=0.0
2024-09-18 01:44:26,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=316068.6666666667, ans=0.0
2024-09-18 01:44:30,041 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.53 vs. limit=12.0
2024-09-18 01:44:32,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.532e+02 3.327e+02 4.543e+02 7.312e+02, threshold=6.653e+02, percent-clipped=4.0
2024-09-18 01:44:51,048 INFO [train.py:1198] (1/2) Epoch 18, batch 1850, loss[loss=0.2513, simple_loss=0.298, pruned_loss=0.07771, ctc_loss=0.1577, cr_loss=0.443, over 34455.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2795, pruned_loss=0.07036, ctc_loss=0.1437, cr_loss=0.4173, over 6763171.28 frames. ], batch size: 100, lr: 6.70e-03, grad_scale: 32.0
2024-09-18 01:44:58,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=316162.0, ans=0.0
2024-09-18 01:45:09,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=316208.6666666667, ans=0.2
2024-09-18 01:45:21,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=316208.6666666667, ans=0.0
2024-09-18 01:45:39,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=316302.0, ans=0.125
2024-09-18 01:45:47,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=316302.0, ans=0.125
2024-09-18 01:46:02,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=316348.6666666667, ans=0.025
2024-09-18 01:46:13,085 INFO [train.py:1198] (1/2) Epoch 18, batch 1900, loss[loss=0.2588, simple_loss=0.3032, pruned_loss=0.08159, ctc_loss=0.165, cr_loss=0.4545, over 34400.00 frames. ], tot_loss[loss=0.2338, simple_loss=0.2805, pruned_loss=0.07075, ctc_loss=0.1445, cr_loss=0.4196, over 6772121.81 frames. ], batch size: 103, lr: 6.70e-03, grad_scale: 32.0
2024-09-18 01:46:17,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=316395.3333333333, ans=0.1
2024-09-18 01:46:35,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=316442.0, ans=0.0
2024-09-18 01:46:53,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=316488.6666666667, ans=0.2
2024-09-18 01:47:05,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.63 vs. limit=22.5
2024-09-18 01:47:19,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.19 vs. limit=22.5
2024-09-18 01:47:23,090 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.090e+02 2.603e+02 3.338e+02 4.380e+02 7.966e+02, threshold=6.676e+02, percent-clipped=3.0
2024-09-18 01:47:23,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=316582.0, ans=0.125
2024-09-18 01:47:26,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=316582.0, ans=0.0
2024-09-18 01:47:39,792 INFO [train.py:1198] (1/2) Epoch 18, batch 1950, loss[loss=0.2283, simple_loss=0.2723, pruned_loss=0.06952, ctc_loss=0.1411, cr_loss=0.4266, over 34754.00 frames. ], tot_loss[loss=0.2351, simple_loss=0.2818, pruned_loss=0.07119, ctc_loss=0.1453, cr_loss=0.4216, over 6789305.52 frames. ], batch size: 92, lr: 6.70e-03, grad_scale: 16.0
2024-09-18 01:47:48,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=316628.6666666667, ans=0.125
2024-09-18 01:48:36,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=316768.6666666667, ans=0.1
2024-09-18 01:49:02,617 INFO [train.py:1198] (1/2) Epoch 18, batch 2000, loss[loss=0.2226, simple_loss=0.264, pruned_loss=0.06852, ctc_loss=0.1372, cr_loss=0.4181, over 34172.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2822, pruned_loss=0.07143, ctc_loss=0.1458, cr_loss=0.4228, over 6766589.67 frames. ], batch size: 78, lr: 6.70e-03, grad_scale: 16.0
2024-09-18 01:50:00,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=317002.0, ans=0.2
2024-09-18 01:50:10,331 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.538e+02 3.035e+02 3.748e+02 9.290e+02, threshold=6.070e+02, percent-clipped=1.0
2024-09-18 01:50:26,918 INFO [train.py:1198] (1/2) Epoch 18, batch 2050, loss[loss=0.2053, simple_loss=0.2514, pruned_loss=0.06031, ctc_loss=0.1195, cr_loss=0.3647, over 34506.00 frames. ], tot_loss[loss=0.2345, simple_loss=0.281, pruned_loss=0.07105, ctc_loss=0.145, cr_loss=0.421, over 6757511.34 frames. ], batch size: 82, lr: 6.69e-03, grad_scale: 16.0
2024-09-18 01:50:37,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=317095.3333333333, ans=0.1
2024-09-18 01:50:43,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=317142.0, ans=0.125
2024-09-18 01:51:33,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.80 vs. limit=10.0
2024-09-18 01:51:43,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=317282.0, ans=0.09899494936611666
2024-09-18 01:51:48,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=317282.0, ans=0.2
2024-09-18 01:51:51,174 INFO [train.py:1198] (1/2) Epoch 18, batch 2100, loss[loss=0.2244, simple_loss=0.275, pruned_loss=0.06487, ctc_loss=0.1361, cr_loss=0.419, over 34515.00 frames. ], tot_loss[loss=0.2334, simple_loss=0.2802, pruned_loss=0.07049, ctc_loss=0.1441, cr_loss=0.4191, over 6770118.04 frames. ], batch size: 94, lr: 6.69e-03, grad_scale: 16.0
2024-09-18 01:52:15,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.75 vs. limit=6.0
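
The cr_loss column tracks the consistency-regularization term of CR-CTC: the same utterance is fed through the encoder as two differently time-masked views, and the frame-level CTC posteriors of the two views are pulled toward each other. A sketch of that term following the CR-CTC paper, with padding masks omitted and each direction using a detached target:

import torch.nn.functional as F

def consistency_loss(log_probs_a, log_probs_b):
    # log_probs_*: (T, N, vocab) log-softmax CTC outputs of the two views.
    kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                     reduction="none", log_target=True).sum(-1)
    kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                     reduction="none", log_target=True).sum(-1)
    return 0.5 * (kl_ab + kl_ba).sum()
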
2024-09-18 01:52:19,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=317375.3333333333, ans=0.125
2024-09-18 01:52:20,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=317375.3333333333, ans=0.125
2024-09-18 01:52:24,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=317375.3333333333, ans=0.2
2024-09-18 01:52:28,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=317422.0, ans=0.125
2024-09-18 01:52:41,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=317422.0, ans=0.125
2024-09-18 01:52:45,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=317468.6666666667, ans=0.0
2024-09-18 01:52:59,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.87 vs. limit=22.5
2024-09-18 01:53:00,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=317468.6666666667, ans=0.125
2024-09-18 01:53:04,760 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.595e+02 2.920e+02 3.906e+02 8.354e+02, threshold=5.839e+02, percent-clipped=2.0
2024-09-18 01:53:19,555 INFO [train.py:1198] (1/2) Epoch 18, batch 2150, loss[loss=0.2322, simple_loss=0.2742, pruned_loss=0.07201, ctc_loss=0.1453, cr_loss=0.428, over 34373.00 frames. ], tot_loss[loss=0.232, simple_loss=0.279, pruned_loss=0.06985, ctc_loss=0.143, cr_loss=0.4171, over 6789586.30 frames. ], batch size: 91, lr: 6.69e-03, grad_scale: 16.0
2024-09-18 01:54:11,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=317702.0, ans=0.0
2024-09-18 01:54:16,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=317702.0, ans=0.09899494936611666
2024-09-18 01:54:29,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317748.6666666667, ans=0.1
2024-09-18 01:54:44,175 INFO [train.py:1198] (1/2) Epoch 18, batch 2200, loss[loss=0.2398, simple_loss=0.2893, pruned_loss=0.07219, ctc_loss=0.1459, cr_loss=0.4191, over 34438.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2794, pruned_loss=0.07012, ctc_loss=0.1435, cr_loss=0.4185, over 6784965.16 frames. ], batch size: 100, lr: 6.69e-03, grad_scale: 16.0
2024-09-18 01:54:54,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=317795.3333333333, ans=0.125
2024-09-18 01:55:02,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.51 vs. limit=15.0
2024-09-18 01:55:21,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=317888.6666666667, ans=0.1
2024-09-18 01:55:37,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=317935.3333333333, ans=0.0
2024-09-18 01:55:44,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=317935.3333333333, ans=0.07
2024-09-18 01:55:54,018 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.964e+02 2.582e+02 3.261e+02 4.284e+02 9.812e+02, threshold=6.522e+02, percent-clipped=9.0
2024-09-18 01:56:01,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=317982.0, ans=0.07
2024-09-18 01:56:01,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=317982.0, ans=0.0
2024-09-18 01:56:08,885 INFO [train.py:1198] (1/2) Epoch 18, batch 2250, loss[loss=0.2327, simple_loss=0.2819, pruned_loss=0.06951, ctc_loss=0.1408, cr_loss=0.4103, over 34455.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2794, pruned_loss=0.07012, ctc_loss=0.1433, cr_loss=0.4177, over 6782069.99 frames. ], batch size: 95, lr: 6.68e-03, grad_scale: 16.0
2024-09-18 01:56:24,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=318075.3333333333, ans=0.0
2024-09-18 01:57:16,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=318215.3333333333, ans=0.125
2024-09-18 01:57:23,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=318215.3333333333, ans=0.2
2024-09-18 01:57:30,969 INFO [train.py:1198] (1/2) Epoch 18, batch 2300, loss[loss=0.2068, simple_loss=0.2551, pruned_loss=0.05922, ctc_loss=0.1224, cr_loss=0.3901, over 34297.00 frames. ], tot_loss[loss=0.2318, simple_loss=0.2785, pruned_loss=0.06987, ctc_loss=0.1429, cr_loss=0.4163, over 6767399.64 frames. ], batch size: 83, lr: 6.68e-03, grad_scale: 16.0
2024-09-18 01:57:36,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=318262.0, ans=0.0
2024-09-18 01:58:01,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318308.6666666667, ans=0.1
2024-09-18 01:58:30,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=318402.0, ans=0.0
2024-09-18 01:58:40,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.580e+02 3.061e+02 4.216e+02 7.870e+02, threshold=6.121e+02, percent-clipped=2.0
2024-09-18 01:58:47,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=318448.6666666667, ans=0.04949747468305833
2024-09-18 01:58:57,278 INFO [train.py:1198] (1/2) Epoch 18, batch 2350, loss[loss=0.2353, simple_loss=0.279, pruned_loss=0.07272, ctc_loss=0.1451, cr_loss=0.4265, over 34711.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2787, pruned_loss=0.07003, ctc_loss=0.1432, cr_loss=0.4169, over 6773561.92 frames. ], batch size: 97, lr: 6.68e-03, grad_scale: 16.0
2024-09-18 01:59:22,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.96 vs. limit=15.0
2024-09-18 01:59:30,343 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 01:59:38,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=318588.6666666667, ans=0.0
2024-09-18 01:59:51,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318635.3333333333, ans=0.1
2024-09-18 02:00:07,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.42 vs. limit=15.0
2024-09-18 02:00:19,497 INFO [train.py:1198] (1/2) Epoch 18, batch 2400, loss[loss=0.236, simple_loss=0.28, pruned_loss=0.07267, ctc_loss=0.1484, cr_loss=0.4222, over 34577.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2793, pruned_loss=0.07029, ctc_loss=0.1436, cr_loss=0.4181, over 6777432.56 frames. ], batch size: 89, lr: 6.68e-03, grad_scale: 32.0
2024-09-18 02:00:19,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=318728.6666666667, ans=10.0
2024-09-18 02:00:33,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=318728.6666666667, ans=0.125
2024-09-18 02:00:42,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=318775.3333333333, ans=0.1
2024-09-18 02:00:56,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=318822.0, ans=0.0
2024-09-18 02:00:58,617 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.31 vs. limit=12.0
2024-09-18 02:01:26,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.514e+02 2.948e+02 3.863e+02 6.583e+02, threshold=5.897e+02, percent-clipped=2.0
2024-09-18 02:01:43,573 INFO [train.py:1198] (1/2) Epoch 18, batch 2450, loss[loss=0.2369, simple_loss=0.2847, pruned_loss=0.07193, ctc_loss=0.142, cr_loss=0.4225, over 34433.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2803, pruned_loss=0.07063, ctc_loss=0.1443, cr_loss=0.4193, over 6752884.86 frames. ], batch size: 95, lr: 6.67e-03, grad_scale: 32.0
2024-09-18 02:01:51,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=318962.0, ans=0.0
2024-09-18 02:02:04,957 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 02:02:16,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=319055.3333333333, ans=0.2
2024-09-18 02:02:33,890 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 02:02:34,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0
2024-09-18 02:02:41,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0
2024-09-18 02:02:43,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=319102.0, ans=0.2
2024-09-18 02:03:04,620 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.72 vs. limit=15.0
2024-09-18 02:03:05,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=319148.6666666667, ans=0.125
2024-09-18 02:03:06,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0
2024-09-18 02:03:06,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=319195.3333333333, ans=0.125
2024-09-18 02:03:08,214 INFO [train.py:1198] (1/2) Epoch 18, batch 2500, loss[loss=0.234, simple_loss=0.283, pruned_loss=0.06992, ctc_loss=0.1434, cr_loss=0.4129, over 34465.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2802, pruned_loss=0.07068, ctc_loss=0.1444, cr_loss=0.4193, over 6764072.45 frames. ], batch size: 100, lr: 6.67e-03, grad_scale: 32.0
2024-09-18 02:03:25,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=319242.0, ans=0.0
2024-09-18 02:04:10,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=15.0
2024-09-18 02:04:16,210 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.405e+02 2.818e+02 3.804e+02 8.132e+02, threshold=5.637e+02, percent-clipped=1.0
2024-09-18 02:04:31,203 INFO [train.py:1198] (1/2) Epoch 18, batch 2550, loss[loss=0.2032, simple_loss=0.2483, pruned_loss=0.05916, ctc_loss=0.1244, cr_loss=0.3753, over 34182.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2802, pruned_loss=0.07063, ctc_loss=0.1443, cr_loss=0.4192, over 6766674.01 frames. ], batch size: 78, lr: 6.67e-03, grad_scale: 32.0
2024-09-18 02:04:32,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.13 vs.
limit=15.0 2024-09-18 02:04:41,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=319428.6666666667, ans=0.0 2024-09-18 02:04:43,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=319428.6666666667, ans=0.1 2024-09-18 02:04:49,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=319475.3333333333, ans=0.125 2024-09-18 02:05:24,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=319568.6666666667, ans=0.025 2024-09-18 02:05:42,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=319615.3333333333, ans=0.0 2024-09-18 02:05:55,122 INFO [train.py:1198] (1/2) Epoch 18, batch 2600, loss[loss=0.2205, simple_loss=0.2714, pruned_loss=0.06396, ctc_loss=0.1306, cr_loss=0.3917, over 34356.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2807, pruned_loss=0.07087, ctc_loss=0.1448, cr_loss=0.4194, over 6762341.01 frames. ], batch size: 91, lr: 6.67e-03, grad_scale: 16.0 2024-09-18 02:06:08,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=319662.0, ans=0.0 2024-09-18 02:06:13,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=319708.6666666667, ans=0.0 2024-09-18 02:06:45,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=319802.0, ans=0.125 2024-09-18 02:07:06,149 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.028e+02 2.486e+02 2.980e+02 3.740e+02 7.828e+02, threshold=5.961e+02, percent-clipped=8.0 2024-09-18 02:07:16,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=319848.6666666667, ans=0.125 2024-09-18 02:07:19,393 INFO [train.py:1198] (1/2) Epoch 18, batch 2650, loss[loss=0.2481, simple_loss=0.2964, pruned_loss=0.0758, ctc_loss=0.1514, cr_loss=0.4478, over 34264.00 frames. ], tot_loss[loss=0.2341, simple_loss=0.2809, pruned_loss=0.07077, ctc_loss=0.1446, cr_loss=0.4196, over 6770588.22 frames. ], batch size: 117, lr: 6.66e-03, grad_scale: 16.0 2024-09-18 02:07:36,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=319942.0, ans=0.2 2024-09-18 02:07:42,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=319942.0, ans=0.125 2024-09-18 02:07:44,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=319942.0, ans=0.125 2024-09-18 02:07:48,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.29 vs. 
limit=15.0 2024-09-18 02:07:55,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=319988.6666666667, ans=0.125 2024-09-18 02:08:20,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=320035.3333333333, ans=0.125 2024-09-18 02:08:41,585 INFO [train.py:1198] (1/2) Epoch 18, batch 2700, loss[loss=0.2283, simple_loss=0.2802, pruned_loss=0.06601, ctc_loss=0.1421, cr_loss=0.3963, over 34625.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2811, pruned_loss=0.07084, ctc_loss=0.1447, cr_loss=0.4196, over 6766037.70 frames. ], batch size: 102, lr: 6.66e-03, grad_scale: 16.0 2024-09-18 02:08:50,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=320128.6666666667, ans=0.125 2024-09-18 02:09:35,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=320268.6666666667, ans=0.035 2024-09-18 02:09:36,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=320268.6666666667, ans=0.125 2024-09-18 02:09:52,995 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.120e+02 2.592e+02 3.114e+02 3.931e+02 6.827e+02, threshold=6.228e+02, percent-clipped=3.0 2024-09-18 02:10:08,538 INFO [train.py:1198] (1/2) Epoch 18, batch 2750, loss[loss=0.2123, simple_loss=0.2605, pruned_loss=0.06156, ctc_loss=0.1278, cr_loss=0.3861, over 34607.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2793, pruned_loss=0.07, ctc_loss=0.1431, cr_loss=0.4168, over 6762781.75 frames. ], batch size: 88, lr: 6.66e-03, grad_scale: 16.0 2024-09-18 02:10:30,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=320408.6666666667, ans=0.125 2024-09-18 02:10:44,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.90 vs. limit=15.0 2024-09-18 02:11:10,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.06 vs. limit=12.0 2024-09-18 02:11:27,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=320548.6666666667, ans=0.125 2024-09-18 02:11:31,134 INFO [train.py:1198] (1/2) Epoch 18, batch 2800, loss[loss=0.2747, simple_loss=0.3075, pruned_loss=0.09346, ctc_loss=0.1851, cr_loss=0.4497, over 24415.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2795, pruned_loss=0.07016, ctc_loss=0.1433, cr_loss=0.4173, over 6740858.52 frames. ], batch size: 245, lr: 6.66e-03, grad_scale: 32.0 2024-09-18 02:11:38,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=320595.3333333333, ans=0.05 2024-09-18 02:11:38,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.29 vs. 
limit=15.0 2024-09-18 02:11:57,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=320642.0, ans=0.125 2024-09-18 02:12:09,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=320688.6666666667, ans=0.0 2024-09-18 02:12:35,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.76 vs. limit=22.5 2024-09-18 02:12:41,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=320782.0, ans=0.0 2024-09-18 02:12:42,448 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.520e+02 2.976e+02 3.768e+02 9.446e+02, threshold=5.951e+02, percent-clipped=4.0 2024-09-18 02:12:45,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=320782.0, ans=0.2 2024-09-18 02:12:55,435 INFO [train.py:1198] (1/2) Epoch 18, batch 2850, loss[loss=0.2167, simple_loss=0.2673, pruned_loss=0.06222, ctc_loss=0.1306, cr_loss=0.3865, over 34499.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2802, pruned_loss=0.07058, ctc_loss=0.1442, cr_loss=0.4183, over 6724853.70 frames. ], batch size: 90, lr: 6.65e-03, grad_scale: 32.0 2024-09-18 02:13:05,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=320828.6666666667, ans=0.0 2024-09-18 02:13:07,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.39 vs. limit=15.0 2024-09-18 02:13:12,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=320875.3333333333, ans=0.125 2024-09-18 02:13:17,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.07 vs. limit=12.0 2024-09-18 02:13:58,921 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.695e-02 2024-09-18 02:14:13,921 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:14:20,112 INFO [train.py:1198] (1/2) Epoch 18, batch 2900, loss[loss=0.236, simple_loss=0.2816, pruned_loss=0.07198, ctc_loss=0.1443, cr_loss=0.4399, over 34534.00 frames. ], tot_loss[loss=0.2346, simple_loss=0.2815, pruned_loss=0.07098, ctc_loss=0.1448, cr_loss=0.4203, over 6755365.64 frames. ], batch size: 94, lr: 6.65e-03, grad_scale: 32.0 2024-09-18 02:14:32,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=321062.0, ans=0.0 2024-09-18 02:14:40,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=321108.6666666667, ans=0.0 2024-09-18 02:14:53,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=321155.3333333333, ans=0.0 2024-09-18 02:14:57,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. 
limit=6.0 2024-09-18 02:14:59,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.51 vs. limit=15.0 2024-09-18 02:15:05,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=321155.3333333333, ans=0.2 2024-09-18 02:15:07,085 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=12.0 2024-09-18 02:15:08,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.01 vs. limit=10.0 2024-09-18 02:15:16,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=321202.0, ans=0.0 2024-09-18 02:15:26,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=321248.6666666667, ans=0.0 2024-09-18 02:15:26,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.89 vs. limit=15.0 2024-09-18 02:15:29,273 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.089e+02 2.357e+02 2.858e+02 3.865e+02 8.619e+02, threshold=5.715e+02, percent-clipped=6.0 2024-09-18 02:15:39,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=321248.6666666667, ans=0.1 2024-09-18 02:15:39,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=321248.6666666667, ans=0.125 2024-09-18 02:15:42,417 INFO [train.py:1198] (1/2) Epoch 18, batch 2950, loss[loss=0.2305, simple_loss=0.2727, pruned_loss=0.0718, ctc_loss=0.1416, cr_loss=0.4096, over 34622.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2804, pruned_loss=0.07068, ctc_loss=0.1444, cr_loss=0.4193, over 6749490.04 frames. ], batch size: 88, lr: 6.65e-03, grad_scale: 32.0 2024-09-18 02:15:51,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.30 vs. limit=15.0 2024-09-18 02:16:04,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=321342.0, ans=0.125 2024-09-18 02:16:18,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=321388.6666666667, ans=0.2 2024-09-18 02:16:45,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=321435.3333333333, ans=0.0 2024-09-18 02:17:04,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.91 vs. limit=15.0 2024-09-18 02:17:06,607 INFO [train.py:1198] (1/2) Epoch 18, batch 3000, loss[loss=0.2334, simple_loss=0.2762, pruned_loss=0.07206, ctc_loss=0.1461, cr_loss=0.4292, over 34518.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.28, pruned_loss=0.07044, ctc_loss=0.1441, cr_loss=0.4186, over 6750707.56 frames. 
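Note on the scaling.py:1024 "Whitening" lines: they fire when a module's activation covariance drifts too far from isotropic, comparing a whiteness statistic ("metric=9.30") against that module's limit ("limit=15.0"). One plausible form of such a statistic, assumed here, is the eigenvalue-spread ratio mean(lambda^2)/mean(lambda)^2, which is 1.0 for perfectly white features and grows as variance concentrates in fewer directions; the exact formula in scaling.py may differ.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """Assumed whiteness statistic for activations x of shape (frames, channels)."""
        metrics = []
        for g in x.chunk(num_groups, dim=1):        # per-group, as in the log lines
            g = g - g.mean(dim=0)
            cov = (g.T @ g) / g.shape[0]
            eig = torch.linalg.eigvalsh(cov)
            metrics.append((eig.pow(2).mean() / eig.mean().pow(2)).item())
        return sum(metrics) / len(metrics)

    white = torch.randn(10000, 64)
    print(whitening_metric(white))                  # close to 1.0: near-white
    skewed = white @ torch.randn(64, 64)            # correlated channels
    print(whitening_metric(skewed))                 # well above 1.0: would get logged
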
], batch size: 94, lr: 6.65e-03, grad_scale: 32.0 2024-09-18 02:17:06,607 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 02:17:23,513 INFO [train.py:1230] (1/2) Epoch 18, validation: loss=0.1506, simple_loss=0.2473, pruned_loss=0.02271, ctc_loss=0.04257, cr_loss=1.732e-14, over 944034.00 frames. 2024-09-18 02:17:23,513 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 02:17:31,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=321528.6666666667, ans=0.09899494936611666 2024-09-18 02:17:39,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=321528.6666666667, ans=0.125 2024-09-18 02:17:40,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=321575.3333333333, ans=10.0 2024-09-18 02:17:42,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=321575.3333333333, ans=0.125 2024-09-18 02:18:27,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=321668.6666666667, ans=0.0 2024-09-18 02:18:35,444 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.378e+02 2.704e+02 3.170e+02 7.108e+02, threshold=5.408e+02, percent-clipped=3.0 2024-09-18 02:18:35,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=321715.3333333333, ans=10.0 2024-09-18 02:18:36,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.59 vs. limit=15.0 2024-09-18 02:18:43,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=321715.3333333333, ans=0.2 2024-09-18 02:18:45,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=321762.0, ans=0.125 2024-09-18 02:18:46,974 INFO [train.py:1198] (1/2) Epoch 18, batch 3050, loss[loss=0.2156, simple_loss=0.26, pruned_loss=0.06418, ctc_loss=0.132, cr_loss=0.4121, over 34582.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.281, pruned_loss=0.0709, ctc_loss=0.1448, cr_loss=0.4202, over 6742107.85 frames. ], batch size: 89, lr: 6.65e-03, grad_scale: 16.0 2024-09-18 02:19:27,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=321855.3333333333, ans=0.0 2024-09-18 02:19:36,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.86 vs. 
limit=10.0 2024-09-18 02:19:40,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=321902.0, ans=0.125 2024-09-18 02:19:45,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=321902.0, ans=0.125 2024-09-18 02:19:53,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=321948.6666666667, ans=0.2 2024-09-18 02:19:56,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=321948.6666666667, ans=0.125 2024-09-18 02:19:57,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-09-18 02:20:07,506 INFO [train.py:1198] (1/2) Epoch 18, batch 3100, loss[loss=0.2397, simple_loss=0.2886, pruned_loss=0.07258, ctc_loss=0.1446, cr_loss=0.4179, over 34217.00 frames. ], tot_loss[loss=0.2336, simple_loss=0.2804, pruned_loss=0.0706, ctc_loss=0.1443, cr_loss=0.4196, over 6742009.80 frames. ], batch size: 117, lr: 6.64e-03, grad_scale: 16.0 2024-09-18 02:20:17,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=321995.3333333333, ans=0.0 2024-09-18 02:20:56,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=322135.3333333333, ans=0.1 2024-09-18 02:21:01,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=322135.3333333333, ans=0.0 2024-09-18 02:21:02,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.78 vs. limit=15.0 2024-09-18 02:21:15,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=322182.0, ans=0.0 2024-09-18 02:21:17,275 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 2.464e+02 2.909e+02 4.076e+02 7.220e+02, threshold=5.817e+02, percent-clipped=11.0 2024-09-18 02:21:19,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=322182.0, ans=0.1 2024-09-18 02:21:22,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=322182.0, ans=0.125 2024-09-18 02:21:30,165 INFO [train.py:1198] (1/2) Epoch 18, batch 3150, loss[loss=0.2492, simple_loss=0.2962, pruned_loss=0.07652, ctc_loss=0.156, cr_loss=0.4473, over 33894.00 frames. ], tot_loss[loss=0.2337, simple_loss=0.2805, pruned_loss=0.07063, ctc_loss=0.1444, cr_loss=0.4189, over 6748520.27 frames. 
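Note on the scaling.py:214 "ScheduledFloat" lines, which make up most of the traffic above: each named knob (skip rates, balancer probs, scale_min, dropout_p) is a function of batch_count, which is why the same name keeps reappearing with a new ans as batch_count advances. A toy stand-in follows, assuming a piecewise-linear schedule over (batch_count, value) breakpoints; the breakpoints shown are invented for illustration.

    class PiecewiseLinear:
        """Toy ScheduledFloat: value is piecewise-linear in batch_count."""
        def __init__(self, *points):                # (batch_count, value), sorted
            self.points = points

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    conv_skip_rate = PiecewiseLinear((0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0))
    for bc in (0.0, 2000.0, 320035.33):
        print(f"ScheduledFloat: name=conv_skip_rate, batch_count={bc}, "
              f"ans={conv_skip_rate(bc)}")
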
], batch size: 122, lr: 6.64e-03, grad_scale: 16.0 2024-09-18 02:21:48,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=322275.3333333333, ans=0.125 2024-09-18 02:22:05,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=322322.0, ans=0.125 2024-09-18 02:22:28,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=322368.6666666667, ans=0.0 2024-09-18 02:22:50,543 INFO [train.py:1198] (1/2) Epoch 18, batch 3200, loss[loss=0.2416, simple_loss=0.2875, pruned_loss=0.07387, ctc_loss=0.1504, cr_loss=0.4467, over 34547.00 frames. ], tot_loss[loss=0.2328, simple_loss=0.2797, pruned_loss=0.07026, ctc_loss=0.1436, cr_loss=0.4176, over 6761873.45 frames. ], batch size: 94, lr: 6.64e-03, grad_scale: 32.0 2024-09-18 02:22:50,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff2.min_abs, batch_count=322462.0, ans=0.1 2024-09-18 02:23:18,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=322508.6666666667, ans=0.125 2024-09-18 02:24:01,903 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.575e+02 3.026e+02 3.840e+02 7.369e+02, threshold=6.052e+02, percent-clipped=3.0 2024-09-18 02:24:10,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=322648.6666666667, ans=0.0 2024-09-18 02:24:13,277 INFO [train.py:1198] (1/2) Epoch 18, batch 3250, loss[loss=0.2542, simple_loss=0.2965, pruned_loss=0.08107, ctc_loss=0.1599, cr_loss=0.4459, over 34674.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2801, pruned_loss=0.07034, ctc_loss=0.1437, cr_loss=0.4179, over 6770510.91 frames. ], batch size: 98, lr: 6.64e-03, grad_scale: 32.0 2024-09-18 02:24:15,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.37 vs. limit=15.0 2024-09-18 02:24:29,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=322742.0, ans=0.125 2024-09-18 02:24:45,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=322788.6666666667, ans=0.0 2024-09-18 02:25:00,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=322835.3333333333, ans=0.0 2024-09-18 02:25:03,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=322835.3333333333, ans=0.125 2024-09-18 02:25:14,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=15.0 2024-09-18 02:25:33,880 INFO [train.py:1198] (1/2) Epoch 18, batch 3300, loss[loss=0.2425, simple_loss=0.2914, pruned_loss=0.07265, ctc_loss=0.1512, cr_loss=0.4498, over 33155.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.2785, pruned_loss=0.06971, ctc_loss=0.1426, cr_loss=0.4155, over 6768839.32 frames. 
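Note on the loss records: the printed loss is consistent, record after record, with a fixed weighting of the logged components, loss ~ 0.5*simple_loss + pruned_loss + 0.1*ctc_loss + 0.02*cr_loss (e.g. batch 2300: 0.5*0.2551 + 0.05922 + 0.1*0.1224 + 0.02*0.3901 = 0.2068). The scales below are inferred from that identity rather than read out of train.py, so treat them as reconstructed; any warm-up behaviour of the simple/pruned split is not modelled.

    def combined_loss(simple_loss: float, pruned_loss: float,
                      ctc_loss: float, cr_loss: float,
                      simple_scale: float = 0.5,    # scales inferred from the log
                      ctc_scale: float = 0.1,
                      cr_scale: float = 0.02) -> float:
        return (simple_scale * simple_loss + pruned_loss
                + ctc_scale * ctc_loss + cr_scale * cr_loss)

    # Reproduces the Epoch 18, batch 2300 record to rounding:
    print(combined_loss(0.2551, 0.05922, 0.1224, 0.3901))   # ~0.2068
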
], batch size: 130, lr: 6.63e-03, grad_scale: 32.0 2024-09-18 02:25:38,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=322928.6666666667, ans=0.2 2024-09-18 02:25:42,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=322928.6666666667, ans=0.05 2024-09-18 02:26:02,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=322975.3333333333, ans=0.05 2024-09-18 02:26:08,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=323022.0, ans=0.125 2024-09-18 02:26:13,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=323022.0, ans=0.0 2024-09-18 02:26:43,213 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.342e+02 2.815e+02 3.471e+02 6.068e+02, threshold=5.631e+02, percent-clipped=1.0 2024-09-18 02:26:55,844 INFO [train.py:1198] (1/2) Epoch 18, batch 3350, loss[loss=0.2601, simple_loss=0.306, pruned_loss=0.08115, ctc_loss=0.1667, cr_loss=0.4613, over 33827.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2794, pruned_loss=0.07011, ctc_loss=0.1434, cr_loss=0.4172, over 6744521.10 frames. ], batch size: 122, lr: 6.63e-03, grad_scale: 32.0 2024-09-18 02:26:56,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=323162.0, ans=0.125 2024-09-18 02:27:11,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.10 vs. limit=15.0 2024-09-18 02:27:12,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=323208.6666666667, ans=0.0 2024-09-18 02:27:25,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=323208.6666666667, ans=0.125 2024-09-18 02:27:25,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=323208.6666666667, ans=0.0 2024-09-18 02:27:39,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=323255.3333333333, ans=0.1 2024-09-18 02:27:41,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5 2024-09-18 02:27:49,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=323302.0, ans=0.2 2024-09-18 02:27:52,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323302.0, ans=0.1 2024-09-18 02:27:52,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.23 vs. limit=22.5 2024-09-18 02:27:54,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=323302.0, ans=0.2 2024-09-18 02:28:16,229 INFO [train.py:1198] (1/2) Epoch 18, batch 3400, loss[loss=0.2076, simple_loss=0.2513, pruned_loss=0.06121, ctc_loss=0.1299, cr_loss=0.3878, over 34171.00 frames. 
], tot_loss[loss=0.233, simple_loss=0.2797, pruned_loss=0.07036, ctc_loss=0.144, cr_loss=0.4184, over 6734402.77 frames. ], batch size: 78, lr: 6.63e-03, grad_scale: 32.0 2024-09-18 02:28:16,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=323395.3333333333, ans=0.2 2024-09-18 02:28:16,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=323395.3333333333, ans=0.0 2024-09-18 02:28:18,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=323395.3333333333, ans=15.0 2024-09-18 02:28:26,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323395.3333333333, ans=0.1 2024-09-18 02:28:31,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=323442.0, ans=0.1 2024-09-18 02:28:40,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=323442.0, ans=0.0 2024-09-18 02:28:51,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=323488.6666666667, ans=0.125 2024-09-18 02:29:14,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=323535.3333333333, ans=0.125 2024-09-18 02:29:19,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=323535.3333333333, ans=0.125 2024-09-18 02:29:26,876 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.474e+02 2.924e+02 3.898e+02 5.776e+02, threshold=5.849e+02, percent-clipped=1.0 2024-09-18 02:29:29,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.53 vs. limit=15.0 2024-09-18 02:29:36,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.34 vs. limit=15.0 2024-09-18 02:29:38,133 INFO [train.py:1198] (1/2) Epoch 18, batch 3450, loss[loss=0.2354, simple_loss=0.2893, pruned_loss=0.06848, ctc_loss=0.1405, cr_loss=0.4132, over 33083.00 frames. ], tot_loss[loss=0.2333, simple_loss=0.2802, pruned_loss=0.07045, ctc_loss=0.1442, cr_loss=0.4187, over 6746269.95 frames. ], batch size: 130, lr: 6.63e-03, grad_scale: 32.0 2024-09-18 02:29:53,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.47 vs. limit=22.5 2024-09-18 02:29:54,695 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:29:58,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.94 vs. 
limit=22.5 2024-09-18 02:30:15,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=323722.0, ans=0.0 2024-09-18 02:30:16,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=323722.0, ans=0.0 2024-09-18 02:30:20,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=323722.0, ans=0.125 2024-09-18 02:30:48,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=22.5 2024-09-18 02:30:49,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=323815.3333333333, ans=0.0 2024-09-18 02:30:53,036 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=12.0 2024-09-18 02:30:58,250 INFO [train.py:1198] (1/2) Epoch 18, batch 3500, loss[loss=0.2115, simple_loss=0.2583, pruned_loss=0.06158, ctc_loss=0.1285, cr_loss=0.3964, over 34515.00 frames. ], tot_loss[loss=0.2326, simple_loss=0.2795, pruned_loss=0.07018, ctc_loss=0.1436, cr_loss=0.4177, over 6748335.67 frames. ], batch size: 85, lr: 6.62e-03, grad_scale: 32.0 2024-09-18 02:30:58,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=323862.0, ans=0.025 2024-09-18 02:31:44,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=323955.3333333333, ans=0.0 2024-09-18 02:31:58,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=15.0 2024-09-18 02:32:02,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=324048.6666666667, ans=0.1 2024-09-18 02:32:08,554 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.461e+02 2.893e+02 3.517e+02 6.292e+02, threshold=5.786e+02, percent-clipped=2.0 2024-09-18 02:32:19,857 INFO [train.py:1198] (1/2) Epoch 18, batch 3550, loss[loss=0.2438, simple_loss=0.2947, pruned_loss=0.07307, ctc_loss=0.1484, cr_loss=0.4246, over 34375.00 frames. ], tot_loss[loss=0.2331, simple_loss=0.2799, pruned_loss=0.0704, ctc_loss=0.1438, cr_loss=0.4188, over 6756847.44 frames. 
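Note on the grad_scale field, which moves between 16.0 and 32.0 in these records: that is the footprint of dynamic loss scaling under fp16 AMP, where the scale is halved when a step produces inf/nan gradients and doubled after a long run of clean steps. A stripped-down sketch of that logic follows; the initial value and growth interval mirror common AMP defaults and are assumptions, not necessarily this run's settings.

    class DynamicLossScaler:
        """Minimal dynamic loss-scaling rule behind the grad_scale field."""
        def __init__(self, init_scale: float = 16.0, growth_interval: int = 2000):
            self.scale = init_scale
            self.growth_interval = growth_interval
            self._good_steps = 0

        def step(self, found_inf: bool) -> float:
            if found_inf:                 # overflow: shrink, skip this update
                self.scale *= 0.5
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps == self.growth_interval:
                    self.scale *= 2.0     # long clean run: grow
                    self._good_steps = 0
            return self.scale

    scaler = DynamicLossScaler()
    for _ in range(2000):
        scaler.step(found_inf=False)
    print(scaler.scale)                   # 32.0: one growth interval completed
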
], batch size: 103, lr: 6.62e-03, grad_scale: 32.0 2024-09-18 02:32:26,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=324095.3333333333, ans=0.0 2024-09-18 02:32:26,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=324095.3333333333, ans=0.0 2024-09-18 02:32:48,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=324142.0, ans=0.07 2024-09-18 02:32:50,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=324188.6666666667, ans=0.125 2024-09-18 02:32:56,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=324188.6666666667, ans=0.2 2024-09-18 02:33:38,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=324282.0, ans=0.05 2024-09-18 02:33:41,040 INFO [train.py:1198] (1/2) Epoch 18, batch 3600, loss[loss=0.229, simple_loss=0.275, pruned_loss=0.06943, ctc_loss=0.1367, cr_loss=0.4199, over 34481.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2802, pruned_loss=0.07055, ctc_loss=0.144, cr_loss=0.4193, over 6766586.03 frames. ], batch size: 90, lr: 6.62e-03, grad_scale: 32.0 2024-09-18 02:33:51,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=324328.6666666667, ans=0.125 2024-09-18 02:34:36,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=324468.6666666667, ans=0.2 2024-09-18 02:34:40,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=324468.6666666667, ans=0.125 2024-09-18 02:34:50,288 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.590e+02 3.013e+02 3.969e+02 6.994e+02, threshold=6.027e+02, percent-clipped=3.0 2024-09-18 02:34:56,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=324515.3333333333, ans=0.125 2024-09-18 02:34:59,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.47 vs. limit=10.0 2024-09-18 02:35:02,488 INFO [train.py:1198] (1/2) Epoch 18, batch 3650, loss[loss=0.2462, simple_loss=0.2932, pruned_loss=0.07545, ctc_loss=0.1543, cr_loss=0.4331, over 34457.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2793, pruned_loss=0.07014, ctc_loss=0.1434, cr_loss=0.4175, over 6769182.00 frames. 
], batch size: 110, lr: 6.62e-03, grad_scale: 32.0 2024-09-18 02:35:17,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=324608.6666666667, ans=0.0 2024-09-18 02:35:27,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=324608.6666666667, ans=0.125 2024-09-18 02:35:46,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=324655.3333333333, ans=0.125 2024-09-18 02:35:48,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=324655.3333333333, ans=0.125 2024-09-18 02:35:56,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.99 vs. limit=15.0 2024-09-18 02:35:59,387 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:36:05,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=324748.6666666667, ans=0.2 2024-09-18 02:36:15,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=324748.6666666667, ans=0.125 2024-09-18 02:36:21,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=324795.3333333333, ans=0.2 2024-09-18 02:36:23,230 INFO [train.py:1198] (1/2) Epoch 18, batch 3700, loss[loss=0.2371, simple_loss=0.2894, pruned_loss=0.07006, ctc_loss=0.1408, cr_loss=0.4149, over 34627.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2793, pruned_loss=0.06991, ctc_loss=0.1431, cr_loss=0.4169, over 6784363.50 frames. ], batch size: 102, lr: 6.61e-03, grad_scale: 32.0 2024-09-18 02:36:58,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.54 vs. limit=12.0 2024-09-18 02:37:13,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=324935.3333333333, ans=0.2 2024-09-18 02:37:21,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=324935.3333333333, ans=0.1 2024-09-18 02:37:29,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=324982.0, ans=0.125 2024-09-18 02:37:32,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=324982.0, ans=0.2 2024-09-18 02:37:35,777 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.909e+02 2.334e+02 2.523e+02 3.164e+02 5.781e+02, threshold=5.045e+02, percent-clipped=1.0 2024-09-18 02:37:45,662 INFO [train.py:1198] (1/2) Epoch 18, batch 3750, loss[loss=0.2603, simple_loss=0.3015, pruned_loss=0.08311, ctc_loss=0.1657, cr_loss=0.4926, over 34342.00 frames. ], tot_loss[loss=0.2356, simple_loss=0.2826, pruned_loss=0.07126, ctc_loss=0.1457, cr_loss=0.4223, over 6785396.43 frames. 
], batch size: 113, lr: 6.61e-03, grad_scale: 16.0 2024-09-18 02:37:46,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=325028.6666666667, ans=0.125 2024-09-18 02:38:02,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=325075.3333333333, ans=0.1 2024-09-18 02:38:03,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=15.0 2024-09-18 02:38:10,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=325075.3333333333, ans=0.0 2024-09-18 02:38:15,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=325075.3333333333, ans=0.0 2024-09-18 02:38:21,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=325122.0, ans=0.0 2024-09-18 02:39:06,738 INFO [train.py:1198] (1/2) Epoch 18, batch 3800, loss[loss=0.2828, simple_loss=0.3086, pruned_loss=0.09915, ctc_loss=0.1943, cr_loss=0.4931, over 29899.00 frames. ], tot_loss[loss=0.2391, simple_loss=0.2854, pruned_loss=0.07299, ctc_loss=0.1488, cr_loss=0.4272, over 6674672.12 frames. ], batch size: 175, lr: 6.61e-03, grad_scale: 16.0 2024-09-18 02:39:07,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=325262.0, ans=0.1 2024-09-18 02:39:22,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=325308.6666666667, ans=0.2 2024-09-18 02:39:53,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=325355.3333333333, ans=0.125 2024-09-18 02:40:12,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=325402.0, ans=0.125 2024-09-18 02:40:21,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.109e+02 2.381e+02 2.675e+02 2.956e+02 4.662e+02, threshold=5.349e+02, percent-clipped=0.0 2024-09-18 02:40:31,506 INFO [train.py:1198] (1/2) Epoch 18, batch 3850, loss[loss=0.2729, simple_loss=0.3074, pruned_loss=0.09137, ctc_loss=0.1877, cr_loss=0.4526, over 23316.00 frames. ], tot_loss[loss=0.2436, simple_loss=0.2881, pruned_loss=0.07554, ctc_loss=0.154, cr_loss=0.4308, over 6254461.67 frames. ], batch size: 245, lr: 6.61e-03, grad_scale: 16.0 2024-09-18 02:40:50,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=325542.0, ans=0.2 2024-09-18 02:41:09,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=325588.6666666667, ans=0.125 2024-09-18 02:41:11,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=325588.6666666667, ans=0.025 2024-09-18 02:41:58,717 INFO [train.py:1198] (1/2) Epoch 19, batch 0, loss[loss=0.2035, simple_loss=0.2532, pruned_loss=0.05757, ctc_loss=0.1195, cr_loss=0.3704, over 34463.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2532, pruned_loss=0.05757, ctc_loss=0.1195, cr_loss=0.3704, over 34463.00 frames. 
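Note on the learning rate: within epoch 18 the logged lr creeps from 6.69e-03 down to 6.61e-03, then drops to 6.43e-03 the moment epoch 19 begins, i.e. there are separate batch-wise and epoch-wise decay factors. This matches an Eden-style schedule as in icefall's optim.py; the formula and constants below are quoted from memory and should be treated as an approximation of what this run used.

    def eden_lr(base_lr: float, batch: int, epoch: int,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden-style lr: smooth power-law decay in both batch and epoch."""
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Stepping the epoch at a fixed batch index lowers the lr by roughly 2.6%,
    # about the size of the 6.61e-03 -> 6.43e-03 jump seen in the log.
    for epoch in (18, 19):
        print(epoch, eden_lr(0.045, batch=80000, epoch=epoch))
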
], batch size: 85, lr: 6.43e-03, grad_scale: 32.0 2024-09-18 02:41:58,717 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 02:42:15,705 INFO [train.py:1230] (1/2) Epoch 19, validation: loss=0.1504, simple_loss=0.2484, pruned_loss=0.02198, ctc_loss=0.04242, cr_loss=1.796e-14, over 944034.00 frames. 2024-09-18 02:42:15,705 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 02:42:27,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=325616.6666666667, ans=0.025 2024-09-18 02:42:54,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=325710.0, ans=0.125 2024-09-18 02:43:14,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=325756.6666666667, ans=0.125 2024-09-18 02:43:15,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=325756.6666666667, ans=0.0 2024-09-18 02:43:15,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=325756.6666666667, ans=0.025 2024-09-18 02:43:40,282 INFO [train.py:1198] (1/2) Epoch 19, batch 50, loss[loss=0.2053, simple_loss=0.2518, pruned_loss=0.0593, ctc_loss=0.1236, cr_loss=0.385, over 34474.00 frames. ], tot_loss[loss=0.2372, simple_loss=0.2827, pruned_loss=0.07258, ctc_loss=0.1476, cr_loss=0.4244, over 1482321.72 frames. ], batch size: 82, lr: 6.42e-03, grad_scale: 32.0 2024-09-18 02:43:40,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=325850.0, ans=0.125 2024-09-18 02:43:42,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=325850.0, ans=0.125 2024-09-18 02:43:58,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=325896.6666666667, ans=0.0 2024-09-18 02:44:10,098 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.510e+02 2.797e+02 3.314e+02 5.610e+02, threshold=5.594e+02, percent-clipped=4.0 2024-09-18 02:44:10,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=325896.6666666667, ans=0.0 2024-09-18 02:44:12,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=325943.3333333333, ans=0.125 2024-09-18 02:44:22,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=325943.3333333333, ans=0.0 2024-09-18 02:44:33,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=325990.0, ans=0.0 2024-09-18 02:44:42,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2024-09-18 02:45:02,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.92 vs. limit=22.5 2024-09-18 02:45:03,185 INFO [train.py:1198] (1/2) Epoch 19, batch 100, loss[loss=0.224, simple_loss=0.2679, pruned_loss=0.06744, ctc_loss=0.1438, cr_loss=0.4117, over 34585.00 frames. 
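Note on cr_loss: training batches carry cr_loss around 0.4, but both validation records in this stretch report cr_loss on the order of 1e-14, numerically zero. That is the expected signature of a consistency-regularization term comparing CTC posteriors from two differently time-masked views of each utterance: at validation no masking is applied, the two views coincide, and the divergence vanishes. A toy version under those assumptions follows; the recipe's actual CR-CTC formulation may differ.

    import torch
    import torch.nn.functional as F

    def cr_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
        # logits_*: (num_frames, vocab_size) CTC logits for the two views.
        log_p_a = F.log_softmax(logits_a, dim=-1)
        log_p_b = F.log_softmax(logits_b, dim=-1)
        kl_ab = F.kl_div(log_p_b, log_p_a, log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(log_p_a, log_p_b, log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)    # symmetric divergence between the views

    x = torch.randn(100, 500)
    print(cr_loss(x, x).item())                               # 0.0: identical views
    print(cr_loss(x, x + 0.5 * torch.randn_like(x)).item())   # > 0: views disagree
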
], tot_loss[loss=0.2376, simple_loss=0.2838, pruned_loss=0.07239, ctc_loss=0.1474, cr_loss=0.4254, over 2629832.20 frames. ], batch size: 89, lr: 6.42e-03, grad_scale: 32.0 2024-09-18 02:45:23,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=326130.0, ans=0.125 2024-09-18 02:45:33,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=326130.0, ans=0.0 2024-09-18 02:45:39,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=326176.6666666667, ans=0.125 2024-09-18 02:45:47,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=326176.6666666667, ans=0.125 2024-09-18 02:45:50,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=326176.6666666667, ans=15.0 2024-09-18 02:45:58,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=326223.3333333333, ans=0.0 2024-09-18 02:46:14,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=326270.0, ans=0.0 2024-09-18 02:46:14,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=326270.0, ans=0.125 2024-09-18 02:46:28,314 INFO [train.py:1198] (1/2) Epoch 19, batch 150, loss[loss=0.2104, simple_loss=0.2558, pruned_loss=0.06293, ctc_loss=0.1248, cr_loss=0.3526, over 34498.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.2815, pruned_loss=0.07101, ctc_loss=0.1453, cr_loss=0.4216, over 3558076.75 frames. ], batch size: 82, lr: 6.42e-03, grad_scale: 32.0 2024-09-18 02:46:38,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=326316.6666666667, ans=0.125 2024-09-18 02:46:43,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=326363.3333333333, ans=0.125 2024-09-18 02:46:45,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=326363.3333333333, ans=0.035 2024-09-18 02:46:57,862 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.439e+02 3.037e+02 3.761e+02 7.478e+02, threshold=6.074e+02, percent-clipped=5.0 2024-09-18 02:47:19,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.70 vs. 
limit=15.0 2024-09-18 02:47:20,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=326456.6666666667, ans=0.125 2024-09-18 02:47:22,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=326456.6666666667, ans=0.04949747468305833 2024-09-18 02:47:27,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=326456.6666666667, ans=0.125 2024-09-18 02:47:39,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=326503.3333333333, ans=0.05 2024-09-18 02:47:50,209 INFO [train.py:1198] (1/2) Epoch 19, batch 200, loss[loss=0.2472, simple_loss=0.2964, pruned_loss=0.07553, ctc_loss=0.1516, cr_loss=0.4146, over 31815.00 frames. ], tot_loss[loss=0.2332, simple_loss=0.2802, pruned_loss=0.07035, ctc_loss=0.1439, cr_loss=0.4194, over 4271838.08 frames. ], batch size: 145, lr: 6.42e-03, grad_scale: 32.0 2024-09-18 02:47:55,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=326550.0, ans=0.125 2024-09-18 02:48:15,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=326596.6666666667, ans=0.2 2024-09-18 02:48:15,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=326596.6666666667, ans=0.125 2024-09-18 02:48:16,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=326596.6666666667, ans=0.0 2024-09-18 02:48:21,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=326643.3333333333, ans=0.0 2024-09-18 02:48:51,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=326690.0, ans=0.05 2024-09-18 02:48:53,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=326690.0, ans=0.2 2024-09-18 02:49:15,494 INFO [train.py:1198] (1/2) Epoch 19, batch 250, loss[loss=0.2395, simple_loss=0.2905, pruned_loss=0.07123, ctc_loss=0.1449, cr_loss=0.427, over 34227.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2798, pruned_loss=0.07007, ctc_loss=0.1434, cr_loss=0.419, over 4833968.19 frames. ], batch size: 117, lr: 6.42e-03, grad_scale: 32.0 2024-09-18 02:49:27,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=326783.3333333333, ans=0.05 2024-09-18 02:49:37,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.27 vs. 
limit=15.0 2024-09-18 02:49:46,863 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.490e+02 3.060e+02 3.923e+02 7.897e+02, threshold=6.119e+02, percent-clipped=7.0 2024-09-18 02:49:50,686 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 02:50:17,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=326923.3333333333, ans=12.0 2024-09-18 02:50:21,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=326970.0, ans=0.0 2024-09-18 02:50:39,625 INFO [train.py:1198] (1/2) Epoch 19, batch 300, loss[loss=0.2522, simple_loss=0.294, pruned_loss=0.07948, ctc_loss=0.161, cr_loss=0.4831, over 34384.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2794, pruned_loss=0.0699, ctc_loss=0.1431, cr_loss=0.4191, over 5263748.37 frames. ], batch size: 107, lr: 6.41e-03, grad_scale: 32.0 2024-09-18 02:50:45,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=327016.6666666667, ans=0.2 2024-09-18 02:50:45,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=327016.6666666667, ans=0.0 2024-09-18 02:50:53,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=327016.6666666667, ans=0.0 2024-09-18 02:51:17,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=327110.0, ans=0.2 2024-09-18 02:51:45,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=327203.3333333333, ans=0.125 2024-09-18 02:51:45,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=327203.3333333333, ans=0.125 2024-09-18 02:51:49,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=327203.3333333333, ans=0.125 2024-09-18 02:51:55,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=327203.3333333333, ans=0.1 2024-09-18 02:52:00,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=22.5 2024-09-18 02:52:01,880 INFO [train.py:1198] (1/2) Epoch 19, batch 350, loss[loss=0.2195, simple_loss=0.2654, pruned_loss=0.0652, ctc_loss=0.1345, cr_loss=0.4051, over 34280.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2796, pruned_loss=0.06984, ctc_loss=0.1431, cr_loss=0.4183, over 5598292.65 frames. 
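Note on the tot_loss frame counts: inside epoch 19 they climb almost linearly with batches seen (1,482,321 frames at batch 50; 3,558,076 at 150; 5,598,292 at 350), while deep into epoch 18 they plateau around 6.7-6.8M, roughly 200 batches' worth at ~34k frames per batch. That is consistent with frames-weighted sums that decay exponentially over a window of a couple of hundred batches; the sketch below assumes exactly that, and the window constant is a guess.

    class RunningStats:
        """Frames-weighted running sums with slow exponential decay (assumed)."""
        def __init__(self, window: int = 200):   # plateau ~ window * frames/batch
            self.decay = 1.0 - 1.0 / window
            self.frames = 0.0
            self.sums: dict[str, float] = {}

        def update(self, frames: float, **losses: float) -> None:
            self.frames = self.frames * self.decay + frames
            for name, per_frame in losses.items():
                self.sums[name] = (self.sums.get(name, 0.0) * self.decay
                                   + per_frame * frames)

        def averages(self) -> dict[str, float]:
            return {k: s / self.frames for k, s in self.sums.items()}

    stats = RunningStats()
    for _ in range(50):                           # ~batch 50 of a fresh epoch
        stats.update(34000.0, loss=0.237, ctc_loss=0.1476)
    print(stats.averages(), f"over {stats.frames:.2f} frames")   # ~1.5M frames,
    # near the logged batch-50 figure for epoch 19
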
], batch size: 83, lr: 6.41e-03, grad_scale: 32.0 2024-09-18 02:52:02,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=327250.0, ans=0.025 2024-09-18 02:52:05,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=327250.0, ans=0.0 2024-09-18 02:52:05,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327250.0, ans=0.1 2024-09-18 02:52:07,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=327250.0, ans=0.125 2024-09-18 02:52:13,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327250.0, ans=0.1 2024-09-18 02:52:31,296 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.957e+02 2.302e+02 2.674e+02 3.386e+02 5.377e+02, threshold=5.347e+02, percent-clipped=0.0 2024-09-18 02:52:34,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=327343.3333333333, ans=0.0 2024-09-18 02:52:44,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=327343.3333333333, ans=0.5 2024-09-18 02:52:44,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=327343.3333333333, ans=0.0 2024-09-18 02:53:03,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=327390.0, ans=0.125 2024-09-18 02:53:23,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=327436.6666666667, ans=0.025 2024-09-18 02:53:28,104 INFO [train.py:1198] (1/2) Epoch 19, batch 400, loss[loss=0.2357, simple_loss=0.284, pruned_loss=0.07087, ctc_loss=0.1425, cr_loss=0.4292, over 34434.00 frames. ], tot_loss[loss=0.2321, simple_loss=0.2793, pruned_loss=0.06974, ctc_loss=0.1429, cr_loss=0.4182, over 5864486.65 frames. ], batch size: 95, lr: 6.41e-03, grad_scale: 32.0 2024-09-18 02:53:45,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=327530.0, ans=0.0 2024-09-18 02:53:55,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.71 vs. limit=12.0 2024-09-18 02:54:00,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=327576.6666666667, ans=0.0 2024-09-18 02:54:05,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=327576.6666666667, ans=0.0 2024-09-18 02:54:24,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.69 vs. 
limit=22.5 2024-09-18 02:54:30,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=327623.3333333333, ans=0.125 2024-09-18 02:54:48,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327670.0, ans=0.1 2024-09-18 02:54:51,540 INFO [train.py:1198] (1/2) Epoch 19, batch 450, loss[loss=0.2471, simple_loss=0.2943, pruned_loss=0.07556, ctc_loss=0.1539, cr_loss=0.4531, over 34695.00 frames. ], tot_loss[loss=0.2322, simple_loss=0.2795, pruned_loss=0.0698, ctc_loss=0.1429, cr_loss=0.4186, over 6051488.29 frames. ], batch size: 97, lr: 6.41e-03, grad_scale: 16.0 2024-09-18 02:54:53,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.05 vs. limit=22.5 2024-09-18 02:55:13,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327763.3333333333, ans=0.1 2024-09-18 02:55:18,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=327763.3333333333, ans=0.0 2024-09-18 02:55:22,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.562e+02 2.839e+02 3.606e+02 6.373e+02, threshold=5.679e+02, percent-clipped=6.0 2024-09-18 02:55:28,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=327810.0, ans=0.2 2024-09-18 02:55:41,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=327856.6666666667, ans=0.125 2024-09-18 02:56:09,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=327903.3333333333, ans=0.125 2024-09-18 02:56:13,943 INFO [train.py:1198] (1/2) Epoch 19, batch 500, loss[loss=0.2462, simple_loss=0.2899, pruned_loss=0.07661, ctc_loss=0.156, cr_loss=0.4492, over 34442.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2784, pruned_loss=0.06932, ctc_loss=0.142, cr_loss=0.4173, over 6218878.55 frames. ], batch size: 110, lr: 6.40e-03, grad_scale: 16.0 2024-09-18 02:56:18,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.91 vs. limit=22.5 2024-09-18 02:56:19,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.88 vs. limit=10.0 2024-09-18 02:56:29,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327996.6666666667, ans=0.1 2024-09-18 02:57:22,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=328136.6666666667, ans=0.125 2024-09-18 02:57:32,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=328136.6666666667, ans=0.125 2024-09-18 02:57:40,532 INFO [train.py:1198] (1/2) Epoch 19, batch 550, loss[loss=0.2472, simple_loss=0.2963, pruned_loss=0.07515, ctc_loss=0.1536, cr_loss=0.4262, over 33902.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2784, pruned_loss=0.06941, ctc_loss=0.1422, cr_loss=0.4168, over 6328938.99 frames. 
], batch size: 122, lr: 6.40e-03, grad_scale: 16.0 2024-09-18 02:57:54,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=328183.3333333333, ans=0.0 2024-09-18 02:58:12,048 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.523e+02 2.793e+02 3.510e+02 5.735e+02, threshold=5.587e+02, percent-clipped=1.0 2024-09-18 02:58:15,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=328276.6666666667, ans=0.125 2024-09-18 02:58:18,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=328276.6666666667, ans=0.1 2024-09-18 02:58:19,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=328276.6666666667, ans=0.2 2024-09-18 02:58:22,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=328276.6666666667, ans=0.0 2024-09-18 02:58:30,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=328323.3333333333, ans=0.0 2024-09-18 02:58:44,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.23 vs. limit=22.5 2024-09-18 02:59:03,397 INFO [train.py:1198] (1/2) Epoch 19, batch 600, loss[loss=0.2475, simple_loss=0.2965, pruned_loss=0.07481, ctc_loss=0.1539, cr_loss=0.4538, over 34241.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.2789, pruned_loss=0.0695, ctc_loss=0.1424, cr_loss=0.4173, over 6429622.47 frames. ], batch size: 117, lr: 6.40e-03, grad_scale: 16.0 2024-09-18 02:59:06,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=328416.6666666667, ans=0.025 2024-09-18 02:59:18,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=328463.3333333333, ans=0.2 2024-09-18 02:59:25,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=328463.3333333333, ans=0.125 2024-09-18 02:59:56,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=328556.6666666667, ans=0.125 2024-09-18 03:00:06,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=328556.6666666667, ans=0.125 2024-09-18 03:00:11,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=328603.3333333333, ans=0.2 2024-09-18 03:00:26,100 INFO [train.py:1198] (1/2) Epoch 19, batch 650, loss[loss=0.2294, simple_loss=0.2769, pruned_loss=0.06905, ctc_loss=0.1379, cr_loss=0.4078, over 34543.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.2779, pruned_loss=0.06878, ctc_loss=0.1411, cr_loss=0.4146, over 6521423.70 frames. 
], batch size: 94, lr: 6.40e-03, grad_scale: 16.0 2024-09-18 03:00:45,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=328696.6666666667, ans=0.0 2024-09-18 03:00:46,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328696.6666666667, ans=0.1 2024-09-18 03:00:53,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=328696.6666666667, ans=0.125 2024-09-18 03:01:01,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.551e+02 3.263e+02 4.998e+02 8.596e+02, threshold=6.527e+02, percent-clipped=17.0 2024-09-18 03:01:26,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=328790.0, ans=0.1 2024-09-18 03:01:44,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=328836.6666666667, ans=0.125 2024-09-18 03:01:52,512 INFO [train.py:1198] (1/2) Epoch 19, batch 700, loss[loss=0.2154, simple_loss=0.2634, pruned_loss=0.06265, ctc_loss=0.1322, cr_loss=0.3939, over 34599.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2785, pruned_loss=0.06897, ctc_loss=0.1413, cr_loss=0.415, over 6578888.15 frames. ], batch size: 89, lr: 6.40e-03, grad_scale: 16.0 2024-09-18 03:02:07,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=328930.0, ans=0.125 2024-09-18 03:02:13,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.84 vs. limit=6.0 2024-09-18 03:03:00,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=329070.0, ans=0.0 2024-09-18 03:03:14,448 INFO [train.py:1198] (1/2) Epoch 19, batch 750, loss[loss=0.2406, simple_loss=0.2847, pruned_loss=0.07479, ctc_loss=0.1493, cr_loss=0.4268, over 34430.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2781, pruned_loss=0.06891, ctc_loss=0.1411, cr_loss=0.4146, over 6622051.70 frames. ], batch size: 95, lr: 6.39e-03, grad_scale: 16.0 2024-09-18 03:03:32,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=329163.3333333333, ans=0.2 2024-09-18 03:03:38,354 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.19 vs. limit=22.5 2024-09-18 03:03:40,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.77 vs. limit=15.0 2024-09-18 03:03:45,361 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.876e+02 2.462e+02 3.078e+02 4.366e+02 8.626e+02, threshold=6.155e+02, percent-clipped=3.0 2024-09-18 03:03:46,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.44 vs. 
limit=15.0 2024-09-18 03:04:10,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=329256.6666666667, ans=0.125 2024-09-18 03:04:12,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=329256.6666666667, ans=0.125 2024-09-18 03:04:19,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.33 vs. limit=15.0 2024-09-18 03:04:38,950 INFO [train.py:1198] (1/2) Epoch 19, batch 800, loss[loss=0.1979, simple_loss=0.2485, pruned_loss=0.05437, ctc_loss=0.1187, cr_loss=0.3691, over 34477.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2779, pruned_loss=0.06891, ctc_loss=0.1411, cr_loss=0.4147, over 6657388.43 frames. ], batch size: 85, lr: 6.39e-03, grad_scale: 32.0 2024-09-18 03:04:56,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2024-09-18 03:05:00,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=329396.6666666667, ans=0.125 2024-09-18 03:05:01,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=329396.6666666667, ans=0.2 2024-09-18 03:05:25,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=329443.3333333333, ans=0.125 2024-09-18 03:05:42,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=329490.0, ans=0.125 2024-09-18 03:05:47,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=329536.6666666667, ans=0.025 2024-09-18 03:05:53,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=329536.6666666667, ans=0.025 2024-09-18 03:06:03,148 INFO [train.py:1198] (1/2) Epoch 19, batch 850, loss[loss=0.2366, simple_loss=0.2933, pruned_loss=0.06764, ctc_loss=0.1409, cr_loss=0.4111, over 34363.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2774, pruned_loss=0.06863, ctc_loss=0.1406, cr_loss=0.4139, over 6690968.17 frames. ], batch size: 103, lr: 6.39e-03, grad_scale: 32.0 2024-09-18 03:06:19,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=329630.0, ans=0.0 2024-09-18 03:06:29,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=329630.0, ans=0.2 2024-09-18 03:06:34,090 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.326e+02 2.892e+02 3.691e+02 1.011e+03, threshold=5.785e+02, percent-clipped=1.0 2024-09-18 03:06:41,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2024-09-18 03:06:56,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.28 vs. 
limit=12.0 2024-09-18 03:06:59,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=329723.3333333333, ans=0.125 2024-09-18 03:07:00,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=329723.3333333333, ans=0.0 2024-09-18 03:07:04,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=329723.3333333333, ans=0.125 2024-09-18 03:07:12,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=329770.0, ans=0.125 2024-09-18 03:07:17,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.56 vs. limit=15.0 2024-09-18 03:07:25,345 INFO [train.py:1198] (1/2) Epoch 19, batch 900, loss[loss=0.1972, simple_loss=0.2467, pruned_loss=0.05521, ctc_loss=0.1142, cr_loss=0.36, over 34454.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2778, pruned_loss=0.06894, ctc_loss=0.1414, cr_loss=0.4154, over 6697957.80 frames. ], batch size: 85, lr: 6.39e-03, grad_scale: 32.0 2024-09-18 03:07:48,495 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:08:05,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=15.0 2024-09-18 03:08:25,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=329956.6666666667, ans=0.025 2024-09-18 03:08:27,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=329956.6666666667, ans=0.125 2024-09-18 03:08:33,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=330003.3333333333, ans=0.1 2024-09-18 03:08:39,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.55 vs. limit=15.0 2024-09-18 03:08:42,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=330003.3333333333, ans=0.025 2024-09-18 03:08:51,735 INFO [train.py:1198] (1/2) Epoch 19, batch 950, loss[loss=0.2144, simple_loss=0.2657, pruned_loss=0.06064, ctc_loss=0.1276, cr_loss=0.4062, over 34687.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2783, pruned_loss=0.0691, ctc_loss=0.1417, cr_loss=0.4158, over 6701787.87 frames. 
], batch size: 87, lr: 6.38e-03, grad_scale: 32.0 2024-09-18 03:08:53,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330050.0, ans=0.1 2024-09-18 03:09:08,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=330096.6666666667, ans=0.0 2024-09-18 03:09:23,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.717e+02 3.382e+02 4.445e+02 7.915e+02, threshold=6.763e+02, percent-clipped=6.0 2024-09-18 03:09:23,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=330143.3333333333, ans=0.125 2024-09-18 03:09:35,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0 2024-09-18 03:10:13,993 INFO [train.py:1198] (1/2) Epoch 19, batch 1000, loss[loss=0.2184, simple_loss=0.2642, pruned_loss=0.06466, ctc_loss=0.1344, cr_loss=0.4079, over 34507.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.279, pruned_loss=0.06954, ctc_loss=0.1424, cr_loss=0.417, over 6694822.51 frames. ], batch size: 90, lr: 6.38e-03, grad_scale: 32.0 2024-09-18 03:10:39,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.71 vs. limit=15.0 2024-09-18 03:10:47,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=330376.6666666667, ans=0.0 2024-09-18 03:10:52,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=330376.6666666667, ans=0.2 2024-09-18 03:10:54,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=330376.6666666667, ans=0.2 2024-09-18 03:11:15,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=330423.3333333333, ans=0.125 2024-09-18 03:11:19,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=330470.0, ans=0.125 2024-09-18 03:11:36,791 INFO [train.py:1198] (1/2) Epoch 19, batch 1050, loss[loss=0.2398, simple_loss=0.2896, pruned_loss=0.07228, ctc_loss=0.1417, cr_loss=0.4278, over 34584.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.2788, pruned_loss=0.06969, ctc_loss=0.1425, cr_loss=0.4178, over 6703629.92 frames. ], batch size: 99, lr: 6.38e-03, grad_scale: 32.0 2024-09-18 03:11:47,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=330516.6666666667, ans=0.125 2024-09-18 03:11:55,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=330563.3333333333, ans=0.0 2024-09-18 03:12:10,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.81 vs. 
limit=10.0 2024-09-18 03:12:12,459 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.372e+02 2.724e+02 3.454e+02 6.585e+02, threshold=5.448e+02, percent-clipped=0.0 2024-09-18 03:12:18,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.03 vs. limit=15.0 2024-09-18 03:12:19,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=330610.0, ans=0.0 2024-09-18 03:12:23,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.20 vs. limit=15.0 2024-09-18 03:12:30,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=330656.6666666667, ans=0.125 2024-09-18 03:12:42,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=330656.6666666667, ans=0.0 2024-09-18 03:13:03,499 INFO [train.py:1198] (1/2) Epoch 19, batch 1100, loss[loss=0.2288, simple_loss=0.271, pruned_loss=0.07056, ctc_loss=0.1437, cr_loss=0.4185, over 34343.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2783, pruned_loss=0.0694, ctc_loss=0.1421, cr_loss=0.4169, over 6716980.67 frames. ], batch size: 91, lr: 6.38e-03, grad_scale: 32.0 2024-09-18 03:13:20,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=330796.6666666667, ans=0.125 2024-09-18 03:13:27,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=15.0 2024-09-18 03:13:48,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330843.3333333333, ans=0.1 2024-09-18 03:13:50,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=330843.3333333333, ans=0.0 2024-09-18 03:13:53,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=330890.0, ans=0.125 2024-09-18 03:14:05,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=330890.0, ans=0.1 2024-09-18 03:14:10,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=330936.6666666667, ans=0.125 2024-09-18 03:14:11,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=330936.6666666667, ans=0.125 2024-09-18 03:14:13,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=330936.6666666667, ans=0.2 2024-09-18 03:14:26,491 INFO [train.py:1198] (1/2) Epoch 19, batch 1150, loss[loss=0.2258, simple_loss=0.2721, pruned_loss=0.06784, ctc_loss=0.1376, cr_loss=0.4093, over 34740.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2785, pruned_loss=0.06951, ctc_loss=0.1423, cr_loss=0.4173, over 6714071.78 frames. 
], batch size: 92, lr: 6.38e-03, grad_scale: 32.0 2024-09-18 03:14:29,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=330983.3333333333, ans=12.0 2024-09-18 03:14:57,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.298e+02 2.738e+02 3.503e+02 6.942e+02, threshold=5.477e+02, percent-clipped=3.0 2024-09-18 03:15:31,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.30 vs. limit=22.5 2024-09-18 03:15:53,125 INFO [train.py:1198] (1/2) Epoch 19, batch 1200, loss[loss=0.2325, simple_loss=0.284, pruned_loss=0.06827, ctc_loss=0.1393, cr_loss=0.4155, over 34577.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.279, pruned_loss=0.06957, ctc_loss=0.1426, cr_loss=0.4178, over 6707263.75 frames. ], batch size: 99, lr: 6.37e-03, grad_scale: 32.0 2024-09-18 03:16:42,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2024-09-18 03:16:55,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.69 vs. limit=12.0 2024-09-18 03:16:58,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=331403.3333333333, ans=0.125 2024-09-18 03:17:07,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-18 03:17:16,141 INFO [train.py:1198] (1/2) Epoch 19, batch 1250, loss[loss=0.2347, simple_loss=0.286, pruned_loss=0.0692, ctc_loss=0.1397, cr_loss=0.4274, over 34330.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2796, pruned_loss=0.06982, ctc_loss=0.1429, cr_loss=0.4187, over 6742144.47 frames. ], batch size: 107, lr: 6.37e-03, grad_scale: 32.0 2024-09-18 03:17:16,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=331450.0, ans=0.125 2024-09-18 03:17:39,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=331496.6666666667, ans=0.0 2024-09-18 03:17:47,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.084e+02 2.418e+02 2.838e+02 3.536e+02 6.741e+02, threshold=5.676e+02, percent-clipped=1.0 2024-09-18 03:18:01,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=331543.3333333333, ans=0.1 2024-09-18 03:18:12,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=331590.0, ans=0.0 2024-09-18 03:18:15,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=331590.0, ans=0.125 2024-09-18 03:18:38,377 INFO [train.py:1198] (1/2) Epoch 19, batch 1300, loss[loss=0.2295, simple_loss=0.2839, pruned_loss=0.06568, ctc_loss=0.1376, cr_loss=0.4052, over 33115.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2789, pruned_loss=0.0694, ctc_loss=0.1421, cr_loss=0.417, over 6744900.34 frames. 
], batch size: 130, lr: 6.37e-03, grad_scale: 32.0 2024-09-18 03:18:53,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=331730.0, ans=0.125 2024-09-18 03:19:11,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=331776.6666666667, ans=0.125 2024-09-18 03:19:21,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=331776.6666666667, ans=0.125 2024-09-18 03:19:30,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=331823.3333333333, ans=0.125 2024-09-18 03:19:30,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=331823.3333333333, ans=0.2 2024-09-18 03:19:47,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331870.0, ans=0.1 2024-09-18 03:19:58,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=331870.0, ans=0.125 2024-09-18 03:20:00,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=331870.0, ans=0.0 2024-09-18 03:20:04,883 INFO [train.py:1198] (1/2) Epoch 19, batch 1350, loss[loss=0.2257, simple_loss=0.2741, pruned_loss=0.06679, ctc_loss=0.1386, cr_loss=0.4005, over 34540.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2783, pruned_loss=0.06899, ctc_loss=0.1413, cr_loss=0.4155, over 6763713.60 frames. ], batch size: 94, lr: 6.37e-03, grad_scale: 32.0 2024-09-18 03:20:05,370 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:20:24,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=331963.3333333333, ans=0.0 2024-09-18 03:20:35,752 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.958e+02 2.472e+02 3.029e+02 3.878e+02 5.669e+02, threshold=6.059e+02, percent-clipped=0.0 2024-09-18 03:20:59,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=332056.6666666667, ans=0.07 2024-09-18 03:21:12,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=332103.3333333333, ans=0.0 2024-09-18 03:21:17,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=332103.3333333333, ans=0.125 2024-09-18 03:21:26,826 INFO [train.py:1198] (1/2) Epoch 19, batch 1400, loss[loss=0.191, simple_loss=0.2417, pruned_loss=0.05163, ctc_loss=0.1135, cr_loss=0.3594, over 34306.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.2782, pruned_loss=0.06898, ctc_loss=0.1413, cr_loss=0.4156, over 6776202.89 frames. 
], batch size: 80, lr: 6.36e-03, grad_scale: 32.0 2024-09-18 03:21:40,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=332150.0, ans=0.125 2024-09-18 03:21:43,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=332196.6666666667, ans=0.125 2024-09-18 03:21:54,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=332196.6666666667, ans=0.0 2024-09-18 03:22:07,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=332243.3333333333, ans=6.0 2024-09-18 03:22:22,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=332290.0, ans=0.0 2024-09-18 03:22:26,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=332290.0, ans=0.125 2024-09-18 03:22:39,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.73 vs. limit=22.5 2024-09-18 03:22:49,275 INFO [train.py:1198] (1/2) Epoch 19, batch 1450, loss[loss=0.2523, simple_loss=0.2937, pruned_loss=0.08019, ctc_loss=0.1624, cr_loss=0.4503, over 34454.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2788, pruned_loss=0.06931, ctc_loss=0.1418, cr_loss=0.4167, over 6772487.17 frames. ], batch size: 110, lr: 6.36e-03, grad_scale: 32.0 2024-09-18 03:23:14,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=332430.0, ans=0.125 2024-09-18 03:23:22,337 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.967e+02 2.487e+02 2.823e+02 3.991e+02 6.695e+02, threshold=5.647e+02, percent-clipped=2.0 2024-09-18 03:23:51,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=332523.3333333333, ans=0.0 2024-09-18 03:24:01,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.30 vs. limit=10.0 2024-09-18 03:24:13,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=332616.6666666667, ans=0.125 2024-09-18 03:24:15,037 INFO [train.py:1198] (1/2) Epoch 19, batch 1500, loss[loss=0.2333, simple_loss=0.2844, pruned_loss=0.06828, ctc_loss=0.1425, cr_loss=0.4277, over 34440.00 frames. ], tot_loss[loss=0.2316, simple_loss=0.2793, pruned_loss=0.0694, ctc_loss=0.1421, cr_loss=0.417, over 6772851.61 frames. 
], batch size: 100, lr: 6.36e-03, grad_scale: 32.0 2024-09-18 03:24:20,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=332616.6666666667, ans=0.125 2024-09-18 03:24:26,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=332616.6666666667, ans=0.125 2024-09-18 03:25:00,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=332710.0, ans=0.125 2024-09-18 03:25:33,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=332803.3333333333, ans=0.125 2024-09-18 03:25:37,781 INFO [train.py:1198] (1/2) Epoch 19, batch 1550, loss[loss=0.2504, simple_loss=0.2957, pruned_loss=0.07772, ctc_loss=0.1582, cr_loss=0.4507, over 34429.00 frames. ], tot_loss[loss=0.2318, simple_loss=0.2793, pruned_loss=0.06959, ctc_loss=0.1423, cr_loss=0.4177, over 6744265.94 frames. ], batch size: 105, lr: 6.36e-03, grad_scale: 16.0 2024-09-18 03:25:39,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=332850.0, ans=0.2 2024-09-18 03:25:39,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=332850.0, ans=0.2 2024-09-18 03:25:41,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=332850.0, ans=0.125 2024-09-18 03:25:53,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.50 vs. limit=22.5 2024-09-18 03:25:59,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=332896.6666666667, ans=0.125 2024-09-18 03:26:06,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=332896.6666666667, ans=0.025 2024-09-18 03:26:10,606 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.930e+02 2.421e+02 2.807e+02 3.730e+02 8.179e+02, threshold=5.614e+02, percent-clipped=3.0 2024-09-18 03:26:16,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2024-09-18 03:26:36,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.12 vs. limit=15.0 2024-09-18 03:27:01,754 INFO [train.py:1198] (1/2) Epoch 19, batch 1600, loss[loss=0.227, simple_loss=0.2785, pruned_loss=0.06532, ctc_loss=0.1395, cr_loss=0.4218, over 34578.00 frames. ], tot_loss[loss=0.2313, simple_loss=0.2788, pruned_loss=0.06939, ctc_loss=0.142, cr_loss=0.4167, over 6723406.11 frames. ], batch size: 99, lr: 6.36e-03, grad_scale: 32.0 2024-09-18 03:27:11,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=333083.3333333333, ans=0.125 2024-09-18 03:27:53,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.81 vs. 
limit=15.0 2024-09-18 03:28:04,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=333223.3333333333, ans=0.0 2024-09-18 03:28:21,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.70 vs. limit=22.5 2024-09-18 03:28:26,787 INFO [train.py:1198] (1/2) Epoch 19, batch 1650, loss[loss=0.2427, simple_loss=0.296, pruned_loss=0.07101, ctc_loss=0.1461, cr_loss=0.4535, over 34379.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2787, pruned_loss=0.06931, ctc_loss=0.142, cr_loss=0.4161, over 6718337.56 frames. ], batch size: 103, lr: 6.35e-03, grad_scale: 32.0 2024-09-18 03:28:39,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.53 vs. limit=15.0 2024-09-18 03:28:45,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.31 vs. limit=6.0 2024-09-18 03:28:59,852 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.534e+02 3.033e+02 4.043e+02 9.198e+02, threshold=6.067e+02, percent-clipped=9.0 2024-09-18 03:29:11,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=333410.0, ans=0.125 2024-09-18 03:29:31,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=333503.3333333333, ans=0.025 2024-09-18 03:29:42,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=333503.3333333333, ans=0.0 2024-09-18 03:29:48,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.40 vs. limit=10.0 2024-09-18 03:29:49,256 INFO [train.py:1198] (1/2) Epoch 19, batch 1700, loss[loss=0.2115, simple_loss=0.2541, pruned_loss=0.06335, ctc_loss=0.1326, cr_loss=0.3911, over 34302.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2784, pruned_loss=0.06914, ctc_loss=0.1416, cr_loss=0.4157, over 6744606.40 frames. ], batch size: 80, lr: 6.35e-03, grad_scale: 32.0 2024-09-18 03:29:59,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=333550.0, ans=0.125 2024-09-18 03:30:15,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.17 vs. 
limit=15.0 2024-09-18 03:30:55,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333736.6666666667, ans=0.1 2024-09-18 03:30:57,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=333736.6666666667, ans=0.2 2024-09-18 03:30:59,012 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:30:59,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=333736.6666666667, ans=0.125 2024-09-18 03:30:59,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=333736.6666666667, ans=0.125 2024-09-18 03:31:02,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.98 vs. limit=22.5 2024-09-18 03:31:15,636 INFO [train.py:1198] (1/2) Epoch 19, batch 1750, loss[loss=0.2125, simple_loss=0.2559, pruned_loss=0.06321, ctc_loss=0.1302, cr_loss=0.4162, over 34185.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2778, pruned_loss=0.06893, ctc_loss=0.1412, cr_loss=0.415, over 6754378.20 frames. ], batch size: 78, lr: 6.35e-03, grad_scale: 32.0 2024-09-18 03:31:16,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=333783.3333333333, ans=0.0 2024-09-18 03:31:24,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=333783.3333333333, ans=0.2 2024-09-18 03:31:48,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.441e+02 2.857e+02 3.619e+02 4.809e+02, threshold=5.715e+02, percent-clipped=0.0 2024-09-18 03:32:05,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.92 vs. limit=15.0 2024-09-18 03:32:37,834 INFO [train.py:1198] (1/2) Epoch 19, batch 1800, loss[loss=0.2288, simple_loss=0.2774, pruned_loss=0.06775, ctc_loss=0.1407, cr_loss=0.4109, over 34700.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2781, pruned_loss=0.06909, ctc_loss=0.1415, cr_loss=0.4154, over 6757518.63 frames. ], batch size: 97, lr: 6.35e-03, grad_scale: 32.0 2024-09-18 03:33:16,823 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.76 vs. limit=15.0 2024-09-18 03:33:34,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=334156.6666666667, ans=0.0 2024-09-18 03:34:00,506 INFO [train.py:1198] (1/2) Epoch 19, batch 1850, loss[loss=0.2421, simple_loss=0.2929, pruned_loss=0.07228, ctc_loss=0.1483, cr_loss=0.4295, over 34441.00 frames. ], tot_loss[loss=0.2305, simple_loss=0.278, pruned_loss=0.06903, ctc_loss=0.1415, cr_loss=0.4154, over 6764039.46 frames. ], batch size: 100, lr: 6.34e-03, grad_scale: 32.0 2024-09-18 03:34:00,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=334250.0, ans=0.125 2024-09-18 03:34:32,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.52 vs. 
limit=15.0 2024-09-18 03:34:35,118 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.066e+02 2.753e+02 3.718e+02 4.824e+02 7.529e+02, threshold=7.437e+02, percent-clipped=14.0 2024-09-18 03:35:00,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=334390.0, ans=0.0 2024-09-18 03:35:21,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=12.0 2024-09-18 03:35:26,413 INFO [train.py:1198] (1/2) Epoch 19, batch 1900, loss[loss=0.2239, simple_loss=0.278, pruned_loss=0.06395, ctc_loss=0.1315, cr_loss=0.3921, over 34369.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2786, pruned_loss=0.06912, ctc_loss=0.1417, cr_loss=0.4161, over 6772312.67 frames. ], batch size: 103, lr: 6.34e-03, grad_scale: 32.0 2024-09-18 03:35:31,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=334483.3333333333, ans=0.125 2024-09-18 03:35:35,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.34 vs. limit=15.0 2024-09-18 03:35:41,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334530.0, ans=0.1 2024-09-18 03:35:53,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=334530.0, ans=0.0 2024-09-18 03:36:11,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=334576.6666666667, ans=0.04949747468305833 2024-09-18 03:36:47,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=334716.6666666667, ans=0.025 2024-09-18 03:36:48,918 INFO [train.py:1198] (1/2) Epoch 19, batch 1950, loss[loss=0.2255, simple_loss=0.2722, pruned_loss=0.06731, ctc_loss=0.1385, cr_loss=0.4113, over 34319.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2799, pruned_loss=0.06946, ctc_loss=0.1424, cr_loss=0.418, over 6789263.67 frames. ], batch size: 91, lr: 6.34e-03, grad_scale: 32.0 2024-09-18 03:37:07,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=334763.3333333333, ans=0.125 2024-09-18 03:37:14,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=334763.3333333333, ans=0.2 2024-09-18 03:37:17,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=334763.3333333333, ans=0.125 2024-09-18 03:37:22,349 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.432e+02 2.875e+02 3.634e+02 6.413e+02, threshold=5.750e+02, percent-clipped=0.0 2024-09-18 03:37:22,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=334810.0, ans=0.0 2024-09-18 03:37:25,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.80 vs. 
limit=22.5 2024-09-18 03:37:35,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334810.0, ans=0.1 2024-09-18 03:37:42,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334856.6666666667, ans=0.1 2024-09-18 03:37:48,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334856.6666666667, ans=0.1 2024-09-18 03:37:52,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=334856.6666666667, ans=0.0 2024-09-18 03:37:57,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=334903.3333333333, ans=0.0 2024-09-18 03:38:02,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=334903.3333333333, ans=0.125 2024-09-18 03:38:09,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.66 vs. limit=22.5 2024-09-18 03:38:12,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=334950.0, ans=0.2 2024-09-18 03:38:13,422 INFO [train.py:1198] (1/2) Epoch 19, batch 2000, loss[loss=0.2022, simple_loss=0.2467, pruned_loss=0.05943, ctc_loss=0.1199, cr_loss=0.3712, over 34187.00 frames. ], tot_loss[loss=0.2329, simple_loss=0.2806, pruned_loss=0.06988, ctc_loss=0.1431, cr_loss=0.4193, over 6765293.73 frames. ], batch size: 78, lr: 6.34e-03, grad_scale: 32.0 2024-09-18 03:38:28,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=12.0 2024-09-18 03:38:33,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=334996.6666666667, ans=0.125 2024-09-18 03:39:03,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=335043.3333333333, ans=0.125 2024-09-18 03:39:16,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.96 vs. limit=15.0 2024-09-18 03:39:24,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=335136.6666666667, ans=0.125 2024-09-18 03:39:24,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=335136.6666666667, ans=0.04949747468305833 2024-09-18 03:39:26,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.87 vs. limit=10.0 2024-09-18 03:39:31,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=335136.6666666667, ans=0.125 2024-09-18 03:39:38,910 INFO [train.py:1198] (1/2) Epoch 19, batch 2050, loss[loss=0.2023, simple_loss=0.2532, pruned_loss=0.05644, ctc_loss=0.118, cr_loss=0.3709, over 34497.00 frames. ], tot_loss[loss=0.2323, simple_loss=0.2798, pruned_loss=0.06978, ctc_loss=0.1429, cr_loss=0.4186, over 6755400.18 frames. ], batch size: 82, lr: 6.34e-03, grad_scale: 32.0
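
The loss[...] / tot_loss[...] entries above report, for the current batch and as a running average, each component of the training objective: the simple and pruned transducer losses, the CTC loss, and the consistency-regularization (cr) loss. Below is a minimal sketch of how such components are typically combined in this kind of recipe, using the scale values from this run's configuration; the exact weighting and warm-up behaviour in train.py may differ.

    import torch

    def combine_losses(
        simple_loss: torch.Tensor,
        pruned_loss: torch.Tensor,
        ctc_loss: torch.Tensor,
        cr_loss: torch.Tensor,
        simple_loss_scale: float = 0.5,   # weight on the cheap "simple" loss
        ctc_loss_scale: float = 0.1,      # ctc_loss_scale from the config
        cr_loss_scale: float = 0.02,      # cr_loss_scale from the config
    ) -> torch.Tensor:
        # Pruned-transducer training blends the simple and pruned losses,
        # then adds the scaled auxiliary CTC and CR terms on top.
        transducer = (
            simple_loss_scale * simple_loss
            + (1.0 - simple_loss_scale) * pruned_loss
        )
        return transducer + ctc_loss_scale * ctc_loss + cr_loss_scale * cr_loss
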
2024-09-18 03:39:52,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=335183.3333333333, ans=0.2 2024-09-18 03:39:55,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=335230.0, ans=0.2 2024-09-18 03:39:59,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0 2024-09-18 03:40:11,814 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.523e+02 3.062e+02 3.911e+02 7.884e+02, threshold=6.123e+02, percent-clipped=8.0 2024-09-18 03:40:41,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=335323.3333333333, ans=0.1 2024-09-18 03:40:46,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=335370.0, ans=0.0 2024-09-18 03:40:48,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.32 vs. limit=15.0 2024-09-18 03:40:51,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=335370.0, ans=0.1 2024-09-18 03:41:01,148 INFO [train.py:1198] (1/2) Epoch 19, batch 2100, loss[loss=0.2332, simple_loss=0.2774, pruned_loss=0.07134, ctc_loss=0.1438, cr_loss=0.4408, over 34554.00 frames. ], tot_loss[loss=0.2317, simple_loss=0.2791, pruned_loss=0.06951, ctc_loss=0.1424, cr_loss=0.4177, over 6769964.84 frames. ], batch size: 94, lr: 6.33e-03, grad_scale: 32.0 2024-09-18 03:41:18,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0 2024-09-18 03:41:22,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=335463.3333333333, ans=0.0 2024-09-18 03:41:24,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=335463.3333333333, ans=0.0 2024-09-18 03:41:53,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=335556.6666666667, ans=0.0 2024-09-18 03:41:54,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0 2024-09-18 03:42:01,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=335556.6666666667, ans=0.035 2024-09-18 03:42:11,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.23 vs. limit=22.5 2024-09-18 03:42:24,371 INFO [train.py:1198] (1/2) Epoch 19, batch 2150, loss[loss=0.2233, simple_loss=0.2708, pruned_loss=0.06599, ctc_loss=0.1393, cr_loss=0.4024, over 34348.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2781, pruned_loss=0.06881, ctc_loss=0.1411, cr_loss=0.416, over 6786931.79 frames. ], batch size: 91, lr: 6.33e-03, grad_scale: 16.0
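
The ScheduledFloat entries emitted by scaling.py track hyper-parameters (dropout probabilities, skip rates, balancer probabilities) that are annealed as a function of batch_count instead of being held fixed; each line records the value (ans) currently in force for one named sub-module. The toy schedule below illustrates the idea with made-up breakpoints; the real ScheduledFloat class has considerably more machinery.

    class PiecewiseLinearSchedule:
        """Toy stand-in: linearly interpolates a float between
        (batch_count, value) breakpoints, clamping at the ends."""

        def __init__(self, *points):
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # Hypothetical dropout that decays from 0.3 to 0.1 over the first 20k
    # batches; by batch_count ~ 335k it has long since settled, which is why
    # the ans values logged above are stable from batch to batch.
    dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(335183.3333333333))  # -> 0.1
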
2024-09-18 03:42:38,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=335650.0, ans=0.0 2024-09-18 03:42:49,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=335696.6666666667, ans=0.125 2024-09-18 03:43:01,179 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.347e+02 2.754e+02 3.419e+02 7.435e+02, threshold=5.508e+02, percent-clipped=2.0 2024-09-18 03:43:19,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=335790.0, ans=0.0 2024-09-18 03:43:28,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=335790.0, ans=0.0 2024-09-18 03:43:47,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=335883.3333333333, ans=0.025 2024-09-18 03:43:49,166 INFO [train.py:1198] (1/2) Epoch 19, batch 2200, loss[loss=0.2304, simple_loss=0.2846, pruned_loss=0.0662, ctc_loss=0.1361, cr_loss=0.4118, over 34435.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2777, pruned_loss=0.06864, ctc_loss=0.1408, cr_loss=0.4152, over 6783426.67 frames. ], batch size: 100, lr: 6.33e-03, grad_scale: 16.0 2024-09-18 03:44:12,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=335930.0, ans=0.125 2024-09-18 03:44:17,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=335930.0, ans=0.07 2024-09-18 03:44:29,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=335976.6666666667, ans=0.125 2024-09-18 03:44:43,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=336023.3333333333, ans=0.0 2024-09-18 03:44:51,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.99 vs. limit=15.0 2024-09-18 03:44:52,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=336023.3333333333, ans=0.0 2024-09-18 03:45:06,273 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.08 vs. limit=15.0 2024-09-18 03:45:07,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.46 vs. limit=15.0 2024-09-18 03:45:18,173 INFO [train.py:1198] (1/2) Epoch 19, batch 2250, loss[loss=0.2274, simple_loss=0.2794, pruned_loss=0.06592, ctc_loss=0.1362, cr_loss=0.4089, over 34434.00 frames. ], tot_loss[loss=0.2298, simple_loss=0.2777, pruned_loss=0.06857, ctc_loss=0.1405, cr_loss=0.4141, over 6781298.54 frames. ], batch size: 95, lr: 6.33e-03, grad_scale: 16.0
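
The WARNING lines from optim.py summarize the recent distribution of gradient norms (min, 25%, median, 75%, max), the clipping threshold currently in force, and the percentage of recent batches that were clipped. A simplified sketch of that bookkeeping is shown below, assuming the threshold is Clipping_scale=2.0 times the median of recent norms; ScaledAdam's actual implementation (running statistics, warm-up, per-parameter scaling) is more involved.

    import torch

    def clipping_report(recent_norms, new_norm, clipping_scale=2.0):
        norms = torch.tensor(recent_norms, dtype=torch.float32)
        # min / 25% / median / 75% / max, as printed in the log
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()
        # Gradients whose norm exceeds the threshold are scaled down.
        scale = min(1.0, threshold / max(new_norm, 1e-20))
        return q.tolist(), threshold, scale

    quartiles, threshold, scale = clipping_report(
        recent_norms=[197.1, 234.7, 275.4, 341.9, 743.5], new_norm=650.0
    )
    print(quartiles, threshold, scale)
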
], batch size: 95, lr: 6.33e-03, grad_scale: 16.0 2024-09-18 03:45:28,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=336116.6666666667, ans=0.025 2024-09-18 03:45:35,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=336163.3333333333, ans=0.125 2024-09-18 03:45:54,614 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.718e+02 3.553e+02 4.361e+02 7.888e+02, threshold=7.106e+02, percent-clipped=9.0 2024-09-18 03:46:11,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=336256.6666666667, ans=0.2 2024-09-18 03:46:44,104 INFO [train.py:1198] (1/2) Epoch 19, batch 2300, loss[loss=0.2013, simple_loss=0.2473, pruned_loss=0.05778, ctc_loss=0.1244, cr_loss=0.3708, over 34295.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2765, pruned_loss=0.06818, ctc_loss=0.14, cr_loss=0.4135, over 6766693.02 frames. ], batch size: 83, lr: 6.32e-03, grad_scale: 16.0 2024-09-18 03:46:44,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=336350.0, ans=0.0 2024-09-18 03:46:47,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=336350.0, ans=0.07 2024-09-18 03:46:54,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=336350.0, ans=0.125 2024-09-18 03:46:59,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=336396.6666666667, ans=0.2 2024-09-18 03:47:00,732 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:47:10,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=336396.6666666667, ans=0.2 2024-09-18 03:47:55,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=336536.6666666667, ans=0.0 2024-09-18 03:48:03,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=336536.6666666667, ans=0.0 2024-09-18 03:48:06,679 INFO [train.py:1198] (1/2) Epoch 19, batch 2350, loss[loss=0.248, simple_loss=0.2944, pruned_loss=0.076, ctc_loss=0.1595, cr_loss=0.4416, over 34691.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2767, pruned_loss=0.06829, ctc_loss=0.1402, cr_loss=0.4136, over 6773111.32 frames. ], batch size: 97, lr: 6.32e-03, grad_scale: 16.0 2024-09-18 03:48:31,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=336630.0, ans=0.0 2024-09-18 03:48:38,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=336676.6666666667, ans=0.125 2024-09-18 03:48:41,053 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.164e+02 2.496e+02 2.869e+02 3.806e+02 6.762e+02, threshold=5.738e+02, percent-clipped=0.0 2024-09-18 03:48:48,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.61 vs. 
limit=15.0 2024-09-18 03:49:06,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=12.0 2024-09-18 03:49:30,599 INFO [train.py:1198] (1/2) Epoch 19, batch 2400, loss[loss=0.2056, simple_loss=0.2545, pruned_loss=0.05847, ctc_loss=0.1226, cr_loss=0.3826, over 34569.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2773, pruned_loss=0.06856, ctc_loss=0.1408, cr_loss=0.4156, over 6777757.56 frames. ], batch size: 89, lr: 6.32e-03, grad_scale: 32.0 2024-09-18 03:49:34,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=336816.6666666667, ans=0.09899494936611666 2024-09-18 03:49:49,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=336863.3333333333, ans=0.0 2024-09-18 03:50:07,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=336910.0, ans=0.0 2024-09-18 03:50:19,752 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 03:50:44,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=337003.3333333333, ans=0.05 2024-09-18 03:50:54,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=337050.0, ans=0.2 2024-09-18 03:50:55,734 INFO [train.py:1198] (1/2) Epoch 19, batch 2450, loss[loss=0.2303, simple_loss=0.2834, pruned_loss=0.06656, ctc_loss=0.1385, cr_loss=0.4064, over 34404.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2785, pruned_loss=0.06905, ctc_loss=0.1418, cr_loss=0.417, over 6752314.54 frames. ], batch size: 95, lr: 6.32e-03, grad_scale: 32.0 2024-09-18 03:51:04,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=337050.0, ans=0.125 2024-09-18 03:51:07,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=337050.0, ans=0.0 2024-09-18 03:51:30,339 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.426e+02 2.951e+02 3.427e+02 6.453e+02, threshold=5.902e+02, percent-clipped=2.0 2024-09-18 03:51:30,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337143.3333333333, ans=0.1 2024-09-18 03:51:35,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=337143.3333333333, ans=0.0 2024-09-18 03:52:06,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=337236.6666666667, ans=0.07 2024-09-18 03:52:17,991 INFO [train.py:1198] (1/2) Epoch 19, batch 2500, loss[loss=0.2317, simple_loss=0.284, pruned_loss=0.06699, ctc_loss=0.1404, cr_loss=0.4314, over 34463.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.2784, pruned_loss=0.06914, ctc_loss=0.1419, cr_loss=0.4172, over 6764062.45 frames. 
], batch size: 100, lr: 6.32e-03, grad_scale: 32.0 2024-09-18 03:52:19,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=337283.3333333333, ans=0.125 2024-09-18 03:52:28,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=337283.3333333333, ans=0.125 2024-09-18 03:52:31,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=337283.3333333333, ans=0.1 2024-09-18 03:52:41,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=337330.0, ans=0.0 2024-09-18 03:52:51,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=337376.6666666667, ans=0.0 2024-09-18 03:53:04,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=337376.6666666667, ans=0.125 2024-09-18 03:53:33,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-09-18 03:53:39,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=337470.0, ans=0.0 2024-09-18 03:53:42,108 INFO [train.py:1198] (1/2) Epoch 19, batch 2550, loss[loss=0.2003, simple_loss=0.2489, pruned_loss=0.05692, ctc_loss=0.1163, cr_loss=0.3629, over 34151.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2787, pruned_loss=0.06921, ctc_loss=0.142, cr_loss=0.4173, over 6767135.77 frames. ], batch size: 78, lr: 6.31e-03, grad_scale: 16.0 2024-09-18 03:53:49,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.67 vs. limit=15.0 2024-09-18 03:53:53,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=337516.6666666667, ans=0.2 2024-09-18 03:53:54,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.51 vs. limit=15.0 2024-09-18 03:54:11,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.08 vs. limit=15.0 2024-09-18 03:54:17,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=337610.0, ans=0.0 2024-09-18 03:54:20,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.583e+02 3.199e+02 4.437e+02 7.502e+02, threshold=6.397e+02, percent-clipped=6.0 2024-09-18 03:54:30,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=337610.0, ans=0.2 2024-09-18 03:55:06,829 INFO [train.py:1198] (1/2) Epoch 19, batch 2600, loss[loss=0.2188, simple_loss=0.2713, pruned_loss=0.06244, ctc_loss=0.1287, cr_loss=0.3916, over 34351.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.2792, pruned_loss=0.06931, ctc_loss=0.1422, cr_loss=0.4177, over 6762084.16 frames. 
], batch size: 91, lr: 6.31e-03, grad_scale: 16.0 2024-09-18 03:55:18,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=337750.0, ans=0.5 2024-09-18 03:55:23,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=337796.6666666667, ans=0.125 2024-09-18 03:55:32,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.13 vs. limit=15.0 2024-09-18 03:55:45,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=22.5 2024-09-18 03:56:28,961 INFO [train.py:1198] (1/2) Epoch 19, batch 2650, loss[loss=0.2576, simple_loss=0.3021, pruned_loss=0.08091, ctc_loss=0.1609, cr_loss=0.4769, over 34200.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2791, pruned_loss=0.06913, ctc_loss=0.1419, cr_loss=0.4171, over 6769081.21 frames. ], batch size: 117, lr: 6.31e-03, grad_scale: 16.0 2024-09-18 03:56:45,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=338030.0, ans=0.025 2024-09-18 03:57:06,670 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.505e+02 2.939e+02 3.716e+02 6.507e+02, threshold=5.879e+02, percent-clipped=2.0 2024-09-18 03:57:49,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=338170.0, ans=0.125 2024-09-18 03:57:54,778 INFO [train.py:1198] (1/2) Epoch 19, batch 2700, loss[loss=0.2491, simple_loss=0.2951, pruned_loss=0.07709, ctc_loss=0.1572, cr_loss=0.4381, over 34638.00 frames. ], tot_loss[loss=0.2318, simple_loss=0.2796, pruned_loss=0.0694, ctc_loss=0.1423, cr_loss=0.4175, over 6763405.92 frames. ], batch size: 102, lr: 6.31e-03, grad_scale: 16.0 2024-09-18 03:58:03,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=6.16 vs. limit=12.0 2024-09-18 03:58:06,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=338216.6666666667, ans=0.07 2024-09-18 03:58:16,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=338263.3333333333, ans=0.0 2024-09-18 03:58:18,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=338263.3333333333, ans=0.1 2024-09-18 03:58:28,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=15.0 2024-09-18 03:58:59,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=338403.3333333333, ans=0.125 2024-09-18 03:59:06,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=338403.3333333333, ans=0.0 2024-09-18 03:59:10,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=338403.3333333333, ans=0.125 2024-09-18 03:59:17,328 INFO [train.py:1198] (1/2) Epoch 19, batch 2750, loss[loss=0.2085, simple_loss=0.2559, pruned_loss=0.06017, ctc_loss=0.1268, cr_loss=0.3865, over 34617.00 frames. 
], tot_loss[loss=0.2307, simple_loss=0.2784, pruned_loss=0.06902, ctc_loss=0.1416, cr_loss=0.4154, over 6760957.82 frames. ], batch size: 88, lr: 6.31e-03, grad_scale: 16.0 2024-09-18 03:59:42,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=338496.6666666667, ans=0.0 2024-09-18 03:59:53,334 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.355e+02 2.855e+02 3.510e+02 7.360e+02, threshold=5.710e+02, percent-clipped=1.0 2024-09-18 04:00:13,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.48 vs. limit=15.0 2024-09-18 04:00:16,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338590.0, ans=0.1 2024-09-18 04:00:28,722 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2024-09-18 04:00:30,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=338636.6666666667, ans=0.125 2024-09-18 04:00:41,358 INFO [train.py:1198] (1/2) Epoch 19, batch 2800, loss[loss=0.2795, simple_loss=0.3082, pruned_loss=0.09713, ctc_loss=0.1923, cr_loss=0.4543, over 24530.00 frames. ], tot_loss[loss=0.2311, simple_loss=0.2787, pruned_loss=0.06927, ctc_loss=0.1419, cr_loss=0.4164, over 6740427.48 frames. ], batch size: 245, lr: 6.30e-03, grad_scale: 32.0 2024-09-18 04:00:43,445 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:00:51,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=338683.3333333333, ans=0.1 2024-09-18 04:01:03,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.41 vs. limit=12.0 2024-09-18 04:01:14,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=338776.6666666667, ans=0.0 2024-09-18 04:01:30,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=338823.3333333333, ans=0.0 2024-09-18 04:01:30,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2024-09-18 04:02:05,969 INFO [train.py:1198] (1/2) Epoch 19, batch 2850, loss[loss=0.2158, simple_loss=0.2611, pruned_loss=0.0638, ctc_loss=0.1345, cr_loss=0.3997, over 34466.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.279, pruned_loss=0.06941, ctc_loss=0.1424, cr_loss=0.4175, over 6724899.56 frames. ], batch size: 90, lr: 6.30e-03, grad_scale: 32.0 2024-09-18 04:02:09,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=338916.6666666667, ans=0.0 2024-09-18 04:02:42,662 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.084e+02 2.561e+02 2.844e+02 3.603e+02 9.484e+02, threshold=5.688e+02, percent-clipped=1.0 2024-09-18 04:03:07,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.74 vs. 
limit=22.5 2024-09-18 04:03:08,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.41 vs. limit=15.0 2024-09-18 04:03:11,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2024-09-18 04:03:28,694 INFO [train.py:1198] (1/2) Epoch 19, batch 2900, loss[loss=0.2244, simple_loss=0.2736, pruned_loss=0.06557, ctc_loss=0.1387, cr_loss=0.4061, over 34529.00 frames. ], tot_loss[loss=0.2327, simple_loss=0.2804, pruned_loss=0.06977, ctc_loss=0.1431, cr_loss=0.4189, over 6755239.59 frames. ], batch size: 94, lr: 6.30e-03, grad_scale: 32.0 2024-09-18 04:03:34,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=15.0 2024-09-18 04:03:52,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=339196.6666666667, ans=0.0 2024-09-18 04:03:54,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.43 vs. limit=12.0 2024-09-18 04:04:13,942 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:04:30,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=339290.0, ans=0.1 2024-09-18 04:04:33,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=339290.0, ans=0.125 2024-09-18 04:04:38,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=339336.6666666667, ans=0.125 2024-09-18 04:04:40,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=339336.6666666667, ans=0.125 2024-09-18 04:04:48,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=339336.6666666667, ans=0.1 2024-09-18 04:04:53,171 INFO [train.py:1198] (1/2) Epoch 19, batch 2950, loss[loss=0.2306, simple_loss=0.2739, pruned_loss=0.07122, ctc_loss=0.1417, cr_loss=0.4142, over 34631.00 frames. ], tot_loss[loss=0.2314, simple_loss=0.2791, pruned_loss=0.06932, ctc_loss=0.142, cr_loss=0.417, over 6749244.15 frames. 
], batch size: 88, lr: 6.30e-03, grad_scale: 32.0 2024-09-18 04:05:03,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=339383.3333333333, ans=0.0 2024-09-18 04:05:03,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=339383.3333333333, ans=0.1 2024-09-18 04:05:11,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=339430.0, ans=0.125 2024-09-18 04:05:19,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=339430.0, ans=0.2 2024-09-18 04:05:31,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.412e+02 2.961e+02 3.841e+02 6.816e+02, threshold=5.922e+02, percent-clipped=5.0 2024-09-18 04:05:38,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=339476.6666666667, ans=0.1 2024-09-18 04:06:11,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=339570.0, ans=0.125 2024-09-18 04:06:17,488 INFO [train.py:1198] (1/2) Epoch 19, batch 3000, loss[loss=0.2242, simple_loss=0.2726, pruned_loss=0.06617, ctc_loss=0.1348, cr_loss=0.4109, over 34535.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2787, pruned_loss=0.06899, ctc_loss=0.1415, cr_loss=0.4162, over 6751365.03 frames. ], batch size: 94, lr: 6.29e-03, grad_scale: 32.0 2024-09-18 04:06:17,488 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 04:06:34,423 INFO [train.py:1230] (1/2) Epoch 19, validation: loss=0.1493, simple_loss=0.2461, pruned_loss=0.02217, ctc_loss=0.04103, cr_loss=1.833e-14, over 944034.00 frames. 2024-09-18 04:06:34,423 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 04:06:49,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=339663.3333333333, ans=0.0 2024-09-18 04:06:55,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=22.5 2024-09-18 04:07:01,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.08 vs. limit=15.0 2024-09-18 04:07:11,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=339710.0, ans=0.2 2024-09-18 04:07:25,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=339756.6666666667, ans=0.125 2024-09-18 04:07:35,270 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:07:46,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=339803.3333333333, ans=0.125 2024-09-18 04:07:56,171 INFO [train.py:1198] (1/2) Epoch 19, batch 3050, loss[loss=0.2316, simple_loss=0.2757, pruned_loss=0.07128, ctc_loss=0.14, cr_loss=0.4251, over 34590.00 frames. ], tot_loss[loss=0.2315, simple_loss=0.2794, pruned_loss=0.06927, ctc_loss=0.1419, cr_loss=0.4171, over 6743995.18 frames. 
], batch size: 89, lr: 6.29e-03, grad_scale: 32.0 2024-09-18 04:08:20,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=339896.6666666667, ans=0.09899494936611666 2024-09-18 04:08:33,077 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.954e+02 2.522e+02 2.884e+02 3.345e+02 6.839e+02, threshold=5.768e+02, percent-clipped=1.0 2024-09-18 04:09:08,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.27 vs. limit=22.5 2024-09-18 04:09:18,438 INFO [train.py:1198] (1/2) Epoch 19, batch 3100, loss[loss=0.2394, simple_loss=0.2885, pruned_loss=0.072, ctc_loss=0.1442, cr_loss=0.4359, over 34317.00 frames. ], tot_loss[loss=0.2308, simple_loss=0.2788, pruned_loss=0.069, ctc_loss=0.1414, cr_loss=0.4161, over 6743477.62 frames. ], batch size: 117, lr: 6.29e-03, grad_scale: 32.0 2024-09-18 04:09:33,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=340130.0, ans=0.0 2024-09-18 04:09:49,488 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:09:57,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=340176.6666666667, ans=0.1 2024-09-18 04:10:04,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=340176.6666666667, ans=0.0 2024-09-18 04:10:25,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=340270.0, ans=0.5 2024-09-18 04:10:25,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.70 vs. limit=10.0 2024-09-18 04:10:41,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=14.20 vs. limit=15.0 2024-09-18 04:10:41,752 INFO [train.py:1198] (1/2) Epoch 19, batch 3150, loss[loss=0.2309, simple_loss=0.2831, pruned_loss=0.06708, ctc_loss=0.1395, cr_loss=0.4164, over 33865.00 frames. ], tot_loss[loss=0.2307, simple_loss=0.2785, pruned_loss=0.06897, ctc_loss=0.1413, cr_loss=0.4157, over 6749783.73 frames. ], batch size: 122, lr: 6.29e-03, grad_scale: 32.0 2024-09-18 04:11:06,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340363.3333333333, ans=0.1 2024-09-18 04:11:16,860 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.679e+02 3.205e+02 4.609e+02 9.010e+02, threshold=6.411e+02, percent-clipped=6.0 2024-09-18 04:11:18,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.60 vs. 
limit=15.0 2024-09-18 04:11:22,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=340410.0, ans=0.0 2024-09-18 04:11:23,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=340410.0, ans=0.125 2024-09-18 04:11:30,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=340456.6666666667, ans=0.0 2024-09-18 04:11:41,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=340456.6666666667, ans=0.025 2024-09-18 04:11:47,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.00 vs. limit=5.0 2024-09-18 04:11:59,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=340503.3333333333, ans=0.125 2024-09-18 04:12:02,061 INFO [train.py:1198] (1/2) Epoch 19, batch 3200, loss[loss=0.2282, simple_loss=0.2759, pruned_loss=0.06777, ctc_loss=0.1421, cr_loss=0.4129, over 34542.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2779, pruned_loss=0.06866, ctc_loss=0.1409, cr_loss=0.4151, over 6762996.46 frames. ], batch size: 94, lr: 6.29e-03, grad_scale: 32.0 2024-09-18 04:12:05,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340550.0, ans=0.1 2024-09-18 04:12:10,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=340550.0, ans=0.125 2024-09-18 04:12:12,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.50 vs. limit=22.5 2024-09-18 04:12:22,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.91 vs. limit=15.0 2024-09-18 04:12:25,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=340596.6666666667, ans=0.025 2024-09-18 04:12:38,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=340643.3333333333, ans=0.0 2024-09-18 04:12:49,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=340690.0, ans=0.0 2024-09-18 04:13:12,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=340736.6666666667, ans=0.0 2024-09-18 04:13:15,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=340736.6666666667, ans=0.0 2024-09-18 04:13:23,125 INFO [train.py:1198] (1/2) Epoch 19, batch 3250, loss[loss=0.2346, simple_loss=0.2819, pruned_loss=0.0698, ctc_loss=0.1478, cr_loss=0.4538, over 34650.00 frames. ], tot_loss[loss=0.231, simple_loss=0.2787, pruned_loss=0.06911, ctc_loss=0.1417, cr_loss=0.417, over 6772315.60 frames. 
], batch size: 98, lr: 6.28e-03, grad_scale: 32.0 2024-09-18 04:13:31,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=340783.3333333333, ans=0.125 2024-09-18 04:13:47,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=340830.0, ans=0.125 2024-09-18 04:13:58,418 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.904e+02 2.645e+02 3.106e+02 3.689e+02 6.249e+02, threshold=6.213e+02, percent-clipped=0.0 2024-09-18 04:14:22,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340923.3333333333, ans=0.125 2024-09-18 04:14:24,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=340923.3333333333, ans=0.07 2024-09-18 04:14:44,986 INFO [train.py:1198] (1/2) Epoch 19, batch 3300, loss[loss=0.231, simple_loss=0.2845, pruned_loss=0.06716, ctc_loss=0.138, cr_loss=0.3876, over 32972.00 frames. ], tot_loss[loss=0.2298, simple_loss=0.2774, pruned_loss=0.06867, ctc_loss=0.1409, cr_loss=0.4151, over 6770199.14 frames. ], batch size: 130, lr: 6.28e-03, grad_scale: 32.0 2024-09-18 04:15:03,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=341063.3333333333, ans=0.125 2024-09-18 04:15:04,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=341063.3333333333, ans=0.125 2024-09-18 04:15:25,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=341110.0, ans=0.1 2024-09-18 04:15:49,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=341203.3333333333, ans=0.025 2024-09-18 04:16:05,519 INFO [train.py:1198] (1/2) Epoch 19, batch 3350, loss[loss=0.2305, simple_loss=0.2833, pruned_loss=0.06668, ctc_loss=0.1394, cr_loss=0.412, over 33871.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2783, pruned_loss=0.06898, ctc_loss=0.1416, cr_loss=0.4159, over 6744465.09 frames. ], batch size: 122, lr: 6.28e-03, grad_scale: 32.0 2024-09-18 04:16:09,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.24 vs. 
limit=12.0 2024-09-18 04:16:21,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=341296.6666666667, ans=0.09899494936611666 2024-09-18 04:16:23,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=341296.6666666667, ans=0.125 2024-09-18 04:16:27,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=341296.6666666667, ans=0.125 2024-09-18 04:16:34,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=341296.6666666667, ans=0.2 2024-09-18 04:16:41,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=341343.3333333333, ans=0.0 2024-09-18 04:16:42,565 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.395e+02 2.722e+02 3.458e+02 5.029e+02, threshold=5.444e+02, percent-clipped=0.0 2024-09-18 04:16:44,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=341343.3333333333, ans=0.0 2024-09-18 04:17:10,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=341436.6666666667, ans=0.125 2024-09-18 04:17:10,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=341436.6666666667, ans=0.125 2024-09-18 04:17:25,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.21 vs. limit=15.0 2024-09-18 04:17:27,686 INFO [train.py:1198] (1/2) Epoch 19, batch 3400, loss[loss=0.194, simple_loss=0.2444, pruned_loss=0.054, ctc_loss=0.1111, cr_loss=0.3341, over 34134.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2781, pruned_loss=0.06891, ctc_loss=0.1414, cr_loss=0.4158, over 6733984.20 frames. ], batch size: 78, lr: 6.28e-03, grad_scale: 32.0 2024-09-18 04:17:29,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0 2024-09-18 04:17:32,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=341483.3333333333, ans=0.07 2024-09-18 04:18:13,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=341576.6666666667, ans=0.2 2024-09-18 04:18:27,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=341623.3333333333, ans=0.1 2024-09-18 04:18:49,241 INFO [train.py:1198] (1/2) Epoch 19, batch 3450, loss[loss=0.2449, simple_loss=0.2955, pruned_loss=0.07326, ctc_loss=0.1485, cr_loss=0.4491, over 33059.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2784, pruned_loss=0.06894, ctc_loss=0.1416, cr_loss=0.4166, over 6746200.79 frames. 
], batch size: 130, lr: 6.28e-03, grad_scale: 32.0 2024-09-18 04:19:08,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341763.3333333333, ans=0.1 2024-09-18 04:19:23,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=341810.0, ans=0.05 2024-09-18 04:19:24,161 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.498e+02 2.839e+02 3.844e+02 5.694e+02, threshold=5.679e+02, percent-clipped=2.0 2024-09-18 04:19:48,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=341856.6666666667, ans=0.0 2024-09-18 04:19:54,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.43 vs. limit=15.0 2024-09-18 04:20:09,243 INFO [train.py:1198] (1/2) Epoch 19, batch 3500, loss[loss=0.1969, simple_loss=0.2464, pruned_loss=0.055, ctc_loss=0.1165, cr_loss=0.3533, over 34465.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2779, pruned_loss=0.06886, ctc_loss=0.1414, cr_loss=0.416, over 6748177.04 frames. ], batch size: 85, lr: 6.27e-03, grad_scale: 32.0 2024-09-18 04:20:14,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=341950.0, ans=0.125 2024-09-18 04:20:19,323 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:20:27,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=341996.6666666667, ans=0.0 2024-09-18 04:20:35,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=341996.6666666667, ans=0.1 2024-09-18 04:21:19,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=342136.6666666667, ans=0.2 2024-09-18 04:21:22,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=342136.6666666667, ans=0.125 2024-09-18 04:21:27,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=342136.6666666667, ans=0.125 2024-09-18 04:21:30,581 INFO [train.py:1198] (1/2) Epoch 19, batch 3550, loss[loss=0.2295, simple_loss=0.2809, pruned_loss=0.06612, ctc_loss=0.1425, cr_loss=0.4355, over 34398.00 frames. ], tot_loss[loss=0.2302, simple_loss=0.2779, pruned_loss=0.06881, ctc_loss=0.1413, cr_loss=0.4163, over 6757589.87 frames. ], batch size: 103, lr: 6.27e-03, grad_scale: 32.0 2024-09-18 04:21:53,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=342230.0, ans=0.1 2024-09-18 04:21:53,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.67 vs. 
limit=10.0 2024-09-18 04:22:04,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=342276.6666666667, ans=0.5 2024-09-18 04:22:05,688 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.450e+02 3.046e+02 3.851e+02 6.930e+02, threshold=6.092e+02, percent-clipped=1.0 2024-09-18 04:22:27,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=342323.3333333333, ans=0.2 2024-09-18 04:22:27,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=342323.3333333333, ans=0.125 2024-09-18 04:22:51,657 INFO [train.py:1198] (1/2) Epoch 19, batch 3600, loss[loss=0.2258, simple_loss=0.2691, pruned_loss=0.06915, ctc_loss=0.1398, cr_loss=0.4093, over 34492.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.2782, pruned_loss=0.06889, ctc_loss=0.1414, cr_loss=0.4163, over 6767108.56 frames. ], batch size: 90, lr: 6.27e-03, grad_scale: 32.0 2024-09-18 04:22:51,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=342416.6666666667, ans=0.1 2024-09-18 04:22:59,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=342416.6666666667, ans=0.1 2024-09-18 04:23:01,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342416.6666666667, ans=0.1 2024-09-18 04:23:07,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=342463.3333333333, ans=0.125 2024-09-18 04:23:17,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=342463.3333333333, ans=0.0 2024-09-18 04:23:43,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=342556.6666666667, ans=0.125 2024-09-18 04:23:44,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.20 vs. limit=15.0 2024-09-18 04:23:45,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=342556.6666666667, ans=0.125 2024-09-18 04:23:47,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.48 vs. limit=15.0 2024-09-18 04:24:12,316 INFO [train.py:1198] (1/2) Epoch 19, batch 3650, loss[loss=0.2503, simple_loss=0.2955, pruned_loss=0.07782, ctc_loss=0.1572, cr_loss=0.4473, over 34453.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2775, pruned_loss=0.06858, ctc_loss=0.1408, cr_loss=0.4148, over 6769686.90 frames. ], batch size: 110, lr: 6.27e-03, grad_scale: 32.0 2024-09-18 04:24:16,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.99 vs. 
limit=10.0 2024-09-18 04:24:22,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=342650.0, ans=0.0 2024-09-18 04:24:48,563 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.710e+02 3.385e+02 4.331e+02 7.836e+02, threshold=6.770e+02, percent-clipped=6.0 2024-09-18 04:24:58,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=342743.3333333333, ans=0.125 2024-09-18 04:25:07,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=342790.0, ans=0.025 2024-09-18 04:25:11,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.30 vs. limit=15.0 2024-09-18 04:25:16,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.77 vs. limit=10.0 2024-09-18 04:25:30,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=342836.6666666667, ans=0.5 2024-09-18 04:25:32,900 INFO [train.py:1198] (1/2) Epoch 19, batch 3700, loss[loss=0.2371, simple_loss=0.2885, pruned_loss=0.06964, ctc_loss=0.1464, cr_loss=0.4304, over 34623.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.2777, pruned_loss=0.06847, ctc_loss=0.1407, cr_loss=0.414, over 6785089.61 frames. ], batch size: 102, lr: 6.26e-03, grad_scale: 32.0 2024-09-18 04:25:36,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=342883.3333333333, ans=0.125 2024-09-18 04:25:56,130 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:26:36,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=15.0 2024-09-18 04:26:38,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5 2024-09-18 04:26:42,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=343070.0, ans=0.125 2024-09-18 04:26:45,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=343070.0, ans=0.0 2024-09-18 04:26:46,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=343070.0, ans=0.125 2024-09-18 04:26:50,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.05 vs. limit=22.5 2024-09-18 04:26:54,798 INFO [train.py:1198] (1/2) Epoch 19, batch 3750, loss[loss=0.2534, simple_loss=0.2998, pruned_loss=0.07849, ctc_loss=0.1565, cr_loss=0.4657, over 34372.00 frames. ], tot_loss[loss=0.233, simple_loss=0.281, pruned_loss=0.06977, ctc_loss=0.1431, cr_loss=0.4194, over 6786810.25 frames. ], batch size: 113, lr: 6.26e-03, grad_scale: 32.0 2024-09-18 04:27:00,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.60 vs. 
limit=22.5 2024-09-18 04:27:00,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.53 vs. limit=22.5 2024-09-18 04:27:09,547 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:27:14,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=343163.3333333333, ans=0.1 2024-09-18 04:27:27,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=343210.0, ans=10.0 2024-09-18 04:27:30,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.398e+02 2.704e+02 3.394e+02 6.413e+02, threshold=5.408e+02, percent-clipped=0.0 2024-09-18 04:27:31,129 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.201e-02 2024-09-18 04:27:32,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=343210.0, ans=0.2 2024-09-18 04:27:37,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=22.5 2024-09-18 04:27:55,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=12.0 2024-09-18 04:27:56,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=343256.6666666667, ans=0.125 2024-09-18 04:28:01,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.78 vs. limit=15.0 2024-09-18 04:28:14,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=343350.0, ans=0.125 2024-09-18 04:28:15,645 INFO [train.py:1198] (1/2) Epoch 19, batch 3800, loss[loss=0.2699, simple_loss=0.3049, pruned_loss=0.09057, ctc_loss=0.1724, cr_loss=0.4802, over 30087.00 frames. ], tot_loss[loss=0.2368, simple_loss=0.284, pruned_loss=0.07163, ctc_loss=0.1466, cr_loss=0.4249, over 6677122.29 frames. ], batch size: 175, lr: 6.26e-03, grad_scale: 32.0 2024-09-18 04:28:17,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=343350.0, ans=0.125 2024-09-18 04:28:33,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=343396.6666666667, ans=0.125 2024-09-18 04:29:19,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=343490.0, ans=0.125 2024-09-18 04:29:37,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=343536.6666666667, ans=0.125 2024-09-18 04:29:40,687 INFO [train.py:1198] (1/2) Epoch 19, batch 3850, loss[loss=0.2619, simple_loss=0.2972, pruned_loss=0.08603, ctc_loss=0.1775, cr_loss=0.4764, over 22998.00 frames. ], tot_loss[loss=0.2417, simple_loss=0.287, pruned_loss=0.07438, ctc_loss=0.1522, cr_loss=0.4292, over 6252421.82 frames. 
], batch size: 244, lr: 6.26e-03, grad_scale: 32.0 2024-09-18 04:29:47,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=343583.3333333333, ans=0.0 2024-09-18 04:30:02,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=343630.0, ans=10.0 2024-09-18 04:30:09,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=343630.0, ans=0.0 2024-09-18 04:30:12,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343676.6666666667, ans=0.1 2024-09-18 04:30:17,012 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.086e+02 2.560e+02 2.785e+02 3.015e+02 3.781e+02, threshold=5.569e+02, percent-clipped=0.0 2024-09-18 04:31:12,863 INFO [train.py:1198] (1/2) Epoch 20, batch 0, loss[loss=0.2169, simple_loss=0.2665, pruned_loss=0.0626, ctc_loss=0.1315, cr_loss=0.3925, over 34466.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2665, pruned_loss=0.0626, ctc_loss=0.1315, cr_loss=0.3925, over 34466.00 frames. ], batch size: 85, lr: 6.10e-03, grad_scale: 32.0 2024-09-18 04:31:12,863 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 04:31:18,768 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([6.7795, 6.1865, 6.5943, 6.3068], device='cuda:1') 2024-09-18 04:31:29,778 INFO [train.py:1230] (1/2) Epoch 20, validation: loss=0.1499, simple_loss=0.2476, pruned_loss=0.0219, ctc_loss=0.04237, cr_loss=1.757e-14, over 944034.00 frames. 2024-09-18 04:31:29,779 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 04:31:33,398 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:31:34,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=343704.6666666667, ans=0.2 2024-09-18 04:31:34,930 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:31:51,620 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:31:57,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2024-09-18 04:32:06,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=343798.0, ans=0.1 2024-09-18 04:32:16,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=343798.0, ans=0.1 2024-09-18 04:32:27,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=343844.6666666667, ans=0.1 2024-09-18 04:32:31,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=343844.6666666667, ans=0.125 2024-09-18 04:32:54,425 INFO [train.py:1198] (1/2) Epoch 20, batch 50, loss[loss=0.2061, simple_loss=0.2531, pruned_loss=0.05953, ctc_loss=0.1264, cr_loss=0.3712, over 34509.00 frames. 
], tot_loss[loss=0.2324, simple_loss=0.2796, pruned_loss=0.06987, ctc_loss=0.1433, cr_loss=0.4199, over 1480645.57 frames. ], batch size: 82, lr: 6.09e-03, grad_scale: 32.0 2024-09-18 04:33:03,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=343938.0, ans=0.125 2024-09-18 04:33:04,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=343938.0, ans=0.0 2024-09-18 04:33:18,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=343984.6666666667, ans=0.0 2024-09-18 04:33:23,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.77 vs. limit=10.0 2024-09-18 04:33:29,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=344031.3333333333, ans=0.125 2024-09-18 04:33:36,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=344031.3333333333, ans=10.0 2024-09-18 04:33:41,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=344031.3333333333, ans=0.0 2024-09-18 04:33:56,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=344078.0, ans=0.125 2024-09-18 04:34:01,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=344124.6666666667, ans=0.0 2024-09-18 04:34:12,211 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.469e+02 2.816e+02 3.661e+02 7.257e+02, threshold=5.632e+02, percent-clipped=5.0 2024-09-18 04:34:12,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=344124.6666666667, ans=0.0 2024-09-18 04:34:18,882 INFO [train.py:1198] (1/2) Epoch 20, batch 100, loss[loss=0.2237, simple_loss=0.2714, pruned_loss=0.06646, ctc_loss=0.1355, cr_loss=0.3992, over 34599.00 frames. ], tot_loss[loss=0.2335, simple_loss=0.2812, pruned_loss=0.07009, ctc_loss=0.1437, cr_loss=0.4214, over 2629621.67 frames. ], batch size: 89, lr: 6.09e-03, grad_scale: 32.0 2024-09-18 04:34:29,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=344171.3333333333, ans=0.025 2024-09-18 04:34:31,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.41 vs. 
limit=15.0 2024-09-18 04:35:14,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=344311.3333333333, ans=0.125 2024-09-18 04:35:26,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=344358.0, ans=0.125 2024-09-18 04:35:35,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=344358.0, ans=0.125 2024-09-18 04:35:35,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=344358.0, ans=10.0 2024-09-18 04:35:40,820 INFO [train.py:1198] (1/2) Epoch 20, batch 150, loss[loss=0.2105, simple_loss=0.2601, pruned_loss=0.06019, ctc_loss=0.1244, cr_loss=0.3911, over 34496.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2793, pruned_loss=0.06901, ctc_loss=0.1416, cr_loss=0.4178, over 3557623.59 frames. ], batch size: 82, lr: 6.09e-03, grad_scale: 32.0 2024-09-18 04:35:47,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=344404.6666666667, ans=0.1 2024-09-18 04:35:49,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=344404.6666666667, ans=0.125 2024-09-18 04:36:12,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=344498.0, ans=0.0 2024-09-18 04:36:20,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=344498.0, ans=0.025 2024-09-18 04:36:24,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=344498.0, ans=0.125 2024-09-18 04:36:25,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=22.5 2024-09-18 04:36:25,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=344498.0, ans=0.0 2024-09-18 04:36:31,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. limit=15.0 2024-09-18 04:36:34,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=344544.6666666667, ans=0.2 2024-09-18 04:36:59,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.935e+02 2.527e+02 3.167e+02 3.904e+02 7.209e+02, threshold=6.333e+02, percent-clipped=3.0 2024-09-18 04:37:06,130 INFO [train.py:1198] (1/2) Epoch 20, batch 200, loss[loss=0.2579, simple_loss=0.2982, pruned_loss=0.08281, ctc_loss=0.1671, cr_loss=0.4615, over 31899.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2781, pruned_loss=0.06858, ctc_loss=0.1409, cr_loss=0.4161, over 4273117.99 frames. 
], batch size: 145, lr: 6.09e-03, grad_scale: 32.0 2024-09-18 04:37:59,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=344778.0, ans=0.05 2024-09-18 04:38:20,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=344824.6666666667, ans=0.125 2024-09-18 04:38:24,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=344824.6666666667, ans=0.125 2024-09-18 04:38:30,759 INFO [train.py:1198] (1/2) Epoch 20, batch 250, loss[loss=0.2386, simple_loss=0.2858, pruned_loss=0.07253, ctc_loss=0.1472, cr_loss=0.4238, over 34251.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2778, pruned_loss=0.06835, ctc_loss=0.1404, cr_loss=0.4159, over 4834952.19 frames. ], batch size: 117, lr: 6.09e-03, grad_scale: 32.0 2024-09-18 04:39:00,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=344918.0, ans=0.125 2024-09-18 04:39:00,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=344918.0, ans=0.2 2024-09-18 04:39:07,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=344964.6666666667, ans=0.125 2024-09-18 04:39:14,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0 2024-09-18 04:39:22,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.29 vs. limit=10.0 2024-09-18 04:39:26,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.05 vs. limit=15.0 2024-09-18 04:39:26,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.31 vs. limit=12.0 2024-09-18 04:39:38,556 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:39:45,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=345058.0, ans=0.07 2024-09-18 04:39:46,262 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.394e+02 2.783e+02 3.533e+02 5.933e+02, threshold=5.566e+02, percent-clipped=0.0 2024-09-18 04:39:51,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=345104.6666666667, ans=0.0 2024-09-18 04:39:52,772 INFO [train.py:1198] (1/2) Epoch 20, batch 300, loss[loss=0.2645, simple_loss=0.3069, pruned_loss=0.08524, ctc_loss=0.1661, cr_loss=0.4614, over 34336.00 frames. ], tot_loss[loss=0.2296, simple_loss=0.2777, pruned_loss=0.0684, ctc_loss=0.1405, cr_loss=0.4159, over 5264416.67 frames. 
], batch size: 107, lr: 6.08e-03, grad_scale: 32.0 2024-09-18 04:39:56,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=345104.6666666667, ans=0.125 2024-09-18 04:40:09,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=345151.3333333333, ans=0.125 2024-09-18 04:40:21,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=345151.3333333333, ans=0.125 2024-09-18 04:41:03,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=345291.3333333333, ans=0.0 2024-09-18 04:41:19,735 INFO [train.py:1198] (1/2) Epoch 20, batch 350, loss[loss=0.1978, simple_loss=0.2485, pruned_loss=0.05448, ctc_loss=0.1176, cr_loss=0.3631, over 34274.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2782, pruned_loss=0.06845, ctc_loss=0.1408, cr_loss=0.4167, over 5600370.56 frames. ], batch size: 83, lr: 6.08e-03, grad_scale: 32.0 2024-09-18 04:41:37,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=345384.6666666667, ans=0.02 2024-09-18 04:41:51,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.05 vs. limit=15.0 2024-09-18 04:42:01,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.27 vs. limit=15.0 2024-09-18 04:42:04,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.79 vs. limit=15.0 2024-09-18 04:42:07,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=345478.0, ans=0.125 2024-09-18 04:42:24,084 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:42:25,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=345524.6666666667, ans=0.2 2024-09-18 04:42:36,480 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.507e+02 2.898e+02 3.866e+02 5.820e+02, threshold=5.796e+02, percent-clipped=5.0 2024-09-18 04:42:41,367 INFO [train.py:1198] (1/2) Epoch 20, batch 400, loss[loss=0.2217, simple_loss=0.2699, pruned_loss=0.06549, ctc_loss=0.1335, cr_loss=0.3976, over 34403.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2769, pruned_loss=0.06794, ctc_loss=0.1399, cr_loss=0.4149, over 5866352.44 frames. 
], batch size: 95, lr: 6.08e-03, grad_scale: 32.0 2024-09-18 04:42:54,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=345571.3333333333, ans=0.0 2024-09-18 04:43:06,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=345618.0, ans=0.07 2024-09-18 04:43:47,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=345758.0, ans=0.125 2024-09-18 04:43:55,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=345758.0, ans=0.0 2024-09-18 04:44:03,702 INFO [train.py:1198] (1/2) Epoch 20, batch 450, loss[loss=0.2374, simple_loss=0.2869, pruned_loss=0.0711, ctc_loss=0.1434, cr_loss=0.4258, over 34684.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2771, pruned_loss=0.0681, ctc_loss=0.1401, cr_loss=0.4154, over 6055733.73 frames. ], batch size: 97, lr: 6.08e-03, grad_scale: 32.0 2024-09-18 04:44:24,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=345851.3333333333, ans=0.0 2024-09-18 04:44:24,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=345851.3333333333, ans=0.125 2024-09-18 04:44:34,402 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:44:44,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.48 vs. limit=6.0 2024-09-18 04:44:49,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.87 vs. limit=6.0 2024-09-18 04:45:12,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=345991.3333333333, ans=0.0 2024-09-18 04:45:20,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=345991.3333333333, ans=0.125 2024-09-18 04:45:25,060 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.505e+02 2.942e+02 3.427e+02 6.995e+02, threshold=5.884e+02, percent-clipped=1.0 2024-09-18 04:45:29,905 INFO [train.py:1198] (1/2) Epoch 20, batch 500, loss[loss=0.2316, simple_loss=0.2819, pruned_loss=0.06801, ctc_loss=0.1409, cr_loss=0.427, over 34512.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2764, pruned_loss=0.06785, ctc_loss=0.1396, cr_loss=0.4151, over 6221925.54 frames. ], batch size: 110, lr: 6.08e-03, grad_scale: 32.0 2024-09-18 04:45:36,964 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:46:13,285 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:46:25,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.83 vs. 
limit=22.5 2024-09-18 04:46:44,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=346224.6666666667, ans=0.125 2024-09-18 04:46:52,698 INFO [train.py:1198] (1/2) Epoch 20, batch 550, loss[loss=0.2626, simple_loss=0.3057, pruned_loss=0.08436, ctc_loss=0.1633, cr_loss=0.4512, over 33873.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2763, pruned_loss=0.06789, ctc_loss=0.1396, cr_loss=0.4148, over 6331244.68 frames. ], batch size: 122, lr: 6.07e-03, grad_scale: 32.0 2024-09-18 04:46:54,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=346271.3333333333, ans=0.125 2024-09-18 04:46:59,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=346271.3333333333, ans=0.125 2024-09-18 04:46:59,826 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 04:47:04,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=346271.3333333333, ans=0.2 2024-09-18 04:47:07,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=346318.0, ans=0.0 2024-09-18 04:48:03,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=346458.0, ans=0.0 2024-09-18 04:48:12,390 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.459e+02 2.926e+02 3.578e+02 6.138e+02, threshold=5.851e+02, percent-clipped=3.0 2024-09-18 04:48:17,313 INFO [train.py:1198] (1/2) Epoch 20, batch 600, loss[loss=0.2456, simple_loss=0.2975, pruned_loss=0.07303, ctc_loss=0.1497, cr_loss=0.4429, over 34274.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2768, pruned_loss=0.06799, ctc_loss=0.1398, cr_loss=0.4148, over 6433178.60 frames. ], batch size: 117, lr: 6.07e-03, grad_scale: 32.0 2024-09-18 04:49:02,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=346598.0, ans=0.2 2024-09-18 04:49:21,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=346644.6666666667, ans=0.125 2024-09-18 04:49:23,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=346691.3333333333, ans=0.2 2024-09-18 04:49:24,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=12.0 2024-09-18 04:49:40,925 INFO [train.py:1198] (1/2) Epoch 20, batch 650, loss[loss=0.2224, simple_loss=0.2721, pruned_loss=0.06495, ctc_loss=0.1354, cr_loss=0.3917, over 34547.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2762, pruned_loss=0.0677, ctc_loss=0.1394, cr_loss=0.4137, over 6523564.56 frames. ], batch size: 94, lr: 6.07e-03, grad_scale: 32.0 2024-09-18 04:49:41,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=346738.0, ans=0.035 2024-09-18 04:50:47,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.97 vs. 
limit=15.0 2024-09-18 04:50:57,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=346924.6666666667, ans=0.125 2024-09-18 04:50:58,325 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.906e+02 2.495e+02 2.991e+02 4.487e+02 8.173e+02, threshold=5.982e+02, percent-clipped=11.0 2024-09-18 04:51:03,293 INFO [train.py:1198] (1/2) Epoch 20, batch 700, loss[loss=0.2098, simple_loss=0.2572, pruned_loss=0.06093, ctc_loss=0.1269, cr_loss=0.3789, over 34568.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.2769, pruned_loss=0.06795, ctc_loss=0.1398, cr_loss=0.4147, over 6580302.85 frames. ], batch size: 89, lr: 6.07e-03, grad_scale: 32.0 2024-09-18 04:51:42,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0 2024-09-18 04:52:19,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=347158.0, ans=0.0 2024-09-18 04:52:30,662 INFO [train.py:1198] (1/2) Epoch 20, batch 750, loss[loss=0.2282, simple_loss=0.2801, pruned_loss=0.06667, ctc_loss=0.1356, cr_loss=0.3964, over 34404.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2766, pruned_loss=0.06783, ctc_loss=0.1396, cr_loss=0.4136, over 6623885.36 frames. ], batch size: 95, lr: 6.07e-03, grad_scale: 32.0 2024-09-18 04:52:43,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347204.6666666667, ans=0.1 2024-09-18 04:52:55,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=347251.3333333333, ans=0.125 2024-09-18 04:53:03,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=347298.0, ans=0.125 2024-09-18 04:53:14,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.96 vs. limit=22.5 2024-09-18 04:53:15,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=347298.0, ans=0.125 2024-09-18 04:53:48,146 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.092e+02 2.490e+02 2.933e+02 3.732e+02 6.230e+02, threshold=5.865e+02, percent-clipped=1.0 2024-09-18 04:53:53,005 INFO [train.py:1198] (1/2) Epoch 20, batch 800, loss[loss=0.2119, simple_loss=0.2569, pruned_loss=0.06229, ctc_loss=0.1309, cr_loss=0.4033, over 34491.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2763, pruned_loss=0.06769, ctc_loss=0.1392, cr_loss=0.4129, over 6659679.74 frames. ], batch size: 85, lr: 6.06e-03, grad_scale: 32.0 2024-09-18 04:54:06,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=347438.0, ans=0.2 2024-09-18 04:54:17,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=347484.6666666667, ans=0.125 2024-09-18 04:54:51,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.58 vs. 
limit=15.0 2024-09-18 04:54:55,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=347578.0, ans=0.125 2024-09-18 04:55:00,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=347624.6666666667, ans=0.125 2024-09-18 04:55:08,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-09-18 04:55:09,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=347624.6666666667, ans=0.125 2024-09-18 04:55:15,263 INFO [train.py:1198] (1/2) Epoch 20, batch 850, loss[loss=0.2347, simple_loss=0.2893, pruned_loss=0.06755, ctc_loss=0.14, cr_loss=0.4265, over 34396.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2761, pruned_loss=0.06768, ctc_loss=0.1391, cr_loss=0.4131, over 6691001.04 frames. ], batch size: 103, lr: 6.06e-03, grad_scale: 32.0 2024-09-18 04:55:28,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=347671.3333333333, ans=0.125 2024-09-18 04:55:31,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff2.min_abs, batch_count=347718.0, ans=0.1 2024-09-18 04:55:50,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=347764.6666666667, ans=0.0 2024-09-18 04:56:16,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.26 vs. limit=12.0 2024-09-18 04:56:37,122 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.476e+02 2.811e+02 3.539e+02 6.215e+02, threshold=5.621e+02, percent-clipped=2.0 2024-09-18 04:56:40,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=347904.6666666667, ans=0.0 2024-09-18 04:56:42,163 INFO [train.py:1198] (1/2) Epoch 20, batch 900, loss[loss=0.2029, simple_loss=0.2479, pruned_loss=0.05906, ctc_loss=0.1226, cr_loss=0.3825, over 34492.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2766, pruned_loss=0.06799, ctc_loss=0.1397, cr_loss=0.4145, over 6698842.50 frames. ], batch size: 85, lr: 6.06e-03, grad_scale: 32.0 2024-09-18 04:56:42,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=347904.6666666667, ans=0.125 2024-09-18 04:56:44,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=347904.6666666667, ans=0.0 2024-09-18 04:56:49,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.60 vs. 
limit=15.0 2024-09-18 04:56:58,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=347951.3333333333, ans=0.0 2024-09-18 04:57:01,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=347951.3333333333, ans=0.125 2024-09-18 04:57:08,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=347951.3333333333, ans=0.0 2024-09-18 04:57:55,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=348091.3333333333, ans=0.2 2024-09-18 04:57:56,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=348091.3333333333, ans=0.0 2024-09-18 04:58:04,267 INFO [train.py:1198] (1/2) Epoch 20, batch 950, loss[loss=0.2016, simple_loss=0.2548, pruned_loss=0.05495, ctc_loss=0.1173, cr_loss=0.3721, over 34678.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.277, pruned_loss=0.06804, ctc_loss=0.1399, cr_loss=0.4147, over 6700909.99 frames. ], batch size: 87, lr: 6.06e-03, grad_scale: 32.0 2024-09-18 04:58:16,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=348138.0, ans=0.125 2024-09-18 04:58:20,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=348184.6666666667, ans=0.125 2024-09-18 04:58:29,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=348184.6666666667, ans=0.125 2024-09-18 04:58:37,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=348231.3333333333, ans=0.125 2024-09-18 04:58:39,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=348231.3333333333, ans=0.0 2024-09-18 04:59:21,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=348324.6666666667, ans=0.025 2024-09-18 04:59:22,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.897e+02 2.981e+02 3.966e+02 5.018e+02 9.150e+02, threshold=7.932e+02, percent-clipped=13.0 2024-09-18 04:59:26,158 INFO [train.py:1198] (1/2) Epoch 20, batch 1000, loss[loss=0.2204, simple_loss=0.2662, pruned_loss=0.06565, ctc_loss=0.1347, cr_loss=0.4123, over 34497.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2773, pruned_loss=0.06813, ctc_loss=0.1401, cr_loss=0.4146, over 6694599.37 frames. ], batch size: 90, lr: 6.06e-03, grad_scale: 16.0 2024-09-18 05:00:08,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=348464.6666666667, ans=0.125 2024-09-18 05:00:44,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=348558.0, ans=0.04949747468305833 2024-09-18 05:00:50,861 INFO [train.py:1198] (1/2) Epoch 20, batch 1050, loss[loss=0.2308, simple_loss=0.2839, pruned_loss=0.06661, ctc_loss=0.1383, cr_loss=0.4212, over 34570.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2768, pruned_loss=0.06809, ctc_loss=0.1399, cr_loss=0.4136, over 6704082.84 frames. 
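The scaling.py:214 ScheduledFloat lines record the value (ans=...) that a scheduled hyperparameter (skip rates, balancer probabilities, max_abs limits, and so on) takes at the current batch_count. A piecewise-linear schedule over batch count reproduces the shape of these entries; this is a sketch of the idea only, and the breakpoints below are invented for illustration, not the ones this run used.

def scheduled_float(batch_count, points):
    # points: sorted (batch_count, value) pairs; the value is held constant
    # outside the first and last breakpoints and interpolated in between.
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        x0, y0 = x1, y1
    return y0

# A conv_skip_rate-style schedule that decays 0.5 -> 0.0 and stays there,
# which is why so many ans= values above read 0.0 this deep into training:
assert scheduled_float(348604, [(0.0, 0.5), (20000.0, 0.0)]) == 0.0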
], batch size: 99, lr: 6.05e-03, grad_scale: 16.0 2024-09-18 05:00:56,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=348604.6666666667, ans=0.125 2024-09-18 05:01:14,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0 2024-09-18 05:01:15,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=348651.3333333333, ans=0.0 2024-09-18 05:01:16,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.65 vs. limit=10.0 2024-09-18 05:01:39,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=22.5 2024-09-18 05:01:40,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=348744.6666666667, ans=0.125 2024-09-18 05:01:54,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.53 vs. limit=15.0 2024-09-18 05:01:55,584 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0 2024-09-18 05:02:09,786 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.328e+02 2.656e+02 3.196e+02 5.709e+02, threshold=5.313e+02, percent-clipped=0.0 2024-09-18 05:02:13,065 INFO [train.py:1198] (1/2) Epoch 20, batch 1100, loss[loss=0.2168, simple_loss=0.2668, pruned_loss=0.06226, ctc_loss=0.133, cr_loss=0.3894, over 34348.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.2768, pruned_loss=0.06788, ctc_loss=0.1396, cr_loss=0.4132, over 6716758.70 frames. ], batch size: 91, lr: 6.05e-03, grad_scale: 16.0 2024-09-18 05:02:23,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=348838.0, ans=0.125 2024-09-18 05:02:25,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.24 vs. 
limit=12.0 2024-09-18 05:02:28,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=348884.6666666667, ans=0.0 2024-09-18 05:02:29,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=348884.6666666667, ans=0.025 2024-09-18 05:02:33,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=348884.6666666667, ans=0.025 2024-09-18 05:02:46,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=348931.3333333333, ans=0.125 2024-09-18 05:02:48,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=348931.3333333333, ans=0.125 2024-09-18 05:03:01,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=348978.0, ans=0.125 2024-09-18 05:03:04,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=348978.0, ans=0.0 2024-09-18 05:03:09,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=348978.0, ans=0.125 2024-09-18 05:03:16,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=348978.0, ans=0.0 2024-09-18 05:03:30,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=349024.6666666667, ans=0.125 2024-09-18 05:03:32,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2024-09-18 05:03:40,268 INFO [train.py:1198] (1/2) Epoch 20, batch 1150, loss[loss=0.2289, simple_loss=0.279, pruned_loss=0.06701, ctc_loss=0.139, cr_loss=0.4248, over 34354.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2767, pruned_loss=0.06797, ctc_loss=0.1398, cr_loss=0.4134, over 6714653.40 frames. ], batch size: 91, lr: 6.05e-03, grad_scale: 16.0 2024-09-18 05:03:42,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=349071.3333333333, ans=0.125 2024-09-18 05:03:47,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. 
limit=6.0 2024-09-18 05:04:40,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=349211.3333333333, ans=0.05 2024-09-18 05:04:54,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=349258.0, ans=0.125 2024-09-18 05:04:59,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.078e+02 2.512e+02 3.059e+02 3.732e+02 9.744e+02, threshold=6.119e+02, percent-clipped=3.0 2024-09-18 05:05:01,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=349304.6666666667, ans=0.05 2024-09-18 05:05:02,740 INFO [train.py:1198] (1/2) Epoch 20, batch 1200, loss[loss=0.2359, simple_loss=0.281, pruned_loss=0.0718, ctc_loss=0.1469, cr_loss=0.4479, over 34570.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2773, pruned_loss=0.06812, ctc_loss=0.1401, cr_loss=0.4145, over 6707695.18 frames. ], batch size: 99, lr: 6.05e-03, grad_scale: 32.0 2024-09-18 05:05:04,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349304.6666666667, ans=0.1 2024-09-18 05:05:06,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=349304.6666666667, ans=0.0 2024-09-18 05:05:35,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=349398.0, ans=0.125 2024-09-18 05:05:49,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=349398.0, ans=0.125 2024-09-18 05:05:49,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=349398.0, ans=0.025 2024-09-18 05:06:03,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=349444.6666666667, ans=0.2 2024-09-18 05:06:12,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=349491.3333333333, ans=0.125 2024-09-18 05:06:25,154 INFO [train.py:1198] (1/2) Epoch 20, batch 1250, loss[loss=0.2464, simple_loss=0.2924, pruned_loss=0.07525, ctc_loss=0.1567, cr_loss=0.4637, over 34328.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2779, pruned_loss=0.06821, ctc_loss=0.1404, cr_loss=0.4156, over 6741876.16 frames. ], batch size: 107, lr: 6.05e-03, grad_scale: 32.0 2024-09-18 05:06:30,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=349538.0, ans=0.0 2024-09-18 05:06:41,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.70 vs. limit=15.0 2024-09-18 05:07:09,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.32 vs. 
limit=22.5 2024-09-18 05:07:10,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=349631.3333333333, ans=0.125 2024-09-18 05:07:46,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=349724.6666666667, ans=0.125 2024-09-18 05:07:48,927 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.472e+02 2.747e+02 3.542e+02 1.198e+03, threshold=5.494e+02, percent-clipped=2.0 2024-09-18 05:07:52,116 INFO [train.py:1198] (1/2) Epoch 20, batch 1300, loss[loss=0.2479, simple_loss=0.2956, pruned_loss=0.07616, ctc_loss=0.1502, cr_loss=0.4437, over 33164.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2771, pruned_loss=0.06796, ctc_loss=0.1398, cr_loss=0.4149, over 6746555.37 frames. ], batch size: 130, lr: 6.04e-03, grad_scale: 32.0 2024-09-18 05:07:54,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=349771.3333333333, ans=0.2 2024-09-18 05:08:12,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=349818.0, ans=0.035 2024-09-18 05:08:38,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=349864.6666666667, ans=0.1 2024-09-18 05:08:48,703 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:09:15,280 INFO [train.py:1198] (1/2) Epoch 20, batch 1350, loss[loss=0.2251, simple_loss=0.2744, pruned_loss=0.06625, ctc_loss=0.1343, cr_loss=0.4134, over 34537.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2767, pruned_loss=0.06779, ctc_loss=0.1396, cr_loss=0.4147, over 6765371.04 frames. ], batch size: 94, lr: 6.04e-03, grad_scale: 32.0 2024-09-18 05:10:16,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0 2024-09-18 05:10:27,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=350191.3333333333, ans=0.125 2024-09-18 05:10:33,828 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.447e+02 2.767e+02 3.560e+02 6.185e+02, threshold=5.535e+02, percent-clipped=3.0 2024-09-18 05:10:37,197 INFO [train.py:1198] (1/2) Epoch 20, batch 1400, loss[loss=0.1996, simple_loss=0.2459, pruned_loss=0.05729, ctc_loss=0.1181, cr_loss=0.3744, over 34277.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2767, pruned_loss=0.06794, ctc_loss=0.1397, cr_loss=0.4151, over 6777572.62 frames. ], batch size: 80, lr: 6.04e-03, grad_scale: 32.0 2024-09-18 05:10:52,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=350238.0, ans=0.0 2024-09-18 05:11:04,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=22.5 2024-09-18 05:11:11,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.66 vs. 
limit=10.0 2024-09-18 05:11:22,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=350331.3333333333, ans=0.125 2024-09-18 05:11:26,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=350331.3333333333, ans=0.0 2024-09-18 05:11:46,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=15.0 2024-09-18 05:12:03,676 INFO [train.py:1198] (1/2) Epoch 20, batch 1450, loss[loss=0.2592, simple_loss=0.3058, pruned_loss=0.08118, ctc_loss=0.1618, cr_loss=0.4482, over 34485.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2776, pruned_loss=0.06809, ctc_loss=0.1401, cr_loss=0.4161, over 6775818.89 frames. ], batch size: 110, lr: 6.04e-03, grad_scale: 16.0 2024-09-18 05:12:05,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=350471.3333333333, ans=0.0 2024-09-18 05:12:06,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2024-09-18 05:12:35,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0 2024-09-18 05:12:55,010 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:13:05,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=350611.3333333333, ans=10.0 2024-09-18 05:13:06,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=350611.3333333333, ans=0.2 2024-09-18 05:13:21,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=350658.0, ans=0.1 2024-09-18 05:13:26,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.548e+02 3.088e+02 4.086e+02 5.547e+02, threshold=6.177e+02, percent-clipped=1.0 2024-09-18 05:13:26,054 INFO [train.py:1198] (1/2) Epoch 20, batch 1500, loss[loss=0.242, simple_loss=0.2931, pruned_loss=0.07153, ctc_loss=0.1502, cr_loss=0.4432, over 34455.00 frames. ], tot_loss[loss=0.2297, simple_loss=0.278, pruned_loss=0.06827, ctc_loss=0.1404, cr_loss=0.4167, over 6775513.36 frames. ], batch size: 100, lr: 6.04e-03, grad_scale: 8.0 2024-09-18 05:13:26,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=350704.6666666667, ans=0.0 2024-09-18 05:13:51,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=350751.3333333333, ans=0.1 2024-09-18 05:14:10,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2024-09-18 05:14:16,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. 
limit=12.0 2024-09-18 05:14:18,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=350844.6666666667, ans=0.2 2024-09-18 05:14:48,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.43 vs. limit=22.5 2024-09-18 05:14:50,865 INFO [train.py:1198] (1/2) Epoch 20, batch 1550, loss[loss=0.2436, simple_loss=0.2916, pruned_loss=0.07418, ctc_loss=0.1512, cr_loss=0.424, over 34419.00 frames. ], tot_loss[loss=0.2298, simple_loss=0.2779, pruned_loss=0.06841, ctc_loss=0.1407, cr_loss=0.4164, over 6746711.88 frames. ], batch size: 105, lr: 6.03e-03, grad_scale: 8.0 2024-09-18 05:14:58,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=350938.0, ans=0.0 2024-09-18 05:15:08,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=350984.6666666667, ans=0.95 2024-09-18 05:15:13,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=350984.6666666667, ans=0.125 2024-09-18 05:15:44,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=351078.0, ans=0.125 2024-09-18 05:16:04,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=351124.6666666667, ans=0.125 2024-09-18 05:16:15,245 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.546e+02 3.174e+02 3.920e+02 7.647e+02, threshold=6.347e+02, percent-clipped=3.0 2024-09-18 05:16:15,266 INFO [train.py:1198] (1/2) Epoch 20, batch 1600, loss[loss=0.2448, simple_loss=0.2911, pruned_loss=0.07522, ctc_loss=0.1517, cr_loss=0.444, over 34570.00 frames. ], tot_loss[loss=0.23, simple_loss=0.2781, pruned_loss=0.06855, ctc_loss=0.1409, cr_loss=0.4172, over 6725645.96 frames. ], batch size: 99, lr: 6.03e-03, grad_scale: 16.0 2024-09-18 05:16:24,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.41 vs. limit=15.0 2024-09-18 05:16:35,250 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:16:37,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0 2024-09-18 05:16:42,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.83 vs. limit=10.0 2024-09-18 05:17:25,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.28 vs. 
limit=15.0 2024-09-18 05:17:33,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=351358.0, ans=15.0 2024-09-18 05:17:34,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=351358.0, ans=0.125 2024-09-18 05:17:35,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=351404.6666666667, ans=0.125 2024-09-18 05:17:35,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=351404.6666666667, ans=0.0 2024-09-18 05:17:37,077 INFO [train.py:1198] (1/2) Epoch 20, batch 1650, loss[loss=0.2421, simple_loss=0.2914, pruned_loss=0.07285, ctc_loss=0.1469, cr_loss=0.4429, over 34417.00 frames. ], tot_loss[loss=0.2298, simple_loss=0.2777, pruned_loss=0.06851, ctc_loss=0.1407, cr_loss=0.4168, over 6719309.14 frames. ], batch size: 103, lr: 6.03e-03, grad_scale: 16.0 2024-09-18 05:17:39,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351404.6666666667, ans=0.1 2024-09-18 05:17:39,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=351404.6666666667, ans=0.125 2024-09-18 05:17:44,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=351404.6666666667, ans=0.0 2024-09-18 05:17:52,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=22.5 2024-09-18 05:18:06,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=351451.3333333333, ans=0.025 2024-09-18 05:18:07,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0 2024-09-18 05:18:10,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=351498.0, ans=0.025 2024-09-18 05:18:10,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=351498.0, ans=0.2 2024-09-18 05:18:10,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=351498.0, ans=0.125 2024-09-18 05:18:43,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=351544.6666666667, ans=0.125 2024-09-18 05:19:02,992 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.528e+02 2.903e+02 3.994e+02 7.934e+02, threshold=5.806e+02, percent-clipped=1.0 2024-09-18 05:19:03,013 INFO [train.py:1198] (1/2) Epoch 20, batch 1700, loss[loss=0.1943, simple_loss=0.2431, pruned_loss=0.05383, ctc_loss=0.1169, cr_loss=0.3604, over 34295.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2775, pruned_loss=0.06815, ctc_loss=0.1401, cr_loss=0.4156, over 6744795.38 frames. 
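The optim.py:487 warnings summarize the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max), the clipping threshold in force, and the percentage of recent batches that were clipped. One way to derive such a threshold from a history of norms is sketched below; the rule threshold = clipping_scale * median is an assumption made for illustration, since the exact statistic icefall uses is not shown in this log.

from collections import deque
import torch

class QuartileClipper:
    """Track recent gradient norms, clip against a threshold derived from
    their distribution, and report the same quantities these warnings print.
    Illustrative only; not the actual optim.py implementation."""
    def __init__(self, clipping_scale=2.0, window=1024):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.clipped = 0
        self.seen = 0

    def step(self, parameters):
        params = [p for p in parameters if p.grad is not None]
        norm = torch.cat([p.grad.detach().flatten() for p in params]).norm().item()
        self.norms.append(norm)
        self.seen += 1
        q = torch.quantile(torch.tensor(list(self.norms), dtype=torch.float32),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # assumed rule
        if norm > threshold:
            self.clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)
        return q.tolist(), threshold, 100.0 * self.clipped / self.seen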
], batch size: 80, lr: 6.03e-03, grad_scale: 16.0 2024-09-18 05:19:13,392 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:19:14,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=351638.0, ans=0.0 2024-09-18 05:20:05,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=351778.0, ans=0.125 2024-09-18 05:20:08,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5 2024-09-18 05:20:25,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.37 vs. limit=10.0 2024-09-18 05:20:25,950 INFO [train.py:1198] (1/2) Epoch 20, batch 1750, loss[loss=0.2047, simple_loss=0.2518, pruned_loss=0.05912, ctc_loss=0.1239, cr_loss=0.3647, over 34136.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2772, pruned_loss=0.06802, ctc_loss=0.14, cr_loss=0.4152, over 6753778.48 frames. ], batch size: 78, lr: 6.03e-03, grad_scale: 16.0 2024-09-18 05:20:50,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=351918.0, ans=0.0 2024-09-18 05:20:54,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351918.0, ans=0.1 2024-09-18 05:20:57,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.97 vs. limit=10.0 2024-09-18 05:21:12,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=351964.6666666667, ans=10.0 2024-09-18 05:21:16,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=352011.3333333333, ans=0.0 2024-09-18 05:21:25,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=352011.3333333333, ans=0.0 2024-09-18 05:21:36,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.90 vs. limit=22.5 2024-09-18 05:21:38,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=352058.0, ans=0.1 2024-09-18 05:21:45,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352058.0, ans=0.1 2024-09-18 05:21:47,822 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.460e+02 2.866e+02 3.783e+02 5.788e+02, threshold=5.733e+02, percent-clipped=0.0 2024-09-18 05:21:47,843 INFO [train.py:1198] (1/2) Epoch 20, batch 1800, loss[loss=0.2459, simple_loss=0.2939, pruned_loss=0.07473, ctc_loss=0.1526, cr_loss=0.4463, over 34695.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2774, pruned_loss=0.06792, ctc_loss=0.1398, cr_loss=0.4152, over 6756928.63 frames. 
], batch size: 97, lr: 6.02e-03, grad_scale: 16.0 2024-09-18 05:22:03,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=352104.6666666667, ans=0.125 2024-09-18 05:22:09,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=352151.3333333333, ans=0.125 2024-09-18 05:22:11,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=352151.3333333333, ans=0.2 2024-09-18 05:22:17,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.42 vs. limit=6.0 2024-09-18 05:22:33,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=352198.0, ans=0.125 2024-09-18 05:22:44,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=352244.6666666667, ans=0.0 2024-09-18 05:22:45,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.11 vs. limit=10.0 2024-09-18 05:22:51,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=352244.6666666667, ans=0.0 2024-09-18 05:23:03,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.51 vs. limit=15.0 2024-09-18 05:23:09,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=352291.3333333333, ans=0.0 2024-09-18 05:23:14,402 INFO [train.py:1198] (1/2) Epoch 20, batch 1850, loss[loss=0.2413, simple_loss=0.2932, pruned_loss=0.07102, ctc_loss=0.1479, cr_loss=0.4472, over 34452.00 frames. ], tot_loss[loss=0.2283, simple_loss=0.2769, pruned_loss=0.06759, ctc_loss=0.1392, cr_loss=0.4145, over 6764755.61 frames. ], batch size: 100, lr: 6.02e-03, grad_scale: 16.0 2024-09-18 05:23:14,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=352338.0, ans=0.125 2024-09-18 05:23:19,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352338.0, ans=0.1 2024-09-18 05:23:30,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.18 vs. 
limit=15.0 2024-09-18 05:23:36,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=352384.6666666667, ans=0.025 2024-09-18 05:23:51,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=352431.3333333333, ans=0.125 2024-09-18 05:23:56,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=352431.3333333333, ans=0.125 2024-09-18 05:24:06,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=352478.0, ans=0.1 2024-09-18 05:24:07,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352478.0, ans=0.1 2024-09-18 05:24:11,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=352478.0, ans=0.2 2024-09-18 05:24:11,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=352478.0, ans=0.125 2024-09-18 05:24:19,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=352524.6666666667, ans=0.125 2024-09-18 05:24:20,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=352524.6666666667, ans=0.0 2024-09-18 05:24:36,753 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.043e+02 2.587e+02 3.160e+02 4.387e+02 9.464e+02, threshold=6.321e+02, percent-clipped=11.0 2024-09-18 05:24:36,779 INFO [train.py:1198] (1/2) Epoch 20, batch 1900, loss[loss=0.2364, simple_loss=0.2898, pruned_loss=0.06843, ctc_loss=0.1427, cr_loss=0.4385, over 34429.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2774, pruned_loss=0.06772, ctc_loss=0.1395, cr_loss=0.415, over 6773121.21 frames. ], batch size: 103, lr: 6.02e-03, grad_scale: 16.0 2024-09-18 05:24:43,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=352571.3333333333, ans=0.2 2024-09-18 05:24:51,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=352618.0, ans=0.125 2024-09-18 05:25:28,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=352711.3333333333, ans=0.0 2024-09-18 05:26:00,977 INFO [train.py:1198] (1/2) Epoch 20, batch 1950, loss[loss=0.2243, simple_loss=0.2716, pruned_loss=0.06653, ctc_loss=0.1366, cr_loss=0.4154, over 34755.00 frames. ], tot_loss[loss=0.2299, simple_loss=0.2786, pruned_loss=0.06822, ctc_loss=0.1404, cr_loss=0.4173, over 6790526.78 frames. 
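The scaling.py:1024 Whitening lines compare a measured statistic of a module's activations against a scheduled limit (metric=X vs. limit=Y), tracking how far the per-group channel covariance is from isotropic. One plausible such statistic, equal to 1.0 for perfectly white features and growing as the eigenvalue spectrum becomes lopsided, is sketched below; this is a guess at the quantity from the shape of the log lines, not the scaling.py definition.

import torch

def whitening_metric(x, num_groups=1):
    # x: (num_frames, num_channels) activations. Per channel group, compare
    # the mean squared eigenvalue of the covariance to the squared mean
    # eigenvalue: the ratio is 1.0 iff all eigenvalues are equal.
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)  # (g, n, d)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                                # (g, d, d)
    eigs = torch.linalg.eigvalsh(cov)                              # (g, d)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

# Roughly white data sits near the metric's floor of 1.0:
print(whitening_metric(torch.randn(4000, 256)))  # ~1.06 on typical draws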
], batch size: 92, lr: 6.02e-03, grad_scale: 16.0 2024-09-18 05:26:23,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=352851.3333333333, ans=0.2 2024-09-18 05:26:44,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=352898.0, ans=0.0 2024-09-18 05:26:51,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=352944.6666666667, ans=0.125 2024-09-18 05:26:57,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=352944.6666666667, ans=0.0 2024-09-18 05:27:01,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=352944.6666666667, ans=0.125 2024-09-18 05:27:04,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=352944.6666666667, ans=0.125 2024-09-18 05:27:07,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=352991.3333333333, ans=0.0 2024-09-18 05:27:12,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352991.3333333333, ans=0.1 2024-09-18 05:27:14,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352991.3333333333, ans=0.1 2024-09-18 05:27:17,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=352991.3333333333, ans=0.125 2024-09-18 05:27:25,710 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.535e+02 2.944e+02 3.552e+02 8.263e+02, threshold=5.889e+02, percent-clipped=2.0 2024-09-18 05:27:25,732 INFO [train.py:1198] (1/2) Epoch 20, batch 2000, loss[loss=0.2048, simple_loss=0.2526, pruned_loss=0.05902, ctc_loss=0.1208, cr_loss=0.3706, over 34151.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.279, pruned_loss=0.06838, ctc_loss=0.1406, cr_loss=0.4175, over 6766464.06 frames. ], batch size: 78, lr: 6.02e-03, grad_scale: 32.0 2024-09-18 05:27:27,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=353038.0, ans=0.0 2024-09-18 05:27:36,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.61 vs. 
limit=10.0 2024-09-18 05:27:37,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=353038.0, ans=0.0 2024-09-18 05:27:39,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=353038.0, ans=0.125 2024-09-18 05:27:53,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=353084.6666666667, ans=0.0 2024-09-18 05:27:59,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=353131.3333333333, ans=0.0 2024-09-18 05:28:13,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=353131.3333333333, ans=0.0 2024-09-18 05:28:32,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=353224.6666666667, ans=0.025 2024-09-18 05:28:32,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=353224.6666666667, ans=0.0 2024-09-18 05:28:37,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=353224.6666666667, ans=0.0 2024-09-18 05:28:37,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=353224.6666666667, ans=0.0 2024-09-18 05:28:47,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=353271.3333333333, ans=0.025 2024-09-18 05:28:49,049 INFO [train.py:1198] (1/2) Epoch 20, batch 2050, loss[loss=0.2043, simple_loss=0.2517, pruned_loss=0.05884, ctc_loss=0.1216, cr_loss=0.3737, over 34476.00 frames. ], tot_loss[loss=0.229, simple_loss=0.2777, pruned_loss=0.06791, ctc_loss=0.1396, cr_loss=0.415, over 6757473.17 frames. ], batch size: 82, lr: 6.01e-03, grad_scale: 16.0 2024-09-18 05:28:52,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=353271.3333333333, ans=0.025 2024-09-18 05:29:12,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=353318.0, ans=0.125 2024-09-18 05:29:15,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=353318.0, ans=0.0 2024-09-18 05:30:00,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.18 vs. limit=10.0 2024-09-18 05:30:15,044 INFO [train.py:1198] (1/2) Epoch 20, batch 2100, loss[loss=0.2194, simple_loss=0.2685, pruned_loss=0.06357, ctc_loss=0.1337, cr_loss=0.4099, over 34561.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2768, pruned_loss=0.06754, ctc_loss=0.1388, cr_loss=0.4131, over 6770335.61 frames. 
], batch size: 94, lr: 6.01e-03, grad_scale: 16.0 2024-09-18 05:30:16,612 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.378e+02 2.746e+02 3.483e+02 6.069e+02, threshold=5.492e+02, percent-clipped=1.0 2024-09-18 05:30:23,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=353504.6666666667, ans=0.125 2024-09-18 05:30:46,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=353598.0, ans=0.5 2024-09-18 05:30:53,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=353598.0, ans=0.0 2024-09-18 05:30:56,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=353598.0, ans=0.1 2024-09-18 05:31:15,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=353644.6666666667, ans=0.125 2024-09-18 05:31:37,187 INFO [train.py:1198] (1/2) Epoch 20, batch 2150, loss[loss=0.2162, simple_loss=0.2669, pruned_loss=0.06136, ctc_loss=0.1344, cr_loss=0.3973, over 34327.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2761, pruned_loss=0.06718, ctc_loss=0.1383, cr_loss=0.4123, over 6789330.97 frames. ], batch size: 91, lr: 6.01e-03, grad_scale: 16.0 2024-09-18 05:31:39,356 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:31:53,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=353784.6666666667, ans=0.2 2024-09-18 05:32:02,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.34 vs. limit=15.0 2024-09-18 05:32:20,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=353831.3333333333, ans=0.0 2024-09-18 05:32:35,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=353878.0, ans=0.1 2024-09-18 05:32:38,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=353878.0, ans=0.1 2024-09-18 05:32:48,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=353924.6666666667, ans=0.125 2024-09-18 05:32:56,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=353924.6666666667, ans=0.125 2024-09-18 05:32:58,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=353971.3333333333, ans=0.5 2024-09-18 05:32:59,597 INFO [train.py:1198] (1/2) Epoch 20, batch 2200, loss[loss=0.2404, simple_loss=0.289, pruned_loss=0.07283, ctc_loss=0.1474, cr_loss=0.4165, over 34469.00 frames. ], tot_loss[loss=0.2277, simple_loss=0.2762, pruned_loss=0.06742, ctc_loss=0.1386, cr_loss=0.4127, over 6781509.61 frames. 
], batch size: 100, lr: 6.01e-03, grad_scale: 16.0 2024-09-18 05:33:01,200 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.314e+02 2.913e+02 3.924e+02 6.870e+02, threshold=5.827e+02, percent-clipped=5.0 2024-09-18 05:33:13,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=353971.3333333333, ans=0.125 2024-09-18 05:33:17,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.45 vs. limit=22.5 2024-09-18 05:33:33,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=12.0 2024-09-18 05:33:36,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354064.6666666667, ans=0.1 2024-09-18 05:33:38,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.41 vs. limit=10.0 2024-09-18 05:34:16,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354158.0, ans=0.1 2024-09-18 05:34:25,861 INFO [train.py:1198] (1/2) Epoch 20, batch 2250, loss[loss=0.2372, simple_loss=0.2852, pruned_loss=0.07157, ctc_loss=0.1457, cr_loss=0.4235, over 34401.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.276, pruned_loss=0.0672, ctc_loss=0.1383, cr_loss=0.4122, over 6779608.88 frames. ], batch size: 95, lr: 6.01e-03, grad_scale: 16.0 2024-09-18 05:34:32,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=354204.6666666667, ans=0.0 2024-09-18 05:34:47,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=354251.3333333333, ans=0.125 2024-09-18 05:34:59,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.76 vs. limit=15.0 2024-09-18 05:35:10,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=354298.0, ans=0.025 2024-09-18 05:35:35,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=354391.3333333333, ans=0.0 2024-09-18 05:35:38,948 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:35:40,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=354391.3333333333, ans=0.125 2024-09-18 05:35:48,616 INFO [train.py:1198] (1/2) Epoch 20, batch 2300, loss[loss=0.2011, simple_loss=0.2506, pruned_loss=0.05631, ctc_loss=0.1196, cr_loss=0.3742, over 34291.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2749, pruned_loss=0.06679, ctc_loss=0.1377, cr_loss=0.4103, over 6764980.11 frames. 
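[Editor's note on the optim.py WARNING lines: in every such entry in this log the printed threshold equals Clipping_scale times the middle quartile value, i.e. 2.0 times the median of recent gradient norms (e.g. 2.0 * 2.913e+02 ~= 5.827e+02 above), and percent-clipped presumably reports the share of recent steps whose norm exceeded that threshold. A minimal sketch of that bookkeeping, assuming a stored history of per-step gradient norms; the actual logic in optim.py may differ.]

```python
import torch

# Sketch: derive the numbers shown in the WARNING lines from a history of
# gradient norms. quartiles = (min, 25%, 50%, 75%, max); the clipping
# threshold is clipping_scale * median, matching every WARNING in this log.
def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    qs = torch.quantile(grad_norms,
                        torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * qs[2]                    # scale * median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return qs, threshold, percent_clipped
```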
], batch size: 83, lr: 6.00e-03, grad_scale: 16.0 2024-09-18 05:35:50,241 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.548e+02 3.297e+02 4.335e+02 7.686e+02, threshold=6.594e+02, percent-clipped=7.0 2024-09-18 05:36:11,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=354484.6666666667, ans=0.1 2024-09-18 05:36:13,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=354484.6666666667, ans=0.125 2024-09-18 05:36:18,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=354484.6666666667, ans=0.125 2024-09-18 05:36:18,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.37 vs. limit=15.0 2024-09-18 05:36:34,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=354531.3333333333, ans=0.125 2024-09-18 05:36:59,826 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:37:19,174 INFO [train.py:1198] (1/2) Epoch 20, batch 2350, loss[loss=0.2303, simple_loss=0.2796, pruned_loss=0.06831, ctc_loss=0.1423, cr_loss=0.3988, over 34719.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2754, pruned_loss=0.06712, ctc_loss=0.1382, cr_loss=0.4115, over 6771205.23 frames. ], batch size: 97, lr: 6.00e-03, grad_scale: 16.0 2024-09-18 05:38:14,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354811.3333333333, ans=0.1 2024-09-18 05:38:24,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5 2024-09-18 05:38:37,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=354858.0, ans=0.125 2024-09-18 05:38:43,759 INFO [train.py:1198] (1/2) Epoch 20, batch 2400, loss[loss=0.2182, simple_loss=0.2656, pruned_loss=0.06422, ctc_loss=0.1324, cr_loss=0.398, over 34591.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2757, pruned_loss=0.0672, ctc_loss=0.1385, cr_loss=0.4119, over 6776032.69 frames. ], batch size: 89, lr: 6.00e-03, grad_scale: 32.0 2024-09-18 05:38:45,362 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.870e+02 2.460e+02 2.835e+02 3.521e+02 5.791e+02, threshold=5.670e+02, percent-clipped=0.0 2024-09-18 05:38:53,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=354904.6666666667, ans=0.125 2024-09-18 05:39:00,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=354951.3333333333, ans=0.05 2024-09-18 05:39:34,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.68 vs. 
limit=22.5 2024-09-18 05:39:40,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=355044.6666666667, ans=0.1 2024-09-18 05:39:42,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.45 vs. limit=22.5 2024-09-18 05:40:00,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=355091.3333333333, ans=0.0 2024-09-18 05:40:06,636 INFO [train.py:1198] (1/2) Epoch 20, batch 2450, loss[loss=0.2398, simple_loss=0.2847, pruned_loss=0.07416, ctc_loss=0.1492, cr_loss=0.4193, over 34432.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2767, pruned_loss=0.06753, ctc_loss=0.1392, cr_loss=0.4133, over 6749117.49 frames. ], batch size: 95, lr: 6.00e-03, grad_scale: 32.0 2024-09-18 05:40:42,190 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2024-09-18 05:40:47,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.71 vs. limit=6.0 2024-09-18 05:41:06,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=355278.0, ans=0.1 2024-09-18 05:41:19,658 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:41:25,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0 2024-09-18 05:41:30,846 INFO [train.py:1198] (1/2) Epoch 20, batch 2500, loss[loss=0.2277, simple_loss=0.2851, pruned_loss=0.06389, ctc_loss=0.1325, cr_loss=0.4007, over 34455.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2768, pruned_loss=0.06763, ctc_loss=0.1393, cr_loss=0.4139, over 6760941.35 frames. ], batch size: 100, lr: 6.00e-03, grad_scale: 32.0 2024-09-18 05:41:32,550 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.361e+02 2.674e+02 3.410e+02 6.220e+02, threshold=5.348e+02, percent-clipped=3.0 2024-09-18 05:42:21,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=355511.3333333333, ans=0.125 2024-09-18 05:42:37,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=355558.0, ans=0.125 2024-09-18 05:42:55,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.96 vs. limit=22.5 2024-09-18 05:42:55,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.80 vs. limit=22.5 2024-09-18 05:42:55,893 INFO [train.py:1198] (1/2) Epoch 20, batch 2550, loss[loss=0.2081, simple_loss=0.2525, pruned_loss=0.06153, ctc_loss=0.127, cr_loss=0.3795, over 34182.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.2769, pruned_loss=0.06771, ctc_loss=0.1393, cr_loss=0.4139, over 6764515.79 frames. 
], batch size: 78, lr: 5.99e-03, grad_scale: 32.0 2024-09-18 05:42:56,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=12.0 2024-09-18 05:43:04,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=355604.6666666667, ans=0.0 2024-09-18 05:43:15,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=355651.3333333333, ans=0.1 2024-09-18 05:44:16,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2024-09-18 05:44:18,681 INFO [train.py:1198] (1/2) Epoch 20, batch 2600, loss[loss=0.2208, simple_loss=0.2708, pruned_loss=0.0637, ctc_loss=0.1337, cr_loss=0.4172, over 34340.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2774, pruned_loss=0.0678, ctc_loss=0.1396, cr_loss=0.4149, over 6760745.48 frames. ], batch size: 91, lr: 5.99e-03, grad_scale: 32.0 2024-09-18 05:44:20,305 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 2.464e+02 3.176e+02 4.067e+02 8.650e+02, threshold=6.352e+02, percent-clipped=11.0 2024-09-18 05:44:35,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=355884.6666666667, ans=0.0 2024-09-18 05:45:29,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.29 vs. limit=15.0 2024-09-18 05:45:30,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=356024.6666666667, ans=0.025 2024-09-18 05:45:44,261 INFO [train.py:1198] (1/2) Epoch 20, batch 2650, loss[loss=0.252, simple_loss=0.3006, pruned_loss=0.07653, ctc_loss=0.1588, cr_loss=0.467, over 34224.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2776, pruned_loss=0.06781, ctc_loss=0.1397, cr_loss=0.4154, over 6769300.05 frames. ], batch size: 117, lr: 5.99e-03, grad_scale: 32.0 2024-09-18 05:45:47,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.75 vs. limit=22.5 2024-09-18 05:45:51,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=356071.3333333333, ans=0.125 2024-09-18 05:46:01,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=356118.0, ans=0.025 2024-09-18 05:46:28,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=356164.6666666667, ans=0.0 2024-09-18 05:46:30,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.78 vs. limit=15.0 2024-09-18 05:46:36,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=356211.3333333333, ans=0.0 2024-09-18 05:46:41,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356211.3333333333, ans=0.1 2024-09-18 05:46:58,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.35 vs. 
limit=6.0 2024-09-18 05:47:06,096 INFO [train.py:1198] (1/2) Epoch 20, batch 2700, loss[loss=0.2578, simple_loss=0.303, pruned_loss=0.08044, ctc_loss=0.1657, cr_loss=0.462, over 34611.00 frames. ], tot_loss[loss=0.2289, simple_loss=0.2777, pruned_loss=0.0678, ctc_loss=0.1397, cr_loss=0.4155, over 6764409.73 frames. ], batch size: 102, lr: 5.99e-03, grad_scale: 32.0 2024-09-18 05:47:07,674 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.358e+02 2.704e+02 3.402e+02 6.037e+02, threshold=5.407e+02, percent-clipped=0.0 2024-09-18 05:47:11,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=356304.6666666667, ans=0.1 2024-09-18 05:47:35,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.53 vs. limit=15.0 2024-09-18 05:47:37,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=356398.0, ans=0.125 2024-09-18 05:47:41,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2024-09-18 05:47:41,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=15.0 2024-09-18 05:47:49,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=356398.0, ans=0.125 2024-09-18 05:48:11,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=356444.6666666667, ans=0.0 2024-09-18 05:48:31,017 INFO [train.py:1198] (1/2) Epoch 20, batch 2750, loss[loss=0.2329, simple_loss=0.2765, pruned_loss=0.07198, ctc_loss=0.143, cr_loss=0.4179, over 34659.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2763, pruned_loss=0.06724, ctc_loss=0.1387, cr_loss=0.4135, over 6762089.93 frames. ], batch size: 88, lr: 5.99e-03, grad_scale: 32.0 2024-09-18 05:48:46,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=356584.6666666667, ans=0.0 2024-09-18 05:48:48,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=356584.6666666667, ans=0.0 2024-09-18 05:48:56,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=356584.6666666667, ans=0.1 2024-09-18 05:49:12,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=356631.3333333333, ans=0.0 2024-09-18 05:49:13,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.62 vs. limit=15.0 2024-09-18 05:49:26,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=356678.0, ans=0.09899494936611666 2024-09-18 05:49:55,929 INFO [train.py:1198] (1/2) Epoch 20, batch 2800, loss[loss=0.2583, simple_loss=0.2966, pruned_loss=0.08371, ctc_loss=0.1725, cr_loss=0.4539, over 23891.00 frames. ], tot_loss[loss=0.2284, simple_loss=0.277, pruned_loss=0.06764, ctc_loss=0.1394, cr_loss=0.4148, over 6741013.93 frames. 
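[Editor's note on the scaling.py ScheduledFloat lines: each reports a scalar hyperparameter whose current value (`ans`) is looked up on a schedule keyed by `batch_count`. A piecewise-linear schedule is one natural reading of these entries; the breakpoints in this sketch are invented for illustration and are not the real schedules.]

```python
# Illustrative piecewise-linear schedule evaluated at batch_count and logged
# as "name=..., batch_count=..., ans=...". The breakpoints are made up.
def scheduled_float(batch_count: float,
                    points=((0.0, 0.5), (20000.0, 0.2))) -> float:
    (x0, y0), (x1, y1) = points
    if batch_count <= x0:
        return y0
    if batch_count >= x1:
        return y1
    t = (batch_count - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)  # linear interpolation between the endpoints

print(scheduled_float(356958.0))  # past the last breakpoint -> 0.2
```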
], batch size: 244, lr: 5.98e-03, grad_scale: 32.0 2024-09-18 05:49:57,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.836e+02 2.467e+02 2.763e+02 3.323e+02 7.754e+02, threshold=5.526e+02, percent-clipped=3.0 2024-09-18 05:49:59,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=356771.3333333333, ans=0.0 2024-09-18 05:49:59,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=356771.3333333333, ans=0.125 2024-09-18 05:50:27,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=356864.6666666667, ans=0.125 2024-09-18 05:50:35,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=356864.6666666667, ans=0.125 2024-09-18 05:50:37,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=356864.6666666667, ans=0.125 2024-09-18 05:50:40,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0 2024-09-18 05:50:50,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=356911.3333333333, ans=0.0 2024-09-18 05:51:00,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=356958.0, ans=0.125 2024-09-18 05:51:04,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=356958.0, ans=0.025 2024-09-18 05:51:18,183 INFO [train.py:1198] (1/2) Epoch 20, batch 2850, loss[loss=0.2181, simple_loss=0.2656, pruned_loss=0.06412, ctc_loss=0.1337, cr_loss=0.3929, over 34494.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2776, pruned_loss=0.06812, ctc_loss=0.1403, cr_loss=0.4159, over 6725030.79 frames. ], batch size: 90, lr: 5.98e-03, grad_scale: 32.0 2024-09-18 05:51:52,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=357098.0, ans=0.0 2024-09-18 05:51:57,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=357098.0, ans=0.125 2024-09-18 05:52:24,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.77 vs. limit=22.5 2024-09-18 05:52:36,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=357191.3333333333, ans=0.0 2024-09-18 05:52:38,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=357191.3333333333, ans=0.0 2024-09-18 05:52:42,859 INFO [train.py:1198] (1/2) Epoch 20, batch 2900, loss[loss=0.2294, simple_loss=0.2786, pruned_loss=0.06778, ctc_loss=0.1404, cr_loss=0.4115, over 34555.00 frames. ], tot_loss[loss=0.2303, simple_loss=0.2788, pruned_loss=0.06847, ctc_loss=0.1409, cr_loss=0.418, over 6755328.05 frames. 
], batch size: 94, lr: 5.98e-03, grad_scale: 32.0 2024-09-18 05:52:43,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=357238.0, ans=0.2 2024-09-18 05:52:43,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.62 vs. limit=15.0 2024-09-18 05:52:44,386 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.445e+02 2.925e+02 3.475e+02 8.008e+02, threshold=5.849e+02, percent-clipped=3.0 2024-09-18 05:52:44,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=357238.0, ans=0.025 2024-09-18 05:52:46,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.98 vs. limit=15.0 2024-09-18 05:53:01,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=357284.6666666667, ans=0.125 2024-09-18 05:53:03,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=357284.6666666667, ans=0.0 2024-09-18 05:53:10,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=357284.6666666667, ans=0.0 2024-09-18 05:53:23,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=357331.3333333333, ans=0.125 2024-09-18 05:53:30,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=357331.3333333333, ans=0.125 2024-09-18 05:53:33,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=357378.0, ans=0.0 2024-09-18 05:53:38,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=357378.0, ans=0.5 2024-09-18 05:53:43,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2024-09-18 05:53:48,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357378.0, ans=0.1 2024-09-18 05:53:58,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=357424.6666666667, ans=0.07 2024-09-18 05:54:08,092 INFO [train.py:1198] (1/2) Epoch 20, batch 2950, loss[loss=0.2145, simple_loss=0.2608, pruned_loss=0.06323, ctc_loss=0.1302, cr_loss=0.3919, over 34636.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2773, pruned_loss=0.06785, ctc_loss=0.1397, cr_loss=0.4151, over 6749973.69 frames. ], batch size: 88, lr: 5.98e-03, grad_scale: 32.0 2024-09-18 05:54:08,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.69 vs. limit=15.0 2024-09-18 05:54:10,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=357471.3333333333, ans=0.0 2024-09-18 05:54:15,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.61 vs. 
limit=22.5 2024-09-18 05:54:18,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=357471.3333333333, ans=0.125 2024-09-18 05:54:21,477 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:54:37,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=357518.0, ans=0.0 2024-09-18 05:55:02,882 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 05:55:14,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=357658.0, ans=0.0 2024-09-18 05:55:32,562 INFO [train.py:1198] (1/2) Epoch 20, batch 3000, loss[loss=0.2344, simple_loss=0.2816, pruned_loss=0.07113, ctc_loss=0.1406, cr_loss=0.4217, over 34497.00 frames. ], tot_loss[loss=0.2288, simple_loss=0.2774, pruned_loss=0.06784, ctc_loss=0.1397, cr_loss=0.4151, over 6748731.76 frames. ], batch size: 94, lr: 5.98e-03, grad_scale: 32.0 2024-09-18 05:55:32,563 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 05:55:49,481 INFO [train.py:1230] (1/2) Epoch 20, validation: loss=0.1487, simple_loss=0.2455, pruned_loss=0.02183, ctc_loss=0.04103, cr_loss=1.832e-14, over 944034.00 frames. 2024-09-18 05:55:49,481 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 05:55:51,099 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.474e+02 2.887e+02 3.651e+02 8.088e+02, threshold=5.775e+02, percent-clipped=9.0 2024-09-18 05:56:12,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5 2024-09-18 05:56:27,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=357798.0, ans=0.125 2024-09-18 05:56:46,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.19 vs. limit=15.0 2024-09-18 05:57:02,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=357891.3333333333, ans=0.125 2024-09-18 05:57:07,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.81 vs. limit=15.0 2024-09-18 05:57:13,359 INFO [train.py:1198] (1/2) Epoch 20, batch 3050, loss[loss=0.2178, simple_loss=0.2635, pruned_loss=0.06459, ctc_loss=0.1338, cr_loss=0.4037, over 34570.00 frames. ], tot_loss[loss=0.2295, simple_loss=0.2782, pruned_loss=0.06805, ctc_loss=0.1401, cr_loss=0.4158, over 6743311.87 frames. ], batch size: 89, lr: 5.98e-03, grad_scale: 32.0 2024-09-18 05:57:14,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.16 vs. 
limit=15.0 2024-09-18 05:57:25,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=357938.0, ans=0.2 2024-09-18 05:57:28,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357984.6666666667, ans=0.1 2024-09-18 05:57:38,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2024-09-18 05:57:47,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=358031.3333333333, ans=0.05 2024-09-18 05:57:53,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=358031.3333333333, ans=0.125 2024-09-18 05:57:57,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=358031.3333333333, ans=0.0 2024-09-18 05:58:17,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=358124.6666666667, ans=0.125 2024-09-18 05:58:33,744 INFO [train.py:1198] (1/2) Epoch 20, batch 3100, loss[loss=0.2427, simple_loss=0.2931, pruned_loss=0.07226, ctc_loss=0.1502, cr_loss=0.4441, over 34223.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2778, pruned_loss=0.06792, ctc_loss=0.1399, cr_loss=0.4154, over 6743463.00 frames. ], batch size: 117, lr: 5.97e-03, grad_scale: 32.0 2024-09-18 05:58:35,308 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.398e+02 2.661e+02 3.188e+02 5.619e+02, threshold=5.321e+02, percent-clipped=0.0 2024-09-18 05:58:45,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=358171.3333333333, ans=0.0 2024-09-18 05:58:59,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=358218.0, ans=0.025 2024-09-18 05:59:02,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=358218.0, ans=0.0 2024-09-18 05:59:02,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=358218.0, ans=0.125 2024-09-18 05:59:16,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=2.95 vs. limit=15.0 2024-09-18 05:59:31,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.53 vs. limit=5.0 2024-09-18 05:59:54,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.90 vs. limit=10.0 2024-09-18 05:59:54,958 INFO [train.py:1198] (1/2) Epoch 20, batch 3150, loss[loss=0.2435, simple_loss=0.2989, pruned_loss=0.07059, ctc_loss=0.1482, cr_loss=0.4311, over 33799.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2779, pruned_loss=0.06793, ctc_loss=0.1399, cr_loss=0.4155, over 6747906.52 frames. 
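[Editor's note on the validation entry logged above (cr_loss=1.832e-14): one reading is that the consistency-regularization term compares predictions from two differently time-masked copies of each utterance, so with masking disabled at validation the two branches coincide and the term collapses to floating-point noise. A toy illustration of that collapse under this reading; this is not the project's actual CR loss code.]

```python
import torch
import torch.nn.functional as F

# Toy symmetric consistency loss between two branch outputs. When the two
# "views" are identical (no masking, as at validation) it is exactly 0 up
# to floating-point error, matching the ~1e-14 cr_loss printed above.
def cr_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    log_p_a, log_p_b = logits_a.log_softmax(-1), logits_b.log_softmax(-1)
    kl_ab = F.kl_div(log_p_a, log_p_b.exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_p_b, log_p_a.exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

x = torch.randn(4, 10, 500)
print(cr_loss(x, x))  # tensor(0.) up to floating-point noise
```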
], batch size: 122, lr: 5.97e-03, grad_scale: 32.0 2024-09-18 06:00:04,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=358404.6666666667, ans=0.125 2024-09-18 06:00:09,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=358451.3333333333, ans=0.05 2024-09-18 06:00:10,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.76 vs. limit=15.0 2024-09-18 06:00:10,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5 2024-09-18 06:00:19,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=358451.3333333333, ans=0.2 2024-09-18 06:00:26,135 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:00:42,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=358544.6666666667, ans=0.0 2024-09-18 06:00:42,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=358544.6666666667, ans=0.0 2024-09-18 06:00:51,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.52 vs. limit=10.0 2024-09-18 06:01:17,498 INFO [train.py:1198] (1/2) Epoch 20, batch 3200, loss[loss=0.2233, simple_loss=0.274, pruned_loss=0.06501, ctc_loss=0.1322, cr_loss=0.403, over 34532.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2773, pruned_loss=0.06775, ctc_loss=0.1394, cr_loss=0.4145, over 6761803.44 frames. ], batch size: 94, lr: 5.97e-03, grad_scale: 32.0 2024-09-18 06:01:19,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.142e+02 2.574e+02 3.153e+02 4.091e+02 6.416e+02, threshold=6.306e+02, percent-clipped=7.0 2024-09-18 06:01:26,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=358638.0, ans=0.125 2024-09-18 06:01:50,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=358731.3333333333, ans=0.0 2024-09-18 06:02:07,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.74 vs. limit=22.5 2024-09-18 06:02:11,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=358778.0, ans=10.0 2024-09-18 06:02:38,728 INFO [train.py:1198] (1/2) Epoch 20, batch 3250, loss[loss=0.2513, simple_loss=0.2994, pruned_loss=0.07736, ctc_loss=0.1541, cr_loss=0.445, over 34651.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2778, pruned_loss=0.06794, ctc_loss=0.1396, cr_loss=0.4153, over 6770851.11 frames. 
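[Editor's note on the scaling.py Whitening lines: each compares a measured whiteness metric of a module's activations against a limit, with entries apparently logged when `metric` crosses `limit`. The metric in this sketch is an assumption: a standard anisotropy measure of the feature covariance that equals 1.0 for perfectly white features, shown only to make the logged numbers interpretable, not the actual scaling.py computation.]

```python
import torch

# Assumed whiteness proxy: mean squared eigenvalue of the covariance over
# the squared mean eigenvalue. Equals 1.0 iff all eigenvalues are equal
# (perfectly "white") and grows as the covariance becomes anisotropic.
def whitening_metric(feats: torch.Tensor) -> torch.Tensor:
    x = feats - feats.mean(dim=0, keepdim=True)   # (num_frames, num_channels)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)             # real eigenvalues, ascending
    return (eigs ** 2).mean() / eigs.mean() ** 2
```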
], batch size: 98, lr: 5.97e-03, grad_scale: 32.0 2024-09-18 06:02:53,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=358918.0, ans=0.04949747468305833 2024-09-18 06:02:55,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=22.5 2024-09-18 06:03:04,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=358918.0, ans=0.125 2024-09-18 06:04:00,962 INFO [train.py:1198] (1/2) Epoch 20, batch 3300, loss[loss=0.226, simple_loss=0.2779, pruned_loss=0.06561, ctc_loss=0.1375, cr_loss=0.3836, over 32860.00 frames. ], tot_loss[loss=0.2278, simple_loss=0.2763, pruned_loss=0.06748, ctc_loss=0.1388, cr_loss=0.4133, over 6769147.11 frames. ], batch size: 130, lr: 5.97e-03, grad_scale: 32.0 2024-09-18 06:04:01,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=359104.6666666667, ans=0.04949747468305833 2024-09-18 06:04:02,663 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 2.533e+02 3.029e+02 3.775e+02 5.901e+02, threshold=6.058e+02, percent-clipped=0.0 2024-09-18 06:04:03,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=359104.6666666667, ans=0.025 2024-09-18 06:04:09,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=359104.6666666667, ans=0.5 2024-09-18 06:04:14,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=359104.6666666667, ans=0.2 2024-09-18 06:04:16,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=359151.3333333333, ans=0.125 2024-09-18 06:04:20,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=359151.3333333333, ans=0.1 2024-09-18 06:04:22,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=359151.3333333333, ans=0.125 2024-09-18 06:04:46,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=359198.0, ans=0.1 2024-09-18 06:04:57,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=359244.6666666667, ans=0.125 2024-09-18 06:05:12,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-09-18 06:05:21,742 INFO [train.py:1198] (1/2) Epoch 20, batch 3350, loss[loss=0.2307, simple_loss=0.2834, pruned_loss=0.06682, ctc_loss=0.1406, cr_loss=0.4041, over 33767.00 frames. ], tot_loss[loss=0.2285, simple_loss=0.277, pruned_loss=0.06779, ctc_loss=0.1395, cr_loss=0.4145, over 6744153.81 frames. 
], batch size: 122, lr: 5.96e-03, grad_scale: 32.0 2024-09-18 06:06:00,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=359431.3333333333, ans=0.0 2024-09-18 06:06:03,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=359431.3333333333, ans=0.0 2024-09-18 06:06:16,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=359478.0, ans=0.0 2024-09-18 06:06:29,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0 2024-09-18 06:06:37,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=359524.6666666667, ans=0.95 2024-09-18 06:06:43,785 INFO [train.py:1198] (1/2) Epoch 20, batch 3400, loss[loss=0.2015, simple_loss=0.2477, pruned_loss=0.05791, ctc_loss=0.1211, cr_loss=0.3783, over 34152.00 frames. ], tot_loss[loss=0.2287, simple_loss=0.277, pruned_loss=0.06793, ctc_loss=0.1397, cr_loss=0.4146, over 6733714.28 frames. ], batch size: 78, lr: 5.96e-03, grad_scale: 32.0 2024-09-18 06:06:44,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=359571.3333333333, ans=0.5 2024-09-18 06:06:45,292 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.987e+02 2.428e+02 2.845e+02 3.695e+02 6.823e+02, threshold=5.691e+02, percent-clipped=1.0 2024-09-18 06:06:56,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=359571.3333333333, ans=0.125 2024-09-18 06:07:02,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.27 vs. limit=15.0 2024-09-18 06:07:22,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.57 vs. limit=15.0 2024-09-18 06:07:29,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=359664.6666666667, ans=0.2 2024-09-18 06:07:37,596 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:07:38,001 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.38 vs. limit=12.0 2024-09-18 06:07:42,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=359711.3333333333, ans=0.1 2024-09-18 06:07:49,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.21 vs. limit=10.0 2024-09-18 06:08:04,255 INFO [train.py:1198] (1/2) Epoch 20, batch 3450, loss[loss=0.2426, simple_loss=0.2923, pruned_loss=0.07225, ctc_loss=0.1502, cr_loss=0.4591, over 33163.00 frames. ], tot_loss[loss=0.2292, simple_loss=0.2775, pruned_loss=0.06808, ctc_loss=0.1401, cr_loss=0.4154, over 6746269.09 frames. 
], batch size: 130, lr: 5.96e-03, grad_scale: 32.0 2024-09-18 06:08:15,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=359804.6666666667, ans=0.1 2024-09-18 06:08:15,413 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 06:08:44,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=359898.0, ans=0.125 2024-09-18 06:08:47,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=359898.0, ans=0.125 2024-09-18 06:08:52,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=359944.6666666667, ans=0.125 2024-09-18 06:09:00,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=359944.6666666667, ans=0.1 2024-09-18 06:09:00,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=359944.6666666667, ans=0.2 2024-09-18 06:09:02,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=359944.6666666667, ans=0.0 2024-09-18 06:09:05,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=359944.6666666667, ans=0.0 2024-09-18 06:09:08,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=359991.3333333333, ans=0.125 2024-09-18 06:09:21,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=359991.3333333333, ans=0.125 2024-09-18 06:09:26,083 INFO [train.py:1198] (1/2) Epoch 20, batch 3500, loss[loss=0.1852, simple_loss=0.2397, pruned_loss=0.04796, ctc_loss=0.1054, cr_loss=0.3425, over 34447.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2766, pruned_loss=0.06765, ctc_loss=0.1392, cr_loss=0.4136, over 6747631.32 frames. ], batch size: 85, lr: 5.96e-03, grad_scale: 32.0 2024-09-18 06:09:27,699 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.432e+02 2.974e+02 3.501e+02 5.542e+02, threshold=5.948e+02, percent-clipped=1.0 2024-09-18 06:09:41,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=360084.6666666667, ans=0.04949747468305833 2024-09-18 06:09:42,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=360084.6666666667, ans=0.125 2024-09-18 06:09:44,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=360084.6666666667, ans=0.1 2024-09-18 06:09:50,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=360084.6666666667, ans=10.0 2024-09-18 06:10:40,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=360224.6666666667, ans=0.2 2024-09-18 06:10:47,252 INFO [train.py:1198] (1/2) Epoch 20, batch 3550, loss[loss=0.2394, simple_loss=0.292, pruned_loss=0.07012, ctc_loss=0.1475, cr_loss=0.4255, over 34359.00 frames. 
], tot_loss[loss=0.228, simple_loss=0.2766, pruned_loss=0.06752, ctc_loss=0.139, cr_loss=0.4135, over 6757223.18 frames. ], batch size: 103, lr: 5.96e-03, grad_scale: 16.0 2024-09-18 06:11:05,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=360318.0, ans=0.2 2024-09-18 06:11:24,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=360364.6666666667, ans=0.0 2024-09-18 06:11:30,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=360364.6666666667, ans=0.125 2024-09-18 06:11:37,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=360411.3333333333, ans=0.125 2024-09-18 06:11:37,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=360411.3333333333, ans=0.125 2024-09-18 06:11:37,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=360411.3333333333, ans=0.125 2024-09-18 06:11:52,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0 2024-09-18 06:11:52,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=360458.0, ans=0.0 2024-09-18 06:12:04,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=360458.0, ans=0.1 2024-09-18 06:12:07,309 INFO [train.py:1198] (1/2) Epoch 20, batch 3600, loss[loss=0.2214, simple_loss=0.2671, pruned_loss=0.06638, ctc_loss=0.1338, cr_loss=0.4037, over 34490.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2769, pruned_loss=0.06756, ctc_loss=0.139, cr_loss=0.4139, over 6765668.88 frames. ], batch size: 90, lr: 5.95e-03, grad_scale: 32.0 2024-09-18 06:12:10,473 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.945e+02 2.515e+02 3.042e+02 3.863e+02 5.544e+02, threshold=6.085e+02, percent-clipped=0.0 2024-09-18 06:12:12,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=360504.6666666667, ans=0.1 2024-09-18 06:12:30,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.49 vs. limit=22.5 2024-09-18 06:12:37,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=360551.3333333333, ans=0.1 2024-09-18 06:12:39,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. 
limit=6.0 2024-09-18 06:12:46,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=360598.0, ans=0.125 2024-09-18 06:12:50,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=360598.0, ans=0.0 2024-09-18 06:13:06,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=360644.6666666667, ans=0.125 2024-09-18 06:13:23,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=360691.3333333333, ans=0.025 2024-09-18 06:13:28,357 INFO [train.py:1198] (1/2) Epoch 20, batch 3650, loss[loss=0.2394, simple_loss=0.2902, pruned_loss=0.07118, ctc_loss=0.1462, cr_loss=0.426, over 34480.00 frames. ], tot_loss[loss=0.2274, simple_loss=0.2761, pruned_loss=0.06722, ctc_loss=0.1384, cr_loss=0.4122, over 6768811.32 frames. ], batch size: 110, lr: 5.95e-03, grad_scale: 32.0 2024-09-18 06:13:43,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=360784.6666666667, ans=0.125 2024-09-18 06:13:56,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.03 vs. limit=15.0 2024-09-18 06:14:09,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=360831.3333333333, ans=0.125 2024-09-18 06:14:09,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=360831.3333333333, ans=0.2 2024-09-18 06:14:20,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=360878.0, ans=0.0 2024-09-18 06:14:25,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=360878.0, ans=0.125 2024-09-18 06:14:49,179 INFO [train.py:1198] (1/2) Epoch 20, batch 3700, loss[loss=0.2358, simple_loss=0.2893, pruned_loss=0.06801, ctc_loss=0.1438, cr_loss=0.4387, over 34641.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2762, pruned_loss=0.06701, ctc_loss=0.138, cr_loss=0.412, over 6783284.20 frames. ], batch size: 102, lr: 5.95e-03, grad_scale: 32.0 2024-09-18 06:14:52,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.462e+02 2.917e+02 4.016e+02 6.705e+02, threshold=5.834e+02, percent-clipped=3.0 2024-09-18 06:15:00,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=360971.3333333333, ans=0.125 2024-09-18 06:15:16,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=361018.0, ans=0.0 2024-09-18 06:15:45,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=361111.3333333333, ans=0.0 2024-09-18 06:15:53,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=361158.0, ans=0.2 2024-09-18 06:16:07,189 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.59 vs. 
limit=5.0
2024-09-18 06:16:11,008 INFO [train.py:1198] (1/2) Epoch 20, batch 3750, loss[loss=0.2428, simple_loss=0.2884, pruned_loss=0.07485, ctc_loss=0.1497, cr_loss=0.438, over 34295.00 frames. ], tot_loss[loss=0.2306, simple_loss=0.2795, pruned_loss=0.06838, ctc_loss=0.1407, cr_loss=0.4179, over 6785573.14 frames. ], batch size: 113, lr: 5.95e-03, grad_scale: 32.0
2024-09-18 06:16:27,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.47 vs. limit=15.0
2024-09-18 06:16:33,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361251.3333333333, ans=0.1
2024-09-18 06:16:37,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0
2024-09-18 06:16:41,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=361298.0, ans=0.125
2024-09-18 06:16:47,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0
2024-09-18 06:17:01,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361344.6666666667, ans=0.1
2024-09-18 06:17:14,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=5.31 vs. limit=12.0
2024-09-18 06:17:32,016 INFO [train.py:1198] (1/2) Epoch 20, batch 3800, loss[loss=0.2758, simple_loss=0.3079, pruned_loss=0.09326, ctc_loss=0.1852, cr_loss=0.5048, over 29823.00 frames. ], tot_loss[loss=0.2343, simple_loss=0.2825, pruned_loss=0.07017, ctc_loss=0.1441, cr_loss=0.4236, over 6674782.56 frames. ], batch size: 176, lr: 5.95e-03, grad_scale: 16.0
2024-09-18 06:17:37,052 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.328e+02 2.534e+02 2.809e+02 5.324e+02, threshold=5.069e+02, percent-clipped=0.0
2024-09-18 06:17:51,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=361484.6666666667, ans=0.125
2024-09-18 06:18:47,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=361624.6666666667, ans=22.5
2024-09-18 06:18:55,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=361671.3333333333, ans=0.0
2024-09-18 06:18:56,538 INFO [train.py:1198] (1/2) Epoch 20, batch 3850, loss[loss=0.2466, simple_loss=0.287, pruned_loss=0.07785, ctc_loss=0.169, cr_loss=0.4183, over 23497.00 frames. ], tot_loss[loss=0.2387, simple_loss=0.2852, pruned_loss=0.07266, ctc_loss=0.1492, cr_loss=0.4274, over 6252389.19 frames. ], batch size: 245, lr: 5.94e-03, grad_scale: 16.0
2024-09-18 06:19:23,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=361718.0, ans=0.125
2024-09-18 06:19:34,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.84 vs. limit=22.5
2024-09-18 06:20:28,519 INFO [train.py:1198] (1/2) Epoch 21, batch 0, loss[loss=0.2092, simple_loss=0.259, pruned_loss=0.05964, ctc_loss=0.1244, cr_loss=0.3812, over 34488.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.259, pruned_loss=0.05964, ctc_loss=0.1244, cr_loss=0.3812, over 34488.00 frames. ], batch size: 85, lr: 5.80e-03, grad_scale: 32.0
2024-09-18 06:20:28,520 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-18 06:20:45,296 INFO [train.py:1230] (1/2) Epoch 21, validation: loss=0.1493, simple_loss=0.2471, pruned_loss=0.02167, ctc_loss=0.04118, cr_loss=1.825e-14, over 944034.00 frames.
2024-09-18 06:20:45,296 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-18 06:21:21,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=361886.0, ans=0.125
2024-09-18 06:21:21,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=361886.0, ans=0.0
2024-09-18 06:21:31,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=361886.0, ans=0.0
2024-09-18 06:21:32,388 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 2.653e+02 2.821e+02 3.206e+02 6.547e+02, threshold=5.641e+02, percent-clipped=2.0
2024-09-18 06:21:39,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=361932.6666666667, ans=0.125
2024-09-18 06:21:44,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=361932.6666666667, ans=0.0
2024-09-18 06:21:57,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=361979.3333333333, ans=0.125
2024-09-18 06:22:04,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.16 vs. limit=15.0
2024-09-18 06:22:09,869 INFO [train.py:1198] (1/2) Epoch 21, batch 50, loss[loss=0.2085, simple_loss=0.2542, pruned_loss=0.0611, ctc_loss=0.1265, cr_loss=0.3831, over 34455.00 frames. ], tot_loss[loss=0.2294, simple_loss=0.2775, pruned_loss=0.06826, ctc_loss=0.1407, cr_loss=0.4155, over 1480718.79 frames. ], batch size: 82, lr: 5.80e-03, grad_scale: 32.0
2024-09-18 06:22:14,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.91 vs. limit=15.0
2024-09-18 06:22:20,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=362026.0, ans=0.0
2024-09-18 06:23:01,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=362166.0, ans=0.125
2024-09-18 06:23:03,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=362166.0, ans=0.125
2024-09-18 06:23:16,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=362212.6666666667, ans=0.2
2024-09-18 06:23:18,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=362212.6666666667, ans=0.125
2024-09-18 06:23:32,688 INFO [train.py:1198] (1/2) Epoch 21, batch 100, loss[loss=0.2184, simple_loss=0.2697, pruned_loss=0.06237, ctc_loss=0.1325, cr_loss=0.395, over 34578.00 frames. ], tot_loss[loss=0.232, simple_loss=0.2804, pruned_loss=0.06913, ctc_loss=0.1424, cr_loss=0.4204, over 2628242.46 frames. ], batch size: 89, lr: 5.79e-03, grad_scale: 32.0
2024-09-18 06:23:39,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=362259.3333333333, ans=0.0
2024-09-18 06:23:56,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=362306.0, ans=0.0
2024-09-18 06:23:57,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=362306.0, ans=0.125
2024-09-18 06:24:16,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.506e+02 3.003e+02 3.861e+02 6.572e+02, threshold=6.007e+02, percent-clipped=5.0
2024-09-18 06:24:25,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.93 vs. limit=12.0
2024-09-18 06:24:32,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=362399.3333333333, ans=0.125
2024-09-18 06:24:58,551 INFO [train.py:1198] (1/2) Epoch 21, batch 150, loss[loss=0.2088, simple_loss=0.2523, pruned_loss=0.062, ctc_loss=0.1253, cr_loss=0.4028, over 34453.00 frames. ], tot_loss[loss=0.2279, simple_loss=0.2769, pruned_loss=0.06732, ctc_loss=0.139, cr_loss=0.4137, over 3555911.39 frames. ], batch size: 82, lr: 5.79e-03, grad_scale: 32.0
2024-09-18 06:25:05,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=362492.6666666667, ans=0.0
2024-09-18 06:26:19,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=362726.0, ans=0.0
2024-09-18 06:26:21,132 INFO [train.py:1198] (1/2) Epoch 21, batch 200, loss[loss=0.2378, simple_loss=0.2826, pruned_loss=0.07314, ctc_loss=0.1456, cr_loss=0.4404, over 32243.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2757, pruned_loss=0.06685, ctc_loss=0.138, cr_loss=0.4129, over 4271695.50 frames. ], batch size: 145, lr: 5.79e-03, grad_scale: 32.0
2024-09-18 06:26:23,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=362726.0, ans=0.125
2024-09-18 06:26:28,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=362726.0, ans=0.125
2024-09-18 06:26:36,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=362772.6666666667, ans=0.125
2024-09-18 06:26:37,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=362772.6666666667, ans=0.1
2024-09-18 06:26:52,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=362819.3333333333, ans=0.2
2024-09-18 06:26:59,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=362819.3333333333, ans=0.125
2024-09-18 06:27:01,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=362819.3333333333, ans=0.0
2024-09-18 06:27:05,646 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.454e+02 3.027e+02 4.043e+02 7.669e+02, threshold=6.054e+02, percent-clipped=3.0
2024-09-18 06:27:35,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=362912.6666666667, ans=0.125
2024-09-18 06:27:43,656 INFO [train.py:1198] (1/2) Epoch 21, batch 250, loss[loss=0.2469, simple_loss=0.2906, pruned_loss=0.07743, ctc_loss=0.153, cr_loss=0.4422, over 34217.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2761, pruned_loss=0.06708, ctc_loss=0.1383, cr_loss=0.4136, over 4834347.41 frames. ], batch size: 117, lr: 5.79e-03, grad_scale: 32.0
2024-09-18 06:27:49,136 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-18 06:28:12,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=363006.0, ans=0.1
2024-09-18 06:28:29,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=363052.6666666667, ans=0.1
2024-09-18 06:28:40,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0
2024-09-18 06:28:43,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=363099.3333333333, ans=0.0
2024-09-18 06:29:02,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=363146.0, ans=0.125
2024-09-18 06:29:04,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=363146.0, ans=0.125
2024-09-18 06:29:10,761 INFO [train.py:1198] (1/2) Epoch 21, batch 300, loss[loss=0.2491, simple_loss=0.295, pruned_loss=0.07687, ctc_loss=0.1544, cr_loss=0.4639, over 34364.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2756, pruned_loss=0.06686, ctc_loss=0.138, cr_loss=0.4124, over 5262136.80 frames. ], batch size: 107, lr: 5.79e-03, grad_scale: 32.0
2024-09-18 06:29:13,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs. limit=6.0
2024-09-18 06:29:24,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=363192.6666666667, ans=0.2
2024-09-18 06:29:29,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=363239.3333333333, ans=0.1
2024-09-18 06:29:29,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=363239.3333333333, ans=0.025
2024-09-18 06:29:46,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.02 vs. limit=15.0
2024-09-18 06:29:52,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=363286.0, ans=0.0
2024-09-18 06:29:55,021 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.977e+02 2.371e+02 2.605e+02 3.186e+02 5.657e+02, threshold=5.211e+02, percent-clipped=0.0
2024-09-18 06:29:56,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=363286.0, ans=0.125
2024-09-18 06:30:32,818 INFO [train.py:1198] (1/2) Epoch 21, batch 350, loss[loss=0.1992, simple_loss=0.2459, pruned_loss=0.05714, ctc_loss=0.119, cr_loss=0.3598, over 34303.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.276, pruned_loss=0.06708, ctc_loss=0.1384, cr_loss=0.4128, over 5595032.02 frames. ], batch size: 83, lr: 5.78e-03, grad_scale: 32.0
2024-09-18 06:30:48,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.98 vs. limit=22.5
2024-09-18 06:31:42,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=363612.6666666667, ans=0.125
2024-09-18 06:31:44,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=363612.6666666667, ans=0.125
2024-09-18 06:31:55,398 INFO [train.py:1198] (1/2) Epoch 21, batch 400, loss[loss=0.229, simple_loss=0.2793, pruned_loss=0.06743, ctc_loss=0.1395, cr_loss=0.4011, over 34437.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2758, pruned_loss=0.06677, ctc_loss=0.1378, cr_loss=0.4123, over 5863363.05 frames. ], batch size: 95, lr: 5.78e-03, grad_scale: 32.0
2024-09-18 06:31:55,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=363659.3333333333, ans=0.1
2024-09-18 06:32:21,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=363706.0, ans=0.125
2024-09-18 06:32:30,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=363706.0, ans=0.0
2024-09-18 06:32:36,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=363752.6666666667, ans=0.0
2024-09-18 06:32:44,714 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.366e+02 2.794e+02 3.641e+02 6.348e+02, threshold=5.588e+02, percent-clipped=3.0
2024-09-18 06:32:45,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.77 vs. limit=22.5
2024-09-18 06:33:01,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=363799.3333333333, ans=0.1
2024-09-18 06:33:08,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=363846.0, ans=0.0
2024-09-18 06:33:15,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=363846.0, ans=0.025
2024-09-18 06:33:16,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=363846.0, ans=0.125
2024-09-18 06:33:16,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=363846.0, ans=0.0
2024-09-18 06:33:17,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.17 vs. limit=22.5
2024-09-18 06:33:23,059 INFO [train.py:1198] (1/2) Epoch 21, batch 450, loss[loss=0.2429, simple_loss=0.2935, pruned_loss=0.07273, ctc_loss=0.1485, cr_loss=0.43, over 34706.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2762, pruned_loss=0.06685, ctc_loss=0.138, cr_loss=0.4129, over 6052536.97 frames. ], batch size: 97, lr: 5.78e-03, grad_scale: 32.0
2024-09-18 06:33:33,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=363892.6666666667, ans=0.0
2024-09-18 06:33:44,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.94 vs. limit=6.0
2024-09-18 06:34:41,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=364079.3333333333, ans=0.125
2024-09-18 06:34:46,247 INFO [train.py:1198] (1/2) Epoch 21, batch 500, loss[loss=0.2473, simple_loss=0.2948, pruned_loss=0.07539, ctc_loss=0.1539, cr_loss=0.4557, over 34448.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2751, pruned_loss=0.06642, ctc_loss=0.1371, cr_loss=0.4114, over 6218996.43 frames. ], batch size: 110, lr: 5.78e-03, grad_scale: 16.0
2024-09-18 06:35:12,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364172.6666666667, ans=0.1
2024-09-18 06:35:32,463 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.281e+02 2.841e+02 3.563e+02 5.182e+02, threshold=5.682e+02, percent-clipped=0.0
2024-09-18 06:35:49,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=364266.0, ans=0.125
2024-09-18 06:35:56,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.65 vs. limit=15.0
2024-09-18 06:35:59,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=364312.6666666667, ans=0.2
2024-09-18 06:36:12,939 INFO [train.py:1198] (1/2) Epoch 21, batch 550, loss[loss=0.231, simple_loss=0.2856, pruned_loss=0.06622, ctc_loss=0.1365, cr_loss=0.4187, over 33776.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2753, pruned_loss=0.06642, ctc_loss=0.1372, cr_loss=0.4116, over 6328667.31 frames. ], batch size: 122, lr: 5.78e-03, grad_scale: 16.0
2024-09-18 06:36:13,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=364359.3333333333, ans=0.025
2024-09-18 06:36:36,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=364406.0, ans=0.1
2024-09-18 06:36:41,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=364406.0, ans=0.0
2024-09-18 06:36:44,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=364452.6666666667, ans=0.0
2024-09-18 06:36:49,581 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 06:36:49,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=364452.6666666667, ans=0.1
2024-09-18 06:36:49,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=364452.6666666667, ans=0.025
2024-09-18 06:36:59,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=364452.6666666667, ans=0.0
2024-09-18 06:37:24,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=364546.0, ans=0.125
2024-09-18 06:37:34,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=364592.6666666667, ans=0.0
2024-09-18 06:37:34,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0
2024-09-18 06:37:35,358 INFO [train.py:1198] (1/2) Epoch 21, batch 600, loss[loss=0.2513, simple_loss=0.2951, pruned_loss=0.07839, ctc_loss=0.1617, cr_loss=0.4591, over 34199.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2756, pruned_loss=0.06658, ctc_loss=0.1374, cr_loss=0.4121, over 6430924.02 frames. ], batch size: 117, lr: 5.78e-03, grad_scale: 16.0
2024-09-18 06:37:38,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=364592.6666666667, ans=0.125
2024-09-18 06:37:55,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=364639.3333333333, ans=0.0
2024-09-18 06:37:55,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=364639.3333333333, ans=0.2
2024-09-18 06:38:02,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.53 vs. limit=15.0
2024-09-18 06:38:21,077 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.484e+02 2.963e+02 3.714e+02 1.084e+03, threshold=5.925e+02, percent-clipped=5.0
2024-09-18 06:38:34,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=364732.6666666667, ans=0.125
2024-09-18 06:38:47,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=364779.3333333333, ans=0.2
2024-09-18 06:38:55,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=364826.0, ans=0.1
2024-09-18 06:38:56,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0
2024-09-18 06:38:57,146 INFO [train.py:1198] (1/2) Epoch 21, batch 650, loss[loss=0.2136, simple_loss=0.2636, pruned_loss=0.06119, ctc_loss=0.1275, cr_loss=0.3932, over 34558.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2748, pruned_loss=0.06619, ctc_loss=0.1367, cr_loss=0.4108, over 6521731.57 frames. ], batch size: 94, lr: 5.77e-03, grad_scale: 16.0
2024-09-18 06:38:57,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=364826.0, ans=0.125
2024-09-18 06:38:59,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=364826.0, ans=0.125
2024-09-18 06:40:04,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=364966.0, ans=0.0
2024-09-18 06:40:14,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=365012.6666666667, ans=0.125
2024-09-18 06:40:22,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=365059.3333333333, ans=0.125
2024-09-18 06:40:22,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.70 vs. limit=15.0
2024-09-18 06:40:23,522 INFO [train.py:1198] (1/2) Epoch 21, batch 700, loss[loss=0.2252, simple_loss=0.2717, pruned_loss=0.06731, ctc_loss=0.1372, cr_loss=0.415, over 34605.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2756, pruned_loss=0.06646, ctc_loss=0.1372, cr_loss=0.412, over 6579175.50 frames. ], batch size: 89, lr: 5.77e-03, grad_scale: 16.0
2024-09-18 06:40:31,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.18 vs. limit=22.5
2024-09-18 06:40:35,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=365059.3333333333, ans=0.125
2024-09-18 06:40:43,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=15.0
2024-09-18 06:40:54,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.12 vs. limit=15.0
2024-09-18 06:41:09,754 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.629e+02 3.274e+02 4.540e+02 8.078e+02, threshold=6.548e+02, percent-clipped=10.0
2024-09-18 06:41:15,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=365199.3333333333, ans=0.125
2024-09-18 06:41:16,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=365199.3333333333, ans=0.125
2024-09-18 06:41:20,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=365199.3333333333, ans=0.2
2024-09-18 06:41:21,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=365199.3333333333, ans=0.125
2024-09-18 06:41:27,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=365199.3333333333, ans=22.5
2024-09-18 06:41:38,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=365246.0, ans=0.125
2024-09-18 06:41:38,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365246.0, ans=0.1
2024-09-18 06:41:44,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=365292.6666666667, ans=0.0
2024-09-18 06:41:46,072 INFO [train.py:1198] (1/2) Epoch 21, batch 750, loss[loss=0.2308, simple_loss=0.2793, pruned_loss=0.06879, ctc_loss=0.1406, cr_loss=0.4135, over 34414.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.275, pruned_loss=0.06623, ctc_loss=0.1368, cr_loss=0.4108, over 6624437.79 frames. ], batch size: 95, lr: 5.77e-03, grad_scale: 16.0
2024-09-18 06:41:56,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=22.5
2024-09-18 06:42:12,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=365339.3333333333, ans=0.05
2024-09-18 06:42:13,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=365339.3333333333, ans=0.125
2024-09-18 06:42:17,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365386.0, ans=0.1
2024-09-18 06:42:30,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=365386.0, ans=10.0
2024-09-18 06:42:31,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.73 vs. limit=15.0
2024-09-18 06:42:32,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=365386.0, ans=0.0
2024-09-18 06:43:04,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=365479.3333333333, ans=0.0
2024-09-18 06:43:08,002 INFO [train.py:1198] (1/2) Epoch 21, batch 800, loss[loss=0.2036, simple_loss=0.2559, pruned_loss=0.05606, ctc_loss=0.1196, cr_loss=0.3802, over 34480.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2747, pruned_loss=0.06617, ctc_loss=0.1367, cr_loss=0.4107, over 6660461.12 frames. ], batch size: 85, lr: 5.77e-03, grad_scale: 32.0
2024-09-18 06:43:28,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=365572.6666666667, ans=0.0
2024-09-18 06:43:36,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365572.6666666667, ans=0.1
2024-09-18 06:43:52,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.57 vs. limit=6.0
2024-09-18 06:43:58,267 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.373e+02 2.868e+02 3.433e+02 5.046e+02, threshold=5.736e+02, percent-clipped=0.0
2024-09-18 06:44:08,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=365666.0, ans=0.125
2024-09-18 06:44:18,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=365712.6666666667, ans=0.125
2024-09-18 06:44:34,156 INFO [train.py:1198] (1/2) Epoch 21, batch 850, loss[loss=0.2376, simple_loss=0.2868, pruned_loss=0.07089, ctc_loss=0.1464, cr_loss=0.432, over 34389.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2746, pruned_loss=0.06606, ctc_loss=0.1364, cr_loss=0.4107, over 6693565.77 frames. ], batch size: 103, lr: 5.77e-03, grad_scale: 32.0
2024-09-18 06:44:58,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=365806.0, ans=0.125
2024-09-18 06:45:22,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365899.3333333333, ans=0.1
2024-09-18 06:45:28,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365899.3333333333, ans=0.1
2024-09-18 06:45:56,634 INFO [train.py:1198] (1/2) Epoch 21, batch 900, loss[loss=0.1924, simple_loss=0.2455, pruned_loss=0.05162, ctc_loss=0.1109, cr_loss=0.3458, over 34515.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2749, pruned_loss=0.06628, ctc_loss=0.1367, cr_loss=0.4112, over 6699412.41 frames. ], batch size: 85, lr: 5.76e-03, grad_scale: 32.0
2024-09-18 06:46:15,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.45 vs. limit=15.0
2024-09-18 06:46:18,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=366039.3333333333, ans=0.125
2024-09-18 06:46:37,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=366086.0, ans=0.1
2024-09-18 06:46:42,425 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.525e+02 3.019e+02 3.627e+02 7.841e+02, threshold=6.037e+02, percent-clipped=3.0
2024-09-18 06:47:15,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=366179.3333333333, ans=0.0
2024-09-18 06:47:18,709 INFO [train.py:1198] (1/2) Epoch 21, batch 950, loss[loss=0.2122, simple_loss=0.2604, pruned_loss=0.06143, ctc_loss=0.1258, cr_loss=0.401, over 34704.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2757, pruned_loss=0.06664, ctc_loss=0.1374, cr_loss=0.4128, over 6703848.26 frames. ], batch size: 87, lr: 5.76e-03, grad_scale: 32.0
2024-09-18 06:47:22,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=366226.0, ans=0.125
2024-09-18 06:47:29,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=366226.0, ans=0.0
2024-09-18 06:47:41,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.17 vs. limit=10.0
2024-09-18 06:47:46,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=366272.6666666667, ans=0.0
2024-09-18 06:47:53,390 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.78 vs. limit=22.5
2024-09-18 06:48:08,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=366319.3333333333, ans=0.125
2024-09-18 06:48:33,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0
2024-09-18 06:48:43,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.39 vs. limit=12.0
2024-09-18 06:48:45,703 INFO [train.py:1198] (1/2) Epoch 21, batch 1000, loss[loss=0.2264, simple_loss=0.2686, pruned_loss=0.0691, ctc_loss=0.1439, cr_loss=0.4292, over 34488.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2764, pruned_loss=0.06698, ctc_loss=0.1381, cr_loss=0.4143, over 6695420.93 frames. ], batch size: 90, lr: 5.76e-03, grad_scale: 32.0
2024-09-18 06:49:02,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=366506.0, ans=10.0
2024-09-18 06:49:20,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=366552.6666666667, ans=0.0
2024-09-18 06:49:32,017 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.305e+02 2.678e+02 3.392e+02 6.330e+02, threshold=5.356e+02, percent-clipped=1.0
2024-09-18 06:50:07,937 INFO [train.py:1198] (1/2) Epoch 21, batch 1050, loss[loss=0.2251, simple_loss=0.279, pruned_loss=0.06398, ctc_loss=0.1357, cr_loss=0.4036, over 34577.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2757, pruned_loss=0.0667, ctc_loss=0.1376, cr_loss=0.4131, over 6705513.36 frames. ], batch size: 99, lr: 5.76e-03, grad_scale: 32.0
2024-09-18 06:50:32,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0
2024-09-18 06:51:05,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.55 vs. limit=15.0
2024-09-18 06:51:11,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=366832.6666666667, ans=0.0
2024-09-18 06:51:33,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=366926.0, ans=0.125
2024-09-18 06:51:35,127 INFO [train.py:1198] (1/2) Epoch 21, batch 1100, loss[loss=0.2176, simple_loss=0.2653, pruned_loss=0.06333, ctc_loss=0.1336, cr_loss=0.4139, over 34358.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2756, pruned_loss=0.06669, ctc_loss=0.1376, cr_loss=0.4124, over 6715921.73 frames. ], batch size: 91, lr: 5.76e-03, grad_scale: 32.0
2024-09-18 06:51:37,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.91 vs. limit=15.0
2024-09-18 06:51:48,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=366926.0, ans=0.0
2024-09-18 06:51:50,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=15.0
2024-09-18 06:52:21,383 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.938e+02 2.512e+02 2.840e+02 3.638e+02 5.548e+02, threshold=5.681e+02, percent-clipped=0.0
2024-09-18 06:52:31,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=367066.0, ans=0.125
2024-09-18 06:52:56,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=367159.3333333333, ans=0.0
2024-09-18 06:52:57,658 INFO [train.py:1198] (1/2) Epoch 21, batch 1150, loss[loss=0.2137, simple_loss=0.2646, pruned_loss=0.06126, ctc_loss=0.126, cr_loss=0.3776, over 34349.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2756, pruned_loss=0.06673, ctc_loss=0.1377, cr_loss=0.4125, over 6714457.72 frames. ], batch size: 91, lr: 5.76e-03, grad_scale: 16.0
2024-09-18 06:53:28,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0
2024-09-18 06:53:42,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=367252.6666666667, ans=0.125
2024-09-18 06:53:47,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=367299.3333333333, ans=0.125
2024-09-18 06:53:49,659 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 06:53:57,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=367299.3333333333, ans=0.0
2024-09-18 06:53:59,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=367299.3333333333, ans=0.125
2024-09-18 06:53:59,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=367299.3333333333, ans=0.0
2024-09-18 06:54:05,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=367346.0, ans=0.1
2024-09-18 06:54:05,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=367346.0, ans=0.05
2024-09-18 06:54:10,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=367346.0, ans=0.025
2024-09-18 06:54:20,360 INFO [train.py:1198] (1/2) Epoch 21, batch 1200, loss[loss=0.2353, simple_loss=0.2822, pruned_loss=0.0712, ctc_loss=0.1457, cr_loss=0.4219, over 34547.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2766, pruned_loss=0.06715, ctc_loss=0.1385, cr_loss=0.4143, over 6708018.66 frames. ], batch size: 99, lr: 5.75e-03, grad_scale: 32.0
2024-09-18 06:54:32,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=367392.6666666667, ans=0.125
2024-09-18 06:54:55,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=367486.0, ans=0.1
2024-09-18 06:55:12,285 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.522e+02 2.974e+02 3.734e+02 7.608e+02, threshold=5.948e+02, percent-clipped=5.0
2024-09-18 06:55:14,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=367532.6666666667, ans=0.0
2024-09-18 06:55:30,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=367579.3333333333, ans=0.0
2024-09-18 06:55:38,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=367579.3333333333, ans=0.125
2024-09-18 06:55:46,637 INFO [train.py:1198] (1/2) Epoch 21, batch 1250, loss[loss=0.2328, simple_loss=0.2849, pruned_loss=0.06773, ctc_loss=0.1401, cr_loss=0.4327, over 34353.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2768, pruned_loss=0.06703, ctc_loss=0.1383, cr_loss=0.4138, over 6741296.72 frames. ], batch size: 107, lr: 5.75e-03, grad_scale: 32.0
2024-09-18 06:55:54,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0
2024-09-18 06:56:08,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=15.0
2024-09-18 06:56:43,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=367766.0, ans=0.125
2024-09-18 06:57:08,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=367859.3333333333, ans=0.125
2024-09-18 06:57:10,053 INFO [train.py:1198] (1/2) Epoch 21, batch 1300, loss[loss=0.225, simple_loss=0.2827, pruned_loss=0.06272, ctc_loss=0.1305, cr_loss=0.3918, over 33120.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2763, pruned_loss=0.06686, ctc_loss=0.1379, cr_loss=0.4132, over 6746731.26 frames. ], batch size: 130, lr: 5.75e-03, grad_scale: 32.0
2024-09-18 06:57:18,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=367859.3333333333, ans=0.125
2024-09-18 06:57:25,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=367906.0, ans=0.125
2024-09-18 06:57:57,748 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.422e+02 2.819e+02 3.542e+02 9.386e+02, threshold=5.638e+02, percent-clipped=4.0
2024-09-18 06:58:07,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.92 vs. limit=10.0
2024-09-18 06:58:22,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=368046.0, ans=0.125
2024-09-18 06:58:29,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=368046.0, ans=0.125
2024-09-18 06:58:32,353 INFO [train.py:1198] (1/2) Epoch 21, batch 1350, loss[loss=0.2317, simple_loss=0.2815, pruned_loss=0.06794, ctc_loss=0.1421, cr_loss=0.4395, over 34548.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2759, pruned_loss=0.06656, ctc_loss=0.1374, cr_loss=0.4121, over 6763620.93 frames. ], batch size: 94, lr: 5.75e-03, grad_scale: 32.0
2024-09-18 06:59:06,129 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 06:59:43,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=368279.3333333333, ans=0.0
2024-09-18 06:59:58,569 INFO [train.py:1198] (1/2) Epoch 21, batch 1400, loss[loss=0.202, simple_loss=0.2513, pruned_loss=0.05659, ctc_loss=0.1189, cr_loss=0.3918, over 34272.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2756, pruned_loss=0.0665, ctc_loss=0.1371, cr_loss=0.4123, over 6775624.95 frames. ], batch size: 80, lr: 5.75e-03, grad_scale: 32.0
2024-09-18 07:00:15,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=368372.6666666667, ans=0.0
2024-09-18 07:00:45,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=368419.3333333333, ans=0.0
2024-09-18 07:00:46,274 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.505e+02 2.925e+02 3.721e+02 5.836e+02, threshold=5.850e+02, percent-clipped=1.0
2024-09-18 07:00:53,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=368466.0, ans=0.0
2024-09-18 07:00:58,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0
2024-09-18 07:01:08,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=368512.6666666667, ans=0.2
2024-09-18 07:01:20,990 INFO [train.py:1198] (1/2) Epoch 21, batch 1450, loss[loss=0.2437, simple_loss=0.2908, pruned_loss=0.07409, ctc_loss=0.1512, cr_loss=0.4557, over 34473.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2759, pruned_loss=0.06646, ctc_loss=0.1373, cr_loss=0.4129, over 6773189.96 frames. ], batch size: 110, lr: 5.74e-03, grad_scale: 32.0
2024-09-18 07:01:22,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=368559.3333333333, ans=0.2
2024-09-18 07:01:57,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=368652.6666666667, ans=0.2
2024-09-18 07:02:05,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.70 vs. limit=12.0
2024-09-18 07:02:06,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=368652.6666666667, ans=0.125
2024-09-18 07:02:09,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=368699.3333333333, ans=0.125
2024-09-18 07:02:32,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=368746.0, ans=0.125
2024-09-18 07:02:39,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=368746.0, ans=10.0
2024-09-18 07:02:45,613 INFO [train.py:1198] (1/2) Epoch 21, batch 1500, loss[loss=0.2479, simple_loss=0.2945, pruned_loss=0.07634, ctc_loss=0.1537, cr_loss=0.4478, over 34436.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2763, pruned_loss=0.06666, ctc_loss=0.1375, cr_loss=0.4129, over 6772541.47 frames. ], batch size: 100, lr: 5.74e-03, grad_scale: 32.0
2024-09-18 07:03:10,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0
2024-09-18 07:03:18,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.86 vs. limit=15.0
2024-09-18 07:03:24,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=368886.0, ans=0.125
2024-09-18 07:03:35,683 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.309e+02 2.636e+02 3.364e+02 8.269e+02, threshold=5.271e+02, percent-clipped=1.0
2024-09-18 07:03:59,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=368979.3333333333, ans=0.1
2024-09-18 07:04:10,419 INFO [train.py:1198] (1/2) Epoch 21, batch 1550, loss[loss=0.2341, simple_loss=0.2807, pruned_loss=0.07051, ctc_loss=0.1458, cr_loss=0.4315, over 34419.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2767, pruned_loss=0.06705, ctc_loss=0.1383, cr_loss=0.4143, over 6745315.53 frames. ], batch size: 105, lr: 5.74e-03, grad_scale: 32.0
2024-09-18 07:04:39,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.68 vs. limit=22.5
2024-09-18 07:04:53,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=369119.3333333333, ans=0.0
2024-09-18 07:04:55,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=369119.3333333333, ans=0.0
2024-09-18 07:05:02,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=369166.0, ans=0.0
2024-09-18 07:05:08,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=369166.0, ans=0.125
2024-09-18 07:05:12,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=10.42 vs. limit=12.0
2024-09-18 07:05:33,074 INFO [train.py:1198] (1/2) Epoch 21, batch 1600, loss[loss=0.232, simple_loss=0.2883, pruned_loss=0.0656, ctc_loss=0.1398, cr_loss=0.4148, over 34573.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2764, pruned_loss=0.067, ctc_loss=0.1383, cr_loss=0.4143, over 6723884.01 frames. ], batch size: 99, lr: 5.74e-03, grad_scale: 32.0
2024-09-18 07:06:03,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=369306.0, ans=0.05
2024-09-18 07:06:06,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369352.6666666667, ans=0.1
2024-09-18 07:06:10,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369352.6666666667, ans=0.125
2024-09-18 07:06:22,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.75 vs. limit=12.0
2024-09-18 07:06:22,856 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.616e+02 2.987e+02 3.947e+02 6.472e+02, threshold=5.975e+02, percent-clipped=6.0
2024-09-18 07:06:27,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0
2024-09-18 07:06:40,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.25 vs. limit=15.0
2024-09-18 07:06:47,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.25 vs. limit=22.5
2024-09-18 07:06:50,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=369446.0, ans=0.2
2024-09-18 07:06:59,634 INFO [train.py:1198] (1/2) Epoch 21, batch 1650, loss[loss=0.2296, simple_loss=0.2825, pruned_loss=0.06593, ctc_loss=0.1395, cr_loss=0.4227, over 34383.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2762, pruned_loss=0.0669, ctc_loss=0.1382, cr_loss=0.4133, over 6717787.01 frames. ], batch size: 103, lr: 5.74e-03, grad_scale: 32.0
2024-09-18 07:07:11,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=369492.6666666667, ans=0.0
2024-09-18 07:07:27,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=369539.3333333333, ans=0.0
2024-09-18 07:07:34,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=369586.0, ans=0.125
2024-09-18 07:08:01,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=369632.6666666667, ans=0.2
2024-09-18 07:08:06,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=369679.3333333333, ans=0.125
2024-09-18 07:08:22,132 INFO [train.py:1198] (1/2) Epoch 21, batch 1700, loss[loss=0.1919, simple_loss=0.2424, pruned_loss=0.05221, ctc_loss=0.1131, cr_loss=0.3592, over 34315.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2759, pruned_loss=0.0666, ctc_loss=0.1376, cr_loss=0.4117, over 6743838.19 frames. ], batch size: 80, lr: 5.74e-03, grad_scale: 32.0
2024-09-18 07:08:43,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=369772.6666666667, ans=0.0
2024-09-18 07:08:56,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=369819.3333333333, ans=0.125
2024-09-18 07:09:03,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=369819.3333333333, ans=0.025
2024-09-18 07:09:11,428 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.445e+02 2.798e+02 3.596e+02 8.456e+02, threshold=5.595e+02, percent-clipped=2.0
2024-09-18 07:09:17,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=15.0
2024-09-18 07:09:31,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=369912.6666666667, ans=0.1
2024-09-18 07:09:38,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=369912.6666666667, ans=0.025
2024-09-18 07:09:41,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=369912.6666666667, ans=0.025
2024-09-18 07:09:46,263 INFO [train.py:1198] (1/2) Epoch 21, batch 1750, loss[loss=0.2002, simple_loss=0.2473, pruned_loss=0.05719, ctc_loss=0.1172, cr_loss=0.3796, over 34133.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2755, pruned_loss=0.06647, ctc_loss=0.1372, cr_loss=0.4115, over 6753058.41 frames. ], batch size: 78, lr: 5.73e-03, grad_scale: 16.0
2024-09-18 07:10:34,254 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.24 vs. limit=15.0
2024-09-18 07:10:46,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=370099.3333333333, ans=0.125
2024-09-18 07:10:49,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.03 vs. limit=15.0
2024-09-18 07:10:55,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=370146.0, ans=0.025
2024-09-18 07:10:55,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=370146.0, ans=0.0
2024-09-18 07:11:11,254 INFO [train.py:1198] (1/2) Epoch 21, batch 1800, loss[loss=0.2511, simple_loss=0.296, pruned_loss=0.07834, ctc_loss=0.1568, cr_loss=0.4564, over 34707.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2757, pruned_loss=0.0666, ctc_loss=0.1373, cr_loss=0.4115, over 6756604.26 frames. ], batch size: 97, lr: 5.73e-03, grad_scale: 16.0
2024-09-18 07:11:27,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.73 vs. limit=6.0
2024-09-18 07:11:55,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.00 vs. limit=15.0
2024-09-18 07:11:59,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=370332.6666666667, ans=0.07
2024-09-18 07:12:00,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.933e+02 2.493e+02 3.199e+02 4.567e+02 6.767e+02, threshold=6.398e+02, percent-clipped=3.0
2024-09-18 07:12:14,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=370332.6666666667, ans=0.0
2024-09-18 07:12:33,905 INFO [train.py:1198] (1/2) Epoch 21, batch 1850, loss[loss=0.221, simple_loss=0.2816, pruned_loss=0.05976, ctc_loss=0.1269, cr_loss=0.3863, over 34465.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2753, pruned_loss=0.06647, ctc_loss=0.1372, cr_loss=0.4115, over 6762030.66 frames. ], batch size: 100, lr: 5.73e-03, grad_scale: 16.0
2024-09-18 07:12:40,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=370426.0, ans=0.1
2024-09-18 07:12:47,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.58 vs. limit=22.5
2024-09-18 07:13:02,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0
2024-09-18 07:13:49,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=370612.6666666667, ans=0.025
2024-09-18 07:13:57,367 INFO [train.py:1198] (1/2) Epoch 21, batch 1900, loss[loss=0.2325, simple_loss=0.2839, pruned_loss=0.06759, ctc_loss=0.1418, cr_loss=0.437, over 34376.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2759, pruned_loss=0.06663, ctc_loss=0.1375, cr_loss=0.4122, over 6771236.74 frames. ], batch size: 103, lr: 5.73e-03, grad_scale: 16.0
2024-09-18 07:14:07,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=370659.3333333333, ans=0.09899494936611666
2024-09-18 07:14:09,363 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 07:14:32,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370752.6666666667, ans=0.1
2024-09-18 07:14:48,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=370799.3333333333, ans=0.0
2024-09-18 07:14:49,354 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.621e+02 3.071e+02 3.993e+02 7.457e+02, threshold=6.141e+02, percent-clipped=1.0
2024-09-18 07:15:09,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=370846.0, ans=0.025
2024-09-18 07:15:22,133 INFO [train.py:1198] (1/2) Epoch 21, batch 1950, loss[loss=0.2256, simple_loss=0.2757, pruned_loss=0.06592, ctc_loss=0.1366, cr_loss=0.4072, over 34343.00 frames. ], tot_loss[loss=0.2275, simple_loss=0.2769, pruned_loss=0.06697, ctc_loss=0.1381, cr_loss=0.4135, over 6788535.94 frames. ], batch size: 91, lr: 5.73e-03, grad_scale: 16.0
2024-09-18 07:15:25,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370892.6666666667, ans=0.1
2024-09-18 07:15:44,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=370939.3333333333, ans=0.125
2024-09-18 07:15:57,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=370986.0, ans=0.125
2024-09-18 07:16:02,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=370986.0, ans=0.2
2024-09-18 07:16:17,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=371032.6666666667, ans=0.2
2024-09-18 07:16:44,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.00 vs. limit=12.0
2024-09-18 07:16:45,117 INFO [train.py:1198] (1/2) Epoch 21, batch 2000, loss[loss=0.2075, simple_loss=0.2526, pruned_loss=0.06086, ctc_loss=0.1265, cr_loss=0.3852, over 34170.00 frames. ], tot_loss[loss=0.2281, simple_loss=0.2774, pruned_loss=0.06722, ctc_loss=0.1386, cr_loss=0.414, over 6763680.50 frames. ], batch size: 78, lr: 5.72e-03, grad_scale: 32.0
2024-09-18 07:16:55,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371126.0, ans=0.1
2024-09-18 07:17:01,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=371126.0, ans=0.1
2024-09-18 07:17:37,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.397e+02 2.911e+02 3.586e+02 7.170e+02, threshold=5.823e+02, percent-clipped=3.0
2024-09-18 07:17:47,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=371266.0, ans=0.125
2024-09-18 07:18:01,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=371312.6666666667, ans=0.1
2024-09-18 07:18:01,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=371312.6666666667, ans=0.0
2024-09-18 07:18:01,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=371312.6666666667, ans=0.2
2024-09-18 07:18:09,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=371312.6666666667, ans=0.07
2024-09-18 07:18:12,522 INFO [train.py:1198] (1/2) Epoch 21, batch 2050, loss[loss=0.1931, simple_loss=0.241, pruned_loss=0.05425, ctc_loss=0.1134, cr_loss=0.3493, over 34447.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2763, pruned_loss=0.06689, ctc_loss=0.138, cr_loss=0.4124, over 6754144.38 frames. ], batch size: 82, lr: 5.72e-03, grad_scale: 32.0
2024-09-18 07:18:30,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=371406.0, ans=0.125
2024-09-18 07:19:06,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.05 vs. limit=10.0
2024-09-18 07:19:12,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=371499.3333333333, ans=0.1
2024-09-18 07:19:34,897 INFO [train.py:1198] (1/2) Epoch 21, batch 2100, loss[loss=0.2274, simple_loss=0.2737, pruned_loss=0.06754, ctc_loss=0.1425, cr_loss=0.4355, over 34551.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2759, pruned_loss=0.06671, ctc_loss=0.1377, cr_loss=0.4121, over 6767418.35 frames. ], batch size: 94, lr: 5.72e-03, grad_scale: 32.0
2024-09-18 07:19:36,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=371592.6666666667, ans=0.025
2024-09-18 07:19:38,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=371592.6666666667, ans=0.125
2024-09-18 07:19:40,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=371592.6666666667, ans=0.0
2024-09-18 07:19:51,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=371639.3333333333, ans=0.0
2024-09-18 07:20:01,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371639.3333333333, ans=0.1
2024-09-18 07:20:14,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.64 vs. limit=6.0
2024-09-18 07:20:23,706 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.540e+02 2.991e+02 3.779e+02 7.007e+02, threshold=5.981e+02, percent-clipped=5.0
2024-09-18 07:20:32,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371732.6666666667, ans=0.1
2024-09-18 07:20:58,184 INFO [train.py:1198] (1/2) Epoch 21, batch 2150, loss[loss=0.2344, simple_loss=0.2777, pruned_loss=0.07194, ctc_loss=0.1474, cr_loss=0.4439, over 34366.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2755, pruned_loss=0.06652, ctc_loss=0.1374, cr_loss=0.4121, over 6786901.73 frames. ], batch size: 91, lr: 5.72e-03, grad_scale: 32.0
2024-09-18 07:20:58,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=371826.0, ans=0.0
2024-09-18 07:22:06,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=372012.6666666667, ans=0.0
2024-09-18 07:22:15,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=22.5
2024-09-18 07:22:22,753 INFO [train.py:1198] (1/2) Epoch 21, batch 2200, loss[loss=0.2364, simple_loss=0.2879, pruned_loss=0.06923, ctc_loss=0.1472, cr_loss=0.4267, over 34464.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2758, pruned_loss=0.06674, ctc_loss=0.1377, cr_loss=0.4129, over 6782150.70 frames. ], batch size: 100, lr: 5.72e-03, grad_scale: 32.0
2024-09-18 07:22:49,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=372106.0, ans=0.125
2024-09-18 07:23:02,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=372152.6666666667, ans=0.025
2024-09-18 07:23:10,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=372199.3333333333, ans=0.2
2024-09-18 07:23:11,667 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.566e+02 3.184e+02 4.262e+02 7.980e+02, threshold=6.368e+02, percent-clipped=7.0
2024-09-18 07:23:13,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=372199.3333333333, ans=0.125
2024-09-18 07:23:20,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=372199.3333333333, ans=0.1
2024-09-18 07:23:22,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=372199.3333333333, ans=0.125
2024-09-18 07:23:45,289 INFO [train.py:1198] (1/2) Epoch 21, batch 2250, loss[loss=0.2264, simple_loss=0.2759, pruned_loss=0.06678, ctc_loss=0.1382, cr_loss=0.393, over 34428.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2755, pruned_loss=0.06668, ctc_loss=0.1376, cr_loss=0.4124, over 6780625.75 frames. ], batch size: 95, lr: 5.72e-03, grad_scale: 32.0
2024-09-18 07:24:21,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=372386.0, ans=0.025
2024-09-18 07:25:08,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.41 vs. limit=15.0
2024-09-18 07:25:09,142 INFO [train.py:1198] (1/2) Epoch 21, batch 2300, loss[loss=0.195, simple_loss=0.2476, pruned_loss=0.05295, ctc_loss=0.1098, cr_loss=0.3627, over 34313.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2742, pruned_loss=0.0661, ctc_loss=0.1365, cr_loss=0.4098, over 6768158.47 frames. ], batch size: 83, lr: 5.71e-03, grad_scale: 32.0
2024-09-18 07:25:19,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0
2024-09-18 07:25:22,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=372526.0, ans=0.125
2024-09-18 07:25:25,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=372572.6666666667, ans=0.1
2024-09-18 07:25:54,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.42 vs. limit=15.0
2024-09-18 07:25:59,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.71 vs.
limit=15.0 2024-09-18 07:26:00,141 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.548e+02 3.215e+02 3.738e+02 5.573e+02, threshold=6.429e+02, percent-clipped=0.0 2024-09-18 07:26:00,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=372666.0, ans=0.125 2024-09-18 07:26:20,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=372712.6666666667, ans=0.1 2024-09-18 07:26:33,112 INFO [train.py:1198] (1/2) Epoch 21, batch 2350, loss[loss=0.2317, simple_loss=0.282, pruned_loss=0.06813, ctc_loss=0.1415, cr_loss=0.4206, over 34698.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2746, pruned_loss=0.06627, ctc_loss=0.1369, cr_loss=0.4111, over 6774010.14 frames. ], batch size: 97, lr: 5.71e-03, grad_scale: 32.0 2024-09-18 07:26:58,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=22.5 2024-09-18 07:27:46,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=372946.0, ans=22.5 2024-09-18 07:27:55,424 INFO [train.py:1198] (1/2) Epoch 21, batch 2400, loss[loss=0.219, simple_loss=0.2655, pruned_loss=0.06455, ctc_loss=0.1347, cr_loss=0.4128, over 34564.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2751, pruned_loss=0.06646, ctc_loss=0.1372, cr_loss=0.4119, over 6778044.33 frames. ], batch size: 89, lr: 5.71e-03, grad_scale: 32.0 2024-09-18 07:28:02,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=372992.6666666667, ans=0.2 2024-09-18 07:28:03,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=372992.6666666667, ans=0.125 2024-09-18 07:28:13,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=373039.3333333333, ans=0.125 2024-09-18 07:28:21,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.03 vs. limit=10.0 2024-09-18 07:28:34,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=373086.0, ans=0.2 2024-09-18 07:28:46,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.531e+02 3.035e+02 3.819e+02 6.565e+02, threshold=6.070e+02, percent-clipped=1.0 2024-09-18 07:28:50,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.04 vs. limit=22.5 2024-09-18 07:29:13,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=373179.3333333333, ans=0.125 2024-09-18 07:29:19,992 INFO [train.py:1198] (1/2) Epoch 21, batch 2450, loss[loss=0.2301, simple_loss=0.2804, pruned_loss=0.06733, ctc_loss=0.1395, cr_loss=0.4305, over 34416.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.276, pruned_loss=0.06683, ctc_loss=0.1379, cr_loss=0.4134, over 6750708.86 frames. 
], batch size: 95, lr: 5.71e-03, grad_scale: 32.0 2024-09-18 07:30:16,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=373366.0, ans=0.0 2024-09-18 07:30:28,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.63 vs. limit=22.5 2024-09-18 07:30:41,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=373412.6666666667, ans=0.025 2024-09-18 07:30:43,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.80 vs. limit=15.0 2024-09-18 07:30:50,865 INFO [train.py:1198] (1/2) Epoch 21, batch 2500, loss[loss=0.2329, simple_loss=0.2837, pruned_loss=0.068, ctc_loss=0.1436, cr_loss=0.4325, over 34438.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2757, pruned_loss=0.0667, ctc_loss=0.1376, cr_loss=0.4128, over 6762655.06 frames. ], batch size: 100, lr: 5.71e-03, grad_scale: 32.0 2024-09-18 07:30:59,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=373459.3333333333, ans=0.125 2024-09-18 07:31:20,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=373506.0, ans=0.0 2024-09-18 07:31:20,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=373506.0, ans=0.0 2024-09-18 07:31:25,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=373552.6666666667, ans=0.2 2024-09-18 07:31:39,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=373599.3333333333, ans=0.0 2024-09-18 07:31:40,412 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.366e+02 2.571e+02 3.043e+02 5.121e+02, threshold=5.141e+02, percent-clipped=0.0 2024-09-18 07:31:44,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.21 vs. limit=15.0 2024-09-18 07:31:56,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2024-09-18 07:32:12,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=373646.0, ans=0.2 2024-09-18 07:32:15,387 INFO [train.py:1198] (1/2) Epoch 21, batch 2550, loss[loss=0.2082, simple_loss=0.255, pruned_loss=0.0607, ctc_loss=0.1265, cr_loss=0.3685, over 34148.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2758, pruned_loss=0.06667, ctc_loss=0.1375, cr_loss=0.4125, over 6764969.00 frames. 
], batch size: 78, lr: 5.71e-03, grad_scale: 32.0 2024-09-18 07:32:20,860 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=8.787e-02 2024-09-18 07:32:32,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=373739.3333333333, ans=0.125 2024-09-18 07:32:33,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=373739.3333333333, ans=0.125 2024-09-18 07:32:39,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.01 vs. limit=6.0 2024-09-18 07:32:46,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=373786.0, ans=0.1 2024-09-18 07:32:51,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=373786.0, ans=0.0 2024-09-18 07:32:59,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=373786.0, ans=0.125 2024-09-18 07:33:17,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=373832.6666666667, ans=0.0 2024-09-18 07:33:30,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=373879.3333333333, ans=0.1 2024-09-18 07:33:39,639 INFO [train.py:1198] (1/2) Epoch 21, batch 2600, loss[loss=0.2381, simple_loss=0.2822, pruned_loss=0.07328, ctc_loss=0.1498, cr_loss=0.4374, over 34357.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2764, pruned_loss=0.06687, ctc_loss=0.138, cr_loss=0.414, over 6760947.54 frames. ], batch size: 91, lr: 5.70e-03, grad_scale: 32.0 2024-09-18 07:33:54,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=373972.6666666667, ans=0.125 2024-09-18 07:33:59,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=373972.6666666667, ans=0.0 2024-09-18 07:34:20,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=374019.3333333333, ans=0.0 2024-09-18 07:34:28,394 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.678e+02 3.393e+02 4.833e+02 8.455e+02, threshold=6.786e+02, percent-clipped=19.0 2024-09-18 07:34:36,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=374066.0, ans=0.125 2024-09-18 07:34:56,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=374112.6666666667, ans=10.0 2024-09-18 07:35:00,862 INFO [train.py:1198] (1/2) Epoch 21, batch 2650, loss[loss=0.235, simple_loss=0.2874, pruned_loss=0.06838, ctc_loss=0.144, cr_loss=0.4287, over 34223.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2766, pruned_loss=0.06684, ctc_loss=0.1379, cr_loss=0.4139, over 6769081.43 frames. 
], batch size: 117, lr: 5.70e-03, grad_scale: 32.0 2024-09-18 07:35:02,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=374159.3333333333, ans=0.125 2024-09-18 07:35:09,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=374159.3333333333, ans=0.0 2024-09-18 07:35:17,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=374206.0, ans=0.0 2024-09-18 07:35:21,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=374206.0, ans=0.125 2024-09-18 07:35:24,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=374206.0, ans=0.2 2024-09-18 07:35:30,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=374206.0, ans=0.125 2024-09-18 07:35:57,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=374299.3333333333, ans=0.2 2024-09-18 07:36:00,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=374299.3333333333, ans=0.0 2024-09-18 07:36:19,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=374346.0, ans=15.0 2024-09-18 07:36:25,158 INFO [train.py:1198] (1/2) Epoch 21, batch 2700, loss[loss=0.2317, simple_loss=0.2821, pruned_loss=0.06832, ctc_loss=0.1414, cr_loss=0.4089, over 34607.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2769, pruned_loss=0.06701, ctc_loss=0.1383, cr_loss=0.4144, over 6764269.46 frames. ], batch size: 102, lr: 5.70e-03, grad_scale: 32.0 2024-09-18 07:36:53,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=374439.3333333333, ans=0.2 2024-09-18 07:37:05,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=374486.0, ans=0.0 2024-09-18 07:37:16,925 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.091e+02 2.492e+02 2.863e+02 3.801e+02 6.008e+02, threshold=5.725e+02, percent-clipped=0.0 2024-09-18 07:37:22,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=374532.6666666667, ans=0.125 2024-09-18 07:37:25,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=374532.6666666667, ans=0.125 2024-09-18 07:37:40,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=374579.3333333333, ans=0.025 2024-09-18 07:37:50,153 INFO [train.py:1198] (1/2) Epoch 21, batch 2750, loss[loss=0.2184, simple_loss=0.2664, pruned_loss=0.06431, ctc_loss=0.1292, cr_loss=0.3967, over 34614.00 frames. ], tot_loss[loss=0.2265, simple_loss=0.2758, pruned_loss=0.06658, ctc_loss=0.1374, cr_loss=0.4128, over 6761362.78 frames. ], batch size: 88, lr: 5.70e-03, grad_scale: 16.0 2024-09-18 07:37:52,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.99 vs. 
limit=15.0 2024-09-18 07:37:58,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=374626.0, ans=0.1 2024-09-18 07:38:13,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=374672.6666666667, ans=0.0 2024-09-18 07:38:17,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=374672.6666666667, ans=0.0 2024-09-18 07:38:21,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=374719.3333333333, ans=0.125 2024-09-18 07:38:22,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.56 vs. limit=12.0 2024-09-18 07:38:33,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.04 vs. limit=15.0 2024-09-18 07:38:43,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=374766.0, ans=0.125 2024-09-18 07:39:01,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2024-09-18 07:39:12,679 INFO [train.py:1198] (1/2) Epoch 21, batch 2800, loss[loss=0.251, simple_loss=0.2878, pruned_loss=0.08167, ctc_loss=0.1655, cr_loss=0.4446, over 25292.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.2758, pruned_loss=0.06672, ctc_loss=0.1376, cr_loss=0.4125, over 6741624.65 frames. ], batch size: 244, lr: 5.70e-03, grad_scale: 32.0 2024-09-18 07:39:18,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.38 vs. limit=15.0 2024-09-18 07:39:27,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=374906.0, ans=0.125 2024-09-18 07:39:34,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=374906.0, ans=0.0 2024-09-18 07:39:38,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.57 vs. 
limit=15.0 2024-09-18 07:39:56,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=374952.6666666667, ans=0.125 2024-09-18 07:39:59,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=374952.6666666667, ans=0.125 2024-09-18 07:39:59,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=374952.6666666667, ans=0.1 2024-09-18 07:40:00,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=374952.6666666667, ans=0.125 2024-09-18 07:40:00,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=374952.6666666667, ans=0.125 2024-09-18 07:40:05,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.900e+02 2.572e+02 3.083e+02 3.930e+02 8.646e+02, threshold=6.166e+02, percent-clipped=4.0 2024-09-18 07:40:17,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.41 vs. limit=12.0 2024-09-18 07:40:18,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=375046.0, ans=0.125 2024-09-18 07:40:22,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=375046.0, ans=0.04949747468305833 2024-09-18 07:40:25,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=375046.0, ans=0.125 2024-09-18 07:40:36,411 INFO [train.py:1198] (1/2) Epoch 21, batch 2850, loss[loss=0.2248, simple_loss=0.2711, pruned_loss=0.06762, ctc_loss=0.1373, cr_loss=0.3974, over 34489.00 frames. ], tot_loss[loss=0.2272, simple_loss=0.2761, pruned_loss=0.06703, ctc_loss=0.1382, cr_loss=0.4131, over 6725307.13 frames. ], batch size: 90, lr: 5.69e-03, grad_scale: 32.0 2024-09-18 07:40:36,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=375092.6666666667, ans=0.125 2024-09-18 07:40:42,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=22.5 2024-09-18 07:40:58,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=375139.3333333333, ans=0.0 2024-09-18 07:41:00,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.63 vs. 
limit=15.0 2024-09-18 07:41:03,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=375139.3333333333, ans=0.1 2024-09-18 07:41:20,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=375186.0, ans=0.125 2024-09-18 07:41:51,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=375279.3333333333, ans=0.0 2024-09-18 07:41:55,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=375279.3333333333, ans=0.0 2024-09-18 07:42:01,407 INFO [train.py:1198] (1/2) Epoch 21, batch 2900, loss[loss=0.2168, simple_loss=0.2719, pruned_loss=0.06018, ctc_loss=0.1258, cr_loss=0.4015, over 34508.00 frames. ], tot_loss[loss=0.2282, simple_loss=0.2773, pruned_loss=0.06732, ctc_loss=0.1387, cr_loss=0.4152, over 6755478.86 frames. ], batch size: 94, lr: 5.69e-03, grad_scale: 32.0 2024-09-18 07:42:20,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375372.6666666667, ans=0.1 2024-09-18 07:42:26,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=375372.6666666667, ans=0.0 2024-09-18 07:42:35,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375419.3333333333, ans=0.0 2024-09-18 07:42:43,319 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:42:44,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=375419.3333333333, ans=0.0 2024-09-18 07:42:48,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=375419.3333333333, ans=0.0 2024-09-18 07:42:52,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.563e+02 3.007e+02 4.140e+02 6.707e+02, threshold=6.014e+02, percent-clipped=5.0 2024-09-18 07:43:02,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=375466.0, ans=0.0 2024-09-18 07:43:04,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=375466.0, ans=0.1 2024-09-18 07:43:12,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=375512.6666666667, ans=0.125 2024-09-18 07:43:14,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=375512.6666666667, ans=0.0 2024-09-18 07:43:17,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=375512.6666666667, ans=0.0 2024-09-18 07:43:19,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=375512.6666666667, ans=0.125 2024-09-18 07:43:25,803 INFO [train.py:1198] (1/2) Epoch 21, batch 2950, loss[loss=0.214, simple_loss=0.261, pruned_loss=0.06291, ctc_loss=0.1273, cr_loss=0.3927, over 34624.00 frames. 
], tot_loss[loss=0.2269, simple_loss=0.276, pruned_loss=0.06686, ctc_loss=0.1378, cr_loss=0.4134, over 6749881.54 frames. ], batch size: 88, lr: 5.69e-03, grad_scale: 32.0 2024-09-18 07:43:48,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2024-09-18 07:44:50,732 INFO [train.py:1198] (1/2) Epoch 21, batch 3000, loss[loss=0.2252, simple_loss=0.2776, pruned_loss=0.06505, ctc_loss=0.1318, cr_loss=0.4072, over 34530.00 frames. ], tot_loss[loss=0.2269, simple_loss=0.2761, pruned_loss=0.06683, ctc_loss=0.1378, cr_loss=0.4135, over 6751356.33 frames. ], batch size: 94, lr: 5.69e-03, grad_scale: 16.0 2024-09-18 07:44:50,732 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 07:45:06,430 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.9125, 2.9729, 2.6865, 3.0726, 2.9264, 2.0759, 2.8868, 2.9613], device='cuda:1') 2024-09-18 07:45:07,660 INFO [train.py:1230] (1/2) Epoch 21, validation: loss=0.1487, simple_loss=0.2456, pruned_loss=0.02178, ctc_loss=0.04084, cr_loss=1.824e-14, over 944034.00 frames. 2024-09-18 07:45:07,660 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 07:45:24,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=375839.3333333333, ans=0.05 2024-09-18 07:45:25,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.67 vs. limit=10.0 2024-09-18 07:45:39,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=375886.0, ans=0.125 2024-09-18 07:45:50,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=375886.0, ans=0.0 2024-09-18 07:45:58,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=375932.6666666667, ans=0.0 2024-09-18 07:45:59,883 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.469e+02 2.899e+02 3.711e+02 8.952e+02, threshold=5.798e+02, percent-clipped=5.0 2024-09-18 07:46:03,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=375932.6666666667, ans=0.0 2024-09-18 07:46:08,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=375932.6666666667, ans=0.125 2024-09-18 07:46:11,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375979.3333333333, ans=0.1 2024-09-18 07:46:29,560 INFO [train.py:1198] (1/2) Epoch 21, batch 3050, loss[loss=0.2193, simple_loss=0.2665, pruned_loss=0.0649, ctc_loss=0.1312, cr_loss=0.4019, over 34605.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2765, pruned_loss=0.06692, ctc_loss=0.1382, cr_loss=0.4138, over 6742649.16 frames. 
], batch size: 89, lr: 5.69e-03, grad_scale: 16.0 2024-09-18 07:46:29,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=376026.0, ans=0.0 2024-09-18 07:47:18,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=376166.0, ans=0.125 2024-09-18 07:47:26,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=376166.0, ans=0.07 2024-09-18 07:47:37,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=376212.6666666667, ans=0.125 2024-09-18 07:47:52,022 INFO [train.py:1198] (1/2) Epoch 21, batch 3100, loss[loss=0.2431, simple_loss=0.2945, pruned_loss=0.07237, ctc_loss=0.1496, cr_loss=0.4278, over 34292.00 frames. ], tot_loss[loss=0.227, simple_loss=0.2763, pruned_loss=0.06681, ctc_loss=0.1381, cr_loss=0.4139, over 6741549.28 frames. ], batch size: 117, lr: 5.69e-03, grad_scale: 16.0 2024-09-18 07:48:10,245 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:48:26,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376352.6666666667, ans=0.1 2024-09-18 07:48:28,668 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2024-09-18 07:48:44,090 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.391e+02 2.807e+02 3.835e+02 8.950e+02, threshold=5.613e+02, percent-clipped=5.0 2024-09-18 07:49:13,516 INFO [train.py:1198] (1/2) Epoch 21, batch 3150, loss[loss=0.235, simple_loss=0.2861, pruned_loss=0.06889, ctc_loss=0.1447, cr_loss=0.4293, over 33882.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.276, pruned_loss=0.06666, ctc_loss=0.1378, cr_loss=0.4132, over 6748290.12 frames. ], batch size: 122, lr: 5.68e-03, grad_scale: 16.0 2024-09-18 07:49:30,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=376539.3333333333, ans=0.2 2024-09-18 07:50:35,789 INFO [train.py:1198] (1/2) Epoch 21, batch 3200, loss[loss=0.2206, simple_loss=0.277, pruned_loss=0.06122, ctc_loss=0.1292, cr_loss=0.4006, over 34515.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2753, pruned_loss=0.06632, ctc_loss=0.1371, cr_loss=0.4122, over 6760458.02 frames. ], batch size: 94, lr: 5.68e-03, grad_scale: 32.0 2024-09-18 07:50:53,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=376772.6666666667, ans=0.0 2024-09-18 07:50:56,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=22.5 2024-09-18 07:51:05,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376772.6666666667, ans=0.1 2024-09-18 07:51:14,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.17 vs. 
limit=12.0 2024-09-18 07:51:27,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.441e+02 2.982e+02 3.334e+02 5.970e+02, threshold=5.964e+02, percent-clipped=1.0 2024-09-18 07:51:29,958 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:51:56,659 INFO [train.py:1198] (1/2) Epoch 21, batch 3250, loss[loss=0.2392, simple_loss=0.2846, pruned_loss=0.07294, ctc_loss=0.1503, cr_loss=0.4501, over 34670.00 frames. ], tot_loss[loss=0.2266, simple_loss=0.276, pruned_loss=0.06654, ctc_loss=0.1375, cr_loss=0.4131, over 6770171.54 frames. ], batch size: 98, lr: 5.68e-03, grad_scale: 16.0 2024-09-18 07:52:13,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=377006.0, ans=0.125 2024-09-18 07:52:31,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.31 vs. limit=15.0 2024-09-18 07:53:05,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.37 vs. limit=15.0 2024-09-18 07:53:17,742 INFO [train.py:1198] (1/2) Epoch 21, batch 3300, loss[loss=0.2364, simple_loss=0.2868, pruned_loss=0.07013, ctc_loss=0.1422, cr_loss=0.4307, over 33114.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2745, pruned_loss=0.06596, ctc_loss=0.1365, cr_loss=0.4106, over 6768518.35 frames. ], batch size: 130, lr: 5.68e-03, grad_scale: 16.0 2024-09-18 07:53:18,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=377192.6666666667, ans=0.0 2024-09-18 07:53:19,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=377192.6666666667, ans=0.125 2024-09-18 07:53:46,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377239.3333333333, ans=0.1 2024-09-18 07:54:00,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=377286.0, ans=0.2 2024-09-18 07:54:12,569 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.086e+02 2.506e+02 2.855e+02 3.735e+02 5.514e+02, threshold=5.709e+02, percent-clipped=0.0 2024-09-18 07:54:16,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=377332.6666666667, ans=0.0 2024-09-18 07:54:38,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377426.0, ans=0.1 2024-09-18 07:54:39,728 INFO [train.py:1198] (1/2) Epoch 21, batch 3350, loss[loss=0.2386, simple_loss=0.2924, pruned_loss=0.06917, ctc_loss=0.1452, cr_loss=0.4324, over 33829.00 frames. ], tot_loss[loss=0.2261, simple_loss=0.2754, pruned_loss=0.06645, ctc_loss=0.1374, cr_loss=0.4126, over 6743875.85 frames. 
], batch size: 122, lr: 5.68e-03, grad_scale: 16.0 2024-09-18 07:55:05,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=377472.6666666667, ans=0.125 2024-09-18 07:55:13,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=377519.3333333333, ans=0.1 2024-09-18 07:55:17,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=377519.3333333333, ans=0.0 2024-09-18 07:55:24,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.78 vs. limit=15.0 2024-09-18 07:55:28,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=377566.0, ans=0.125 2024-09-18 07:55:57,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=377612.6666666667, ans=0.0 2024-09-18 07:56:01,864 INFO [train.py:1198] (1/2) Epoch 21, batch 3400, loss[loss=0.2009, simple_loss=0.2518, pruned_loss=0.05594, ctc_loss=0.118, cr_loss=0.3606, over 34158.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2753, pruned_loss=0.0666, ctc_loss=0.1375, cr_loss=0.4127, over 6733578.98 frames. ], batch size: 78, lr: 5.68e-03, grad_scale: 16.0 2024-09-18 07:56:07,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=377659.3333333333, ans=0.125 2024-09-18 07:56:10,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0 2024-09-18 07:56:37,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=377752.6666666667, ans=0.125 2024-09-18 07:56:47,344 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:56:54,913 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.128e+02 2.515e+02 2.934e+02 3.534e+02 5.144e+02, threshold=5.869e+02, percent-clipped=0.0 2024-09-18 07:57:17,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=377846.0, ans=0.1 2024-09-18 07:57:22,036 INFO [train.py:1198] (1/2) Epoch 21, batch 3450, loss[loss=0.2373, simple_loss=0.296, pruned_loss=0.06705, ctc_loss=0.1398, cr_loss=0.4115, over 33169.00 frames. ], tot_loss[loss=0.2263, simple_loss=0.2756, pruned_loss=0.06654, ctc_loss=0.1373, cr_loss=0.4121, over 6745451.02 frames. 
], batch size: 130, lr: 5.67e-03, grad_scale: 16.0 2024-09-18 07:57:27,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=377892.6666666667, ans=0.125 2024-09-18 07:57:46,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=377939.3333333333, ans=0.125 2024-09-18 07:58:18,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=378032.6666666667, ans=0.0 2024-09-18 07:58:23,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=378032.6666666667, ans=0.125 2024-09-18 07:58:43,917 INFO [train.py:1198] (1/2) Epoch 21, batch 3500, loss[loss=0.2034, simple_loss=0.2582, pruned_loss=0.05499, ctc_loss=0.1194, cr_loss=0.3715, over 34469.00 frames. ], tot_loss[loss=0.2259, simple_loss=0.2752, pruned_loss=0.06634, ctc_loss=0.1371, cr_loss=0.4118, over 6747758.04 frames. ], batch size: 85, lr: 5.67e-03, grad_scale: 16.0 2024-09-18 07:58:52,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=378126.0, ans=0.2 2024-09-18 07:59:18,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=378219.3333333333, ans=0.0 2024-09-18 07:59:32,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=378266.0, ans=0.0 2024-09-18 07:59:37,222 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.515e+02 2.832e+02 3.335e+02 6.185e+02, threshold=5.664e+02, percent-clipped=1.0 2024-09-18 07:59:45,685 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 07:59:57,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=378312.6666666667, ans=0.125 2024-09-18 08:00:03,988 INFO [train.py:1198] (1/2) Epoch 21, batch 3550, loss[loss=0.2337, simple_loss=0.2903, pruned_loss=0.06672, ctc_loss=0.1372, cr_loss=0.4052, over 34388.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2755, pruned_loss=0.06654, ctc_loss=0.1373, cr_loss=0.4123, over 6756955.11 frames. ], batch size: 103, lr: 5.67e-03, grad_scale: 16.0 2024-09-18 08:00:51,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378499.3333333333, ans=0.1 2024-09-18 08:01:00,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=378499.3333333333, ans=0.125 2024-09-18 08:01:25,559 INFO [train.py:1198] (1/2) Epoch 21, batch 3600, loss[loss=0.2103, simple_loss=0.2611, pruned_loss=0.05984, ctc_loss=0.1233, cr_loss=0.3793, over 34463.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2755, pruned_loss=0.06646, ctc_loss=0.1372, cr_loss=0.4121, over 6766693.89 frames. 
], batch size: 90, lr: 5.67e-03, grad_scale: 32.0 2024-09-18 08:01:37,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=378592.6666666667, ans=0.0 2024-09-18 08:01:43,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=378639.3333333333, ans=0.125 2024-09-18 08:02:01,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=378686.0, ans=0.025 2024-09-18 08:02:18,964 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.133e+02 2.594e+02 3.040e+02 3.852e+02 7.476e+02, threshold=6.080e+02, percent-clipped=3.0 2024-09-18 08:02:33,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=378779.3333333333, ans=0.125 2024-09-18 08:02:45,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2024-09-18 08:02:46,366 INFO [train.py:1198] (1/2) Epoch 21, batch 3650, loss[loss=0.2478, simple_loss=0.2951, pruned_loss=0.07569, ctc_loss=0.1534, cr_loss=0.4593, over 34444.00 frames. ], tot_loss[loss=0.2255, simple_loss=0.2749, pruned_loss=0.06614, ctc_loss=0.1366, cr_loss=0.4111, over 6769434.68 frames. ], batch size: 110, lr: 5.67e-03, grad_scale: 32.0 2024-09-18 08:02:57,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=378826.0, ans=0.015 2024-09-18 08:03:58,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.25 vs. limit=10.0 2024-09-18 08:04:02,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=379012.6666666667, ans=0.125 2024-09-18 08:04:05,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=379012.6666666667, ans=0.125 2024-09-18 08:04:08,050 INFO [train.py:1198] (1/2) Epoch 21, batch 3700, loss[loss=0.2416, simple_loss=0.2943, pruned_loss=0.07142, ctc_loss=0.1439, cr_loss=0.4321, over 34613.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.275, pruned_loss=0.06607, ctc_loss=0.1365, cr_loss=0.4104, over 6784180.28 frames. ], batch size: 102, lr: 5.67e-03, grad_scale: 16.0 2024-09-18 08:04:18,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=379059.3333333333, ans=0.0 2024-09-18 08:04:20,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.85 vs. 
limit=12.0 2024-09-18 08:04:21,515 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:04:48,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=379152.6666666667, ans=0.125 2024-09-18 08:04:52,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=379152.6666666667, ans=0.0 2024-09-18 08:05:03,161 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.370e+02 2.724e+02 3.430e+02 7.520e+02, threshold=5.449e+02, percent-clipped=4.0 2024-09-18 08:05:12,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.83 vs. limit=15.0 2024-09-18 08:05:26,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=379246.0, ans=0.125 2024-09-18 08:05:29,633 INFO [train.py:1198] (1/2) Epoch 21, batch 3750, loss[loss=0.2496, simple_loss=0.2978, pruned_loss=0.07599, ctc_loss=0.1545, cr_loss=0.4631, over 34380.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2782, pruned_loss=0.06734, ctc_loss=0.1389, cr_loss=0.4162, over 6785402.02 frames. ], batch size: 113, lr: 5.66e-03, grad_scale: 16.0 2024-09-18 08:05:31,585 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:05:48,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.87 vs. limit=15.0 2024-09-18 08:05:57,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=379339.3333333333, ans=0.125 2024-09-18 08:06:00,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=379386.0, ans=0.125 2024-09-18 08:06:11,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=379386.0, ans=0.0 2024-09-18 08:06:14,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=379386.0, ans=0.1 2024-09-18 08:06:15,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.44 vs. limit=15.0 2024-09-18 08:06:40,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=15.0 2024-09-18 08:06:42,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2024-09-18 08:06:50,946 INFO [train.py:1198] (1/2) Epoch 21, batch 3800, loss[loss=0.2554, simple_loss=0.2948, pruned_loss=0.08225, ctc_loss=0.1667, cr_loss=0.4518, over 29958.00 frames. ], tot_loss[loss=0.2319, simple_loss=0.2808, pruned_loss=0.06892, ctc_loss=0.1418, cr_loss=0.4213, over 6675729.90 frames. 
], batch size: 175, lr: 5.66e-03, grad_scale: 16.0 2024-09-18 08:07:06,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=379572.6666666667, ans=0.125 2024-09-18 08:07:18,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=379572.6666666667, ans=0.0 2024-09-18 08:07:21,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=379572.6666666667, ans=0.1 2024-09-18 08:07:36,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=379619.3333333333, ans=0.0 2024-09-18 08:07:41,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=379666.0, ans=0.125 2024-09-18 08:07:48,402 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.092e+02 2.398e+02 2.644e+02 3.105e+02 1.305e+03, threshold=5.288e+02, percent-clipped=1.0 2024-09-18 08:07:59,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.19 vs. limit=15.0 2024-09-18 08:08:15,377 INFO [train.py:1198] (1/2) Epoch 21, batch 3850, loss[loss=0.2703, simple_loss=0.3042, pruned_loss=0.09035, ctc_loss=0.1849, cr_loss=0.4686, over 23881.00 frames. ], tot_loss[loss=0.2363, simple_loss=0.2835, pruned_loss=0.07136, ctc_loss=0.1469, cr_loss=0.4255, over 6249074.31 frames. ], batch size: 244, lr: 5.66e-03, grad_scale: 16.0 2024-09-18 08:08:17,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=379759.3333333333, ans=0.015 2024-09-18 08:08:24,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=379759.3333333333, ans=0.1 2024-09-18 08:08:37,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=379806.0, ans=0.2 2024-09-18 08:09:53,792 INFO [train.py:1198] (1/2) Epoch 22, batch 0, loss[loss=0.2114, simple_loss=0.2617, pruned_loss=0.06014, ctc_loss=0.1265, cr_loss=0.3866, over 34483.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2617, pruned_loss=0.06014, ctc_loss=0.1265, cr_loss=0.3866, over 34483.00 frames. ], batch size: 85, lr: 5.53e-03, grad_scale: 32.0 2024-09-18 08:09:53,793 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 08:10:03,652 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.1377, 4.0207, 3.0717, 3.8488], device='cuda:1') 2024-09-18 08:10:10,622 INFO [train.py:1230] (1/2) Epoch 22, validation: loss=0.1482, simple_loss=0.2462, pruned_loss=0.02104, ctc_loss=0.04054, cr_loss=1.912e-14, over 944034.00 frames. 
2024-09-18 08:10:10,622 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 08:10:12,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=379885.3333333333, ans=0.025 2024-09-18 08:10:55,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=379978.6666666667, ans=0.025 2024-09-18 08:11:07,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=380025.3333333333, ans=0.07 2024-09-18 08:11:10,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=380025.3333333333, ans=0.125 2024-09-18 08:11:33,576 INFO [train.py:1198] (1/2) Epoch 22, batch 50, loss[loss=0.1919, simple_loss=0.2426, pruned_loss=0.05213, ctc_loss=0.1115, cr_loss=0.364, over 34503.00 frames. ], tot_loss[loss=0.2286, simple_loss=0.2774, pruned_loss=0.0676, ctc_loss=0.1396, cr_loss=0.4172, over 1481108.28 frames. ], batch size: 82, lr: 5.52e-03, grad_scale: 16.0 2024-09-18 08:11:46,891 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.123e+02 2.585e+02 2.858e+02 3.500e+02 7.817e+02, threshold=5.717e+02, percent-clipped=8.0 2024-09-18 08:12:10,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=380212.0, ans=0.04949747468305833 2024-09-18 08:12:22,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=380212.0, ans=0.125 2024-09-18 08:12:39,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=380258.6666666667, ans=0.125 2024-09-18 08:13:00,510 INFO [train.py:1198] (1/2) Epoch 22, batch 100, loss[loss=0.217, simple_loss=0.2663, pruned_loss=0.06278, ctc_loss=0.1302, cr_loss=0.4007, over 34597.00 frames. ], tot_loss[loss=0.2293, simple_loss=0.2786, pruned_loss=0.06768, ctc_loss=0.1397, cr_loss=0.4172, over 2628002.49 frames. ], batch size: 89, lr: 5.52e-03, grad_scale: 16.0 2024-09-18 08:13:30,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=380398.6666666667, ans=0.0 2024-09-18 08:13:58,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=380492.0, ans=0.125 2024-09-18 08:14:07,905 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-09-18 08:14:08,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=380538.6666666667, ans=0.0 2024-09-18 08:14:10,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380538.6666666667, ans=0.1 2024-09-18 08:14:21,063 INFO [train.py:1198] (1/2) Epoch 22, batch 150, loss[loss=0.1969, simple_loss=0.2446, pruned_loss=0.05546, ctc_loss=0.1182, cr_loss=0.3691, over 34515.00 frames. ], tot_loss[loss=0.2264, simple_loss=0.2761, pruned_loss=0.06639, ctc_loss=0.1373, cr_loss=0.4128, over 3556162.71 frames. 
], batch size: 82, lr: 5.52e-03, grad_scale: 16.0 2024-09-18 08:14:21,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=22.5 2024-09-18 08:14:24,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=380585.3333333333, ans=0.5 2024-09-18 08:14:34,247 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.479e+02 2.848e+02 3.414e+02 5.967e+02, threshold=5.696e+02, percent-clipped=1.0 2024-09-18 08:14:36,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=380632.0, ans=0.0 2024-09-18 08:15:13,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=380725.3333333333, ans=0.125 2024-09-18 08:15:21,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=380725.3333333333, ans=0.2 2024-09-18 08:15:37,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=380772.0, ans=0.125 2024-09-18 08:15:43,296 INFO [train.py:1198] (1/2) Epoch 22, batch 200, loss[loss=0.2493, simple_loss=0.2984, pruned_loss=0.0759, ctc_loss=0.1531, cr_loss=0.4465, over 32057.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2752, pruned_loss=0.06615, ctc_loss=0.1367, cr_loss=0.4108, over 4269511.82 frames. ], batch size: 145, lr: 5.52e-03, grad_scale: 16.0 2024-09-18 08:15:43,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=380818.6666666667, ans=0.125 2024-09-18 08:15:45,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=380818.6666666667, ans=0.025 2024-09-18 08:15:45,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=380818.6666666667, ans=0.125 2024-09-18 08:15:47,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=15.0 2024-09-18 08:16:09,467 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.78 vs. limit=22.5 2024-09-18 08:16:21,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.97 vs. limit=12.0 2024-09-18 08:16:22,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=380912.0, ans=0.0 2024-09-18 08:16:26,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.29 vs. limit=22.5 2024-09-18 08:17:10,203 INFO [train.py:1198] (1/2) Epoch 22, batch 250, loss[loss=0.241, simple_loss=0.2885, pruned_loss=0.0731, ctc_loss=0.1483, cr_loss=0.4423, over 34229.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2747, pruned_loss=0.06576, ctc_loss=0.136, cr_loss=0.4102, over 4833045.84 frames. 
], batch size: 117, lr: 5.52e-03, grad_scale: 16.0 2024-09-18 08:17:12,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0 2024-09-18 08:17:23,539 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.538e+02 3.008e+02 3.948e+02 6.628e+02, threshold=6.017e+02, percent-clipped=3.0 2024-09-18 08:17:28,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=381098.6666666667, ans=0.0 2024-09-18 08:17:28,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=381098.6666666667, ans=0.125 2024-09-18 08:17:44,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.78 vs. limit=10.0 2024-09-18 08:17:48,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=381145.3333333333, ans=0.0 2024-09-18 08:17:55,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=381145.3333333333, ans=0.0 2024-09-18 08:18:07,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=381192.0, ans=0.0 2024-09-18 08:18:08,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=381192.0, ans=0.125 2024-09-18 08:18:28,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=381238.6666666667, ans=0.2 2024-09-18 08:18:28,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.92 vs. limit=15.0 2024-09-18 08:18:32,705 INFO [train.py:1198] (1/2) Epoch 22, batch 300, loss[loss=0.2523, simple_loss=0.3022, pruned_loss=0.07619, ctc_loss=0.1566, cr_loss=0.4653, over 34329.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2742, pruned_loss=0.06543, ctc_loss=0.1353, cr_loss=0.4094, over 5261573.94 frames. ], batch size: 107, lr: 5.52e-03, grad_scale: 16.0 2024-09-18 08:18:33,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.90 vs. limit=15.0 2024-09-18 08:19:00,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=381332.0, ans=0.1 2024-09-18 08:19:10,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=381378.6666666667, ans=0.0 2024-09-18 08:19:34,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2024-09-18 08:19:49,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=381472.0, ans=0.1 2024-09-18 08:19:57,077 INFO [train.py:1198] (1/2) Epoch 22, batch 350, loss[loss=0.2013, simple_loss=0.252, pruned_loss=0.05636, ctc_loss=0.1184, cr_loss=0.3574, over 34267.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2746, pruned_loss=0.06546, ctc_loss=0.1354, cr_loss=0.4096, over 5596303.08 frames. 
], batch size: 83, lr: 5.51e-03, grad_scale: 16.0 2024-09-18 08:20:08,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=381518.6666666667, ans=0.125 2024-09-18 08:20:10,080 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.457e+02 2.805e+02 3.639e+02 8.505e+02, threshold=5.609e+02, percent-clipped=2.0 2024-09-18 08:20:13,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=381565.3333333333, ans=0.125 2024-09-18 08:20:36,917 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:20:42,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.32 vs. limit=15.0 2024-09-18 08:20:50,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=381658.6666666667, ans=10.0 2024-09-18 08:20:50,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=381658.6666666667, ans=0.2 2024-09-18 08:21:21,346 INFO [train.py:1198] (1/2) Epoch 22, batch 400, loss[loss=0.2338, simple_loss=0.2835, pruned_loss=0.06916, ctc_loss=0.1429, cr_loss=0.4307, over 34449.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2742, pruned_loss=0.06516, ctc_loss=0.1351, cr_loss=0.4095, over 5863783.00 frames. ], batch size: 95, lr: 5.51e-03, grad_scale: 32.0 2024-09-18 08:21:48,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=381798.6666666667, ans=0.0 2024-09-18 08:21:48,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=381798.6666666667, ans=0.1 2024-09-18 08:22:43,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2024-09-18 08:22:44,008 INFO [train.py:1198] (1/2) Epoch 22, batch 450, loss[loss=0.2323, simple_loss=0.2848, pruned_loss=0.06765, ctc_loss=0.1401, cr_loss=0.4119, over 34691.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2745, pruned_loss=0.0654, ctc_loss=0.1355, cr_loss=0.41, over 6054376.59 frames. ], batch size: 97, lr: 5.51e-03, grad_scale: 32.0 2024-09-18 08:22:54,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=381985.3333333333, ans=0.125 2024-09-18 08:22:55,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. limit=12.0 2024-09-18 08:22:57,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.066e+02 2.428e+02 2.857e+02 4.011e+02 8.591e+02, threshold=5.713e+02, percent-clipped=9.0 2024-09-18 08:23:18,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.89 vs. 
limit=15.0 2024-09-18 08:23:50,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=382172.0, ans=0.125 2024-09-18 08:24:10,641 INFO [train.py:1198] (1/2) Epoch 22, batch 500, loss[loss=0.2558, simple_loss=0.3042, pruned_loss=0.07878, ctc_loss=0.1574, cr_loss=0.4606, over 34421.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2739, pruned_loss=0.06525, ctc_loss=0.1352, cr_loss=0.4092, over 6221241.36 frames. ], batch size: 110, lr: 5.51e-03, grad_scale: 16.0 2024-09-18 08:24:30,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=382265.3333333333, ans=0.125 2024-09-18 08:25:30,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.22 vs. limit=22.5 2024-09-18 08:25:33,308 INFO [train.py:1198] (1/2) Epoch 22, batch 550, loss[loss=0.2317, simple_loss=0.2838, pruned_loss=0.06703, ctc_loss=0.1425, cr_loss=0.4256, over 33811.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2741, pruned_loss=0.06557, ctc_loss=0.1358, cr_loss=0.4099, over 6330079.73 frames. ], batch size: 122, lr: 5.51e-03, grad_scale: 16.0 2024-09-18 08:25:38,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=382452.0, ans=10.0 2024-09-18 08:25:48,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.960e+02 2.700e+02 3.215e+02 4.367e+02 8.268e+02, threshold=6.431e+02, percent-clipped=15.0 2024-09-18 08:26:28,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=382592.0, ans=0.2 2024-09-18 08:26:56,523 INFO [train.py:1198] (1/2) Epoch 22, batch 600, loss[loss=0.2393, simple_loss=0.2892, pruned_loss=0.07134, ctc_loss=0.1464, cr_loss=0.4333, over 34280.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2745, pruned_loss=0.06568, ctc_loss=0.1359, cr_loss=0.4108, over 6432434.57 frames. ], batch size: 117, lr: 5.51e-03, grad_scale: 16.0 2024-09-18 08:27:06,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=382685.3333333333, ans=0.125 2024-09-18 08:27:35,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=382778.6666666667, ans=0.0 2024-09-18 08:28:06,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=382872.0, ans=0.0 2024-09-18 08:28:09,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=382872.0, ans=0.125 2024-09-18 08:28:19,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=382872.0, ans=0.125 2024-09-18 08:28:22,558 INFO [train.py:1198] (1/2) Epoch 22, batch 650, loss[loss=0.2254, simple_loss=0.2762, pruned_loss=0.06539, ctc_loss=0.1364, cr_loss=0.416, over 34514.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2738, pruned_loss=0.06526, ctc_loss=0.1351, cr_loss=0.4094, over 6523491.82 frames. 
], batch size: 94, lr: 5.50e-03, grad_scale: 16.0 2024-09-18 08:28:29,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=382918.6666666667, ans=0.1 2024-09-18 08:28:37,336 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.502e+02 2.826e+02 3.921e+02 7.927e+02, threshold=5.653e+02, percent-clipped=4.0 2024-09-18 08:28:41,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=382965.3333333333, ans=0.0 2024-09-18 08:28:45,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=382965.3333333333, ans=0.1 2024-09-18 08:28:57,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=383012.0, ans=0.0 2024-09-18 08:29:39,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=15.0 2024-09-18 08:29:45,174 INFO [train.py:1198] (1/2) Epoch 22, batch 700, loss[loss=0.2215, simple_loss=0.2679, pruned_loss=0.06631, ctc_loss=0.1332, cr_loss=0.3948, over 34608.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2742, pruned_loss=0.06543, ctc_loss=0.1353, cr_loss=0.4097, over 6580261.90 frames. ], batch size: 89, lr: 5.50e-03, grad_scale: 8.0 2024-09-18 08:29:57,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=383152.0, ans=0.2 2024-09-18 08:30:00,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=383198.6666666667, ans=0.1 2024-09-18 08:30:23,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.95 vs. limit=15.0 2024-09-18 08:30:46,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=383292.0, ans=0.125 2024-09-18 08:30:59,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=383338.6666666667, ans=0.2 2024-09-18 08:31:07,510 INFO [train.py:1198] (1/2) Epoch 22, batch 750, loss[loss=0.2261, simple_loss=0.2792, pruned_loss=0.06459, ctc_loss=0.1341, cr_loss=0.4247, over 34433.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2736, pruned_loss=0.06517, ctc_loss=0.1349, cr_loss=0.409, over 6622643.32 frames. ], batch size: 95, lr: 5.50e-03, grad_scale: 8.0 2024-09-18 08:31:11,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=383385.3333333333, ans=0.125 2024-09-18 08:31:23,691 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.407e+02 2.760e+02 3.546e+02 5.684e+02, threshold=5.520e+02, percent-clipped=1.0 2024-09-18 08:31:37,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=383432.0, ans=0.125 2024-09-18 08:31:49,809 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. 
limit=6.0 2024-09-18 08:31:52,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=383478.6666666667, ans=0.0 2024-09-18 08:32:00,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=383525.3333333333, ans=0.0 2024-09-18 08:32:33,897 INFO [train.py:1198] (1/2) Epoch 22, batch 800, loss[loss=0.2005, simple_loss=0.2542, pruned_loss=0.05405, ctc_loss=0.1183, cr_loss=0.3758, over 34439.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2734, pruned_loss=0.065, ctc_loss=0.1347, cr_loss=0.4087, over 6658485.31 frames. ], batch size: 85, lr: 5.50e-03, grad_scale: 16.0 2024-09-18 08:33:03,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=383665.3333333333, ans=0.0 2024-09-18 08:33:32,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=383758.6666666667, ans=0.0 2024-09-18 08:33:35,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=383758.6666666667, ans=0.0 2024-09-18 08:33:40,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=15.0 2024-09-18 08:33:46,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=383805.3333333333, ans=0.125 2024-09-18 08:33:49,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=383805.3333333333, ans=22.5 2024-09-18 08:33:51,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=383805.3333333333, ans=0.125 2024-09-18 08:33:56,105 INFO [train.py:1198] (1/2) Epoch 22, batch 850, loss[loss=0.2394, simple_loss=0.2898, pruned_loss=0.07143, ctc_loss=0.1424, cr_loss=0.4395, over 34430.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2732, pruned_loss=0.0649, ctc_loss=0.1346, cr_loss=0.4084, over 6690874.05 frames. ], batch size: 103, lr: 5.50e-03, grad_scale: 16.0 2024-09-18 08:34:12,334 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.850e+02 2.373e+02 2.903e+02 3.648e+02 7.416e+02, threshold=5.806e+02, percent-clipped=2.0 2024-09-18 08:34:15,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=383898.6666666667, ans=0.125 2024-09-18 08:34:19,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=383898.6666666667, ans=0.0 2024-09-18 08:34:20,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=383898.6666666667, ans=0.1 2024-09-18 08:35:10,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=384038.6666666667, ans=0.125 2024-09-18 08:35:19,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=384085.3333333333, ans=0.125 2024-09-18 08:35:20,571 INFO [train.py:1198] (1/2) Epoch 22, batch 900, loss[loss=0.2014, simple_loss=0.2516, pruned_loss=0.05565, ctc_loss=0.122, cr_loss=0.3883, over 34447.00 frames. 
], tot_loss[loss=0.2235, simple_loss=0.2735, pruned_loss=0.06508, ctc_loss=0.1349, cr_loss=0.4091, over 6696759.02 frames. ], batch size: 85, lr: 5.50e-03, grad_scale: 16.0 2024-09-18 08:35:28,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.59 vs. limit=12.0 2024-09-18 08:35:57,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=384178.6666666667, ans=0.125 2024-09-18 08:36:44,309 INFO [train.py:1198] (1/2) Epoch 22, batch 950, loss[loss=0.203, simple_loss=0.2517, pruned_loss=0.0576, ctc_loss=0.1209, cr_loss=0.3738, over 34684.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2737, pruned_loss=0.06529, ctc_loss=0.1352, cr_loss=0.4098, over 6702227.10 frames. ], batch size: 87, lr: 5.49e-03, grad_scale: 16.0 2024-09-18 08:36:51,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=384318.6666666667, ans=0.125 2024-09-18 08:37:00,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.169e+02 2.600e+02 3.310e+02 4.426e+02 7.714e+02, threshold=6.620e+02, percent-clipped=5.0 2024-09-18 08:37:16,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=384412.0, ans=0.125 2024-09-18 08:37:34,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.93 vs. limit=22.5 2024-09-18 08:37:40,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=384458.6666666667, ans=0.125 2024-09-18 08:37:52,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=384505.3333333333, ans=0.2 2024-09-18 08:37:53,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=384505.3333333333, ans=0.0 2024-09-18 08:38:07,084 INFO [train.py:1198] (1/2) Epoch 22, batch 1000, loss[loss=0.2137, simple_loss=0.2626, pruned_loss=0.06184, ctc_loss=0.1265, cr_loss=0.3982, over 34506.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2746, pruned_loss=0.06564, ctc_loss=0.136, cr_loss=0.4111, over 6696474.34 frames. ], batch size: 90, lr: 5.49e-03, grad_scale: 16.0 2024-09-18 08:38:17,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=384552.0, ans=0.0 2024-09-18 08:38:32,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.91 vs. limit=15.0 2024-09-18 08:39:16,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.90 vs. 
limit=10.0 2024-09-18 08:39:18,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=384738.6666666667, ans=0.035 2024-09-18 08:39:19,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=384738.6666666667, ans=0.125 2024-09-18 08:39:20,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=384738.6666666667, ans=0.025 2024-09-18 08:39:25,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=384738.6666666667, ans=0.0 2024-09-18 08:39:31,908 INFO [train.py:1198] (1/2) Epoch 22, batch 1050, loss[loss=0.2209, simple_loss=0.2791, pruned_loss=0.06116, ctc_loss=0.1256, cr_loss=0.383, over 34549.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2738, pruned_loss=0.06531, ctc_loss=0.1354, cr_loss=0.41, over 6704334.23 frames. ], batch size: 99, lr: 5.49e-03, grad_scale: 16.0 2024-09-18 08:39:41,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=384785.3333333333, ans=0.0 2024-09-18 08:39:48,250 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.834e+02 2.398e+02 2.900e+02 3.592e+02 8.727e+02, threshold=5.800e+02, percent-clipped=2.0 2024-09-18 08:40:23,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=384925.3333333333, ans=0.035 2024-09-18 08:40:28,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=384925.3333333333, ans=15.0 2024-09-18 08:40:54,601 INFO [train.py:1198] (1/2) Epoch 22, batch 1100, loss[loss=0.2224, simple_loss=0.2713, pruned_loss=0.06513, ctc_loss=0.1328, cr_loss=0.4179, over 34392.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2735, pruned_loss=0.06504, ctc_loss=0.135, cr_loss=0.4093, over 6716595.77 frames. ], batch size: 91, lr: 5.49e-03, grad_scale: 16.0 2024-09-18 08:41:06,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=385018.6666666667, ans=0.125 2024-09-18 08:41:24,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=385065.3333333333, ans=0.0 2024-09-18 08:41:24,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385065.3333333333, ans=0.1 2024-09-18 08:42:17,478 INFO [train.py:1198] (1/2) Epoch 22, batch 1150, loss[loss=0.2234, simple_loss=0.2748, pruned_loss=0.06398, ctc_loss=0.1358, cr_loss=0.4227, over 34367.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.2736, pruned_loss=0.06518, ctc_loss=0.1352, cr_loss=0.4095, over 6715167.91 frames. ], batch size: 91, lr: 5.49e-03, grad_scale: 16.0 2024-09-18 08:42:26,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=385252.0, ans=0.025 2024-09-18 08:42:34,042 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.424e+02 2.996e+02 3.761e+02 7.348e+02, threshold=5.992e+02, percent-clipped=4.0 2024-09-18 08:43:00,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.37 vs. 
limit=15.0 2024-09-18 08:43:17,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=385392.0, ans=0.0 2024-09-18 08:43:44,473 INFO [train.py:1198] (1/2) Epoch 22, batch 1200, loss[loss=0.2307, simple_loss=0.2862, pruned_loss=0.06503, ctc_loss=0.1368, cr_loss=0.4447, over 34573.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2747, pruned_loss=0.06553, ctc_loss=0.1358, cr_loss=0.4109, over 6709169.14 frames. ], batch size: 99, lr: 5.49e-03, grad_scale: 32.0 2024-09-18 08:43:44,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=385485.3333333333, ans=0.04949747468305833 2024-09-18 08:43:53,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=385485.3333333333, ans=0.125 2024-09-18 08:44:11,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=385532.0, ans=0.0 2024-09-18 08:44:21,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=385578.6666666667, ans=0.125 2024-09-18 08:44:31,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=385578.6666666667, ans=0.0 2024-09-18 08:44:37,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=385625.3333333333, ans=0.125 2024-09-18 08:45:02,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385672.0, ans=0.1 2024-09-18 08:45:07,444 INFO [train.py:1198] (1/2) Epoch 22, batch 1250, loss[loss=0.2407, simple_loss=0.287, pruned_loss=0.07357, ctc_loss=0.15, cr_loss=0.4315, over 34338.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2752, pruned_loss=0.06579, ctc_loss=0.1362, cr_loss=0.4116, over 6742501.15 frames. ], batch size: 107, lr: 5.48e-03, grad_scale: 32.0 2024-09-18 08:45:23,930 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.570e+02 2.998e+02 3.830e+02 5.500e+02, threshold=5.997e+02, percent-clipped=0.0 2024-09-18 08:45:29,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=385765.3333333333, ans=0.0 2024-09-18 08:45:31,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=385765.3333333333, ans=0.125 2024-09-18 08:45:32,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=385765.3333333333, ans=0.125 2024-09-18 08:45:34,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=385765.3333333333, ans=0.0 2024-09-18 08:45:55,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=385858.6666666667, ans=0.125 2024-09-18 08:46:31,761 INFO [train.py:1198] (1/2) Epoch 22, batch 1300, loss[loss=0.2345, simple_loss=0.287, pruned_loss=0.06811, ctc_loss=0.1439, cr_loss=0.4224, over 33080.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2747, pruned_loss=0.0658, ctc_loss=0.1361, cr_loss=0.4116, over 6746293.40 frames. 
], batch size: 130, lr: 5.48e-03, grad_scale: 32.0 2024-09-18 08:46:32,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=385952.0, ans=0.0 2024-09-18 08:46:50,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=385998.6666666667, ans=0.125 2024-09-18 08:47:05,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=386045.3333333333, ans=0.2 2024-09-18 08:47:17,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=386045.3333333333, ans=0.125 2024-09-18 08:47:35,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=386092.0, ans=0.125 2024-09-18 08:47:48,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=386138.6666666667, ans=0.125 2024-09-18 08:47:56,512 INFO [train.py:1198] (1/2) Epoch 22, batch 1350, loss[loss=0.2156, simple_loss=0.2662, pruned_loss=0.06173, ctc_loss=0.1266, cr_loss=0.4083, over 34524.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2741, pruned_loss=0.06549, ctc_loss=0.1357, cr_loss=0.4108, over 6765699.92 frames. ], batch size: 94, lr: 5.48e-03, grad_scale: 32.0 2024-09-18 08:48:03,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=386185.3333333333, ans=0.125 2024-09-18 08:48:04,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0 2024-09-18 08:48:12,684 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.511e+02 3.076e+02 3.886e+02 7.594e+02, threshold=6.152e+02, percent-clipped=2.0 2024-09-18 08:48:14,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=386232.0, ans=0.125 2024-09-18 08:48:49,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.88 vs. limit=10.0 2024-09-18 08:48:50,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=386325.3333333333, ans=0.125 2024-09-18 08:49:16,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=386418.6666666667, ans=0.1 2024-09-18 08:49:18,355 INFO [train.py:1198] (1/2) Epoch 22, batch 1400, loss[loss=0.186, simple_loss=0.2367, pruned_loss=0.04982, ctc_loss=0.1071, cr_loss=0.3528, over 34293.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.274, pruned_loss=0.06547, ctc_loss=0.1356, cr_loss=0.4109, over 6777362.89 frames. 
], batch size: 80, lr: 5.48e-03, grad_scale: 32.0 2024-09-18 08:49:25,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=386418.6666666667, ans=0.5 2024-09-18 08:49:25,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=386418.6666666667, ans=0.0 2024-09-18 08:49:26,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=386418.6666666667, ans=0.125 2024-09-18 08:49:38,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=386465.3333333333, ans=0.125 2024-09-18 08:49:51,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=386512.0, ans=0.0 2024-09-18 08:49:59,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=386512.0, ans=0.2 2024-09-18 08:50:01,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=386512.0, ans=0.1 2024-09-18 08:50:33,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=386605.3333333333, ans=0.2 2024-09-18 08:50:44,808 INFO [train.py:1198] (1/2) Epoch 22, batch 1450, loss[loss=0.2318, simple_loss=0.2841, pruned_loss=0.0675, ctc_loss=0.1383, cr_loss=0.4182, over 34466.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2745, pruned_loss=0.06556, ctc_loss=0.1357, cr_loss=0.4111, over 6774251.29 frames. ], batch size: 110, lr: 5.48e-03, grad_scale: 32.0 2024-09-18 08:50:50,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=386652.0, ans=0.07 2024-09-18 08:50:53,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=386652.0, ans=0.125 2024-09-18 08:50:53,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=386652.0, ans=0.125 2024-09-18 08:51:01,288 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.535e+02 2.892e+02 3.219e+02 5.417e+02, threshold=5.784e+02, percent-clipped=0.0 2024-09-18 08:51:02,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.16 vs. 
limit=15.0 2024-09-18 08:51:05,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=386698.6666666667, ans=0.125 2024-09-18 08:51:14,757 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:51:29,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=386745.3333333333, ans=0.0 2024-09-18 08:51:47,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=386792.0, ans=0.125 2024-09-18 08:51:50,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=386838.6666666667, ans=0.035 2024-09-18 08:51:58,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=386838.6666666667, ans=0.125 2024-09-18 08:51:58,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=386838.6666666667, ans=0.0 2024-09-18 08:51:59,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0 2024-09-18 08:52:06,689 INFO [train.py:1198] (1/2) Epoch 22, batch 1500, loss[loss=0.2336, simple_loss=0.2858, pruned_loss=0.0681, ctc_loss=0.141, cr_loss=0.4233, over 34455.00 frames. ], tot_loss[loss=0.225, simple_loss=0.275, pruned_loss=0.06569, ctc_loss=0.1361, cr_loss=0.4119, over 6774759.51 frames. ], batch size: 100, lr: 5.48e-03, grad_scale: 32.0 2024-09-18 08:52:07,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=15.0 2024-09-18 08:53:13,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=387072.0, ans=0.07 2024-09-18 08:53:15,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=387072.0, ans=0.125 2024-09-18 08:53:28,072 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:53:29,323 INFO [train.py:1198] (1/2) Epoch 22, batch 1550, loss[loss=0.2336, simple_loss=0.2853, pruned_loss=0.0687, ctc_loss=0.1388, cr_loss=0.4195, over 34398.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2752, pruned_loss=0.06603, ctc_loss=0.1366, cr_loss=0.4122, over 6746780.61 frames. ], batch size: 105, lr: 5.48e-03, grad_scale: 32.0 2024-09-18 08:53:45,946 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.602e+02 3.166e+02 4.035e+02 7.619e+02, threshold=6.332e+02, percent-clipped=4.0 2024-09-18 08:53:55,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=15.0 2024-09-18 08:54:02,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=387212.0, ans=0.0 2024-09-18 08:54:14,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=387212.0, ans=0.0 2024-09-18 08:54:41,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387305.3333333333, ans=0.1 2024-09-18 08:54:44,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=387305.3333333333, ans=0.125 2024-09-18 08:54:50,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=12.0 2024-09-18 08:54:55,759 INFO [train.py:1198] (1/2) Epoch 22, batch 1600, loss[loss=0.2477, simple_loss=0.2962, pruned_loss=0.07481, ctc_loss=0.1524, cr_loss=0.4747, over 34564.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2749, pruned_loss=0.06604, ctc_loss=0.1366, cr_loss=0.4116, over 6725356.78 frames. ], batch size: 99, lr: 5.47e-03, grad_scale: 32.0 2024-09-18 08:55:19,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=387398.6666666667, ans=22.5 2024-09-18 08:55:21,033 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:56:18,531 INFO [train.py:1198] (1/2) Epoch 22, batch 1650, loss[loss=0.2328, simple_loss=0.2867, pruned_loss=0.06682, ctc_loss=0.1409, cr_loss=0.43, over 34400.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2746, pruned_loss=0.06598, ctc_loss=0.1365, cr_loss=0.4119, over 6718253.85 frames. ], batch size: 103, lr: 5.47e-03, grad_scale: 16.0 2024-09-18 08:56:27,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=387585.3333333333, ans=0.0 2024-09-18 08:56:32,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=387585.3333333333, ans=0.125 2024-09-18 08:56:36,633 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.491e+02 3.015e+02 3.770e+02 7.441e+02, threshold=6.030e+02, percent-clipped=2.0 2024-09-18 08:57:07,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.82 vs. limit=15.0 2024-09-18 08:57:16,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387725.3333333333, ans=0.1 2024-09-18 08:57:23,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=387772.0, ans=0.1 2024-09-18 08:57:37,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=387772.0, ans=0.125 2024-09-18 08:57:42,477 INFO [train.py:1198] (1/2) Epoch 22, batch 1700, loss[loss=0.1948, simple_loss=0.2434, pruned_loss=0.05444, ctc_loss=0.1126, cr_loss=0.3692, over 34316.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.274, pruned_loss=0.06558, ctc_loss=0.1358, cr_loss=0.4105, over 6744389.56 frames. 
], batch size: 80, lr: 5.47e-03, grad_scale: 16.0 2024-09-18 08:57:54,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=387818.6666666667, ans=0.125 2024-09-18 08:57:59,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387865.3333333333, ans=0.1 2024-09-18 08:58:23,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=387912.0, ans=0.125 2024-09-18 08:58:28,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=387912.0, ans=0.125 2024-09-18 08:58:29,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=387912.0, ans=0.05 2024-09-18 08:58:31,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=387912.0, ans=0.125 2024-09-18 08:58:31,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=387912.0, ans=0.0 2024-09-18 08:58:40,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=15.0 2024-09-18 08:58:42,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=387958.6666666667, ans=0.2 2024-09-18 08:59:00,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=388005.3333333333, ans=0.05 2024-09-18 08:59:07,215 INFO [train.py:1198] (1/2) Epoch 22, batch 1750, loss[loss=0.1922, simple_loss=0.2424, pruned_loss=0.05283, ctc_loss=0.1126, cr_loss=0.3476, over 34178.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2735, pruned_loss=0.06539, ctc_loss=0.1354, cr_loss=0.4102, over 6753422.68 frames. 
], batch size: 78, lr: 5.47e-03, grad_scale: 16.0 2024-09-18 08:59:09,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=388052.0, ans=0.0 2024-09-18 08:59:24,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=388098.6666666667, ans=0.2 2024-09-18 08:59:25,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.407e+02 2.711e+02 3.479e+02 5.711e+02, threshold=5.423e+02, percent-clipped=0.0 2024-09-18 08:59:25,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=388098.6666666667, ans=0.125 2024-09-18 08:59:32,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388098.6666666667, ans=0.1 2024-09-18 08:59:33,982 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 08:59:37,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388098.6666666667, ans=0.1 2024-09-18 08:59:45,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=388145.3333333333, ans=0.125 2024-09-18 08:59:55,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=388192.0, ans=0.125 2024-09-18 09:00:07,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.33 vs. limit=22.5 2024-09-18 09:00:30,000 INFO [train.py:1198] (1/2) Epoch 22, batch 1800, loss[loss=0.2309, simple_loss=0.2811, pruned_loss=0.0682, ctc_loss=0.1384, cr_loss=0.4167, over 34687.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2738, pruned_loss=0.06541, ctc_loss=0.1354, cr_loss=0.4099, over 6756532.74 frames. ], batch size: 97, lr: 5.47e-03, grad_scale: 16.0 2024-09-18 09:00:30,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=388285.3333333333, ans=0.5 2024-09-18 09:00:32,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=388285.3333333333, ans=0.125 2024-09-18 09:00:53,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=388332.0, ans=0.0 2024-09-18 09:01:03,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=388378.6666666667, ans=0.125 2024-09-18 09:01:08,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388378.6666666667, ans=0.1 2024-09-18 09:01:32,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.65 vs. 
limit=15.0 2024-09-18 09:01:35,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=388425.3333333333, ans=0.125 2024-09-18 09:01:43,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=388472.0, ans=0.125 2024-09-18 09:01:46,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=388472.0, ans=0.025 2024-09-18 09:01:53,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=388518.6666666667, ans=0.1 2024-09-18 09:01:54,778 INFO [train.py:1198] (1/2) Epoch 22, batch 1850, loss[loss=0.2455, simple_loss=0.2971, pruned_loss=0.07384, ctc_loss=0.1484, cr_loss=0.4101, over 34491.00 frames. ], tot_loss[loss=0.2242, simple_loss=0.2738, pruned_loss=0.06554, ctc_loss=0.1356, cr_loss=0.4105, over 6765069.06 frames. ], batch size: 100, lr: 5.47e-03, grad_scale: 16.0 2024-09-18 09:02:00,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=388518.6666666667, ans=0.125 2024-09-18 09:02:00,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=388518.6666666667, ans=0.125 2024-09-18 09:02:14,647 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.116e+02 2.681e+02 3.497e+02 4.617e+02 8.050e+02, threshold=6.994e+02, percent-clipped=12.0 2024-09-18 09:02:20,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=388565.3333333333, ans=0.125 2024-09-18 09:02:31,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=388612.0, ans=0.125 2024-09-18 09:02:39,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=388612.0, ans=0.1 2024-09-18 09:03:06,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2024-09-18 09:03:13,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=388705.3333333333, ans=0.125 2024-09-18 09:03:13,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=388705.3333333333, ans=0.125 2024-09-18 09:03:15,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=388705.3333333333, ans=0.0 2024-09-18 09:03:18,432 INFO [train.py:1198] (1/2) Epoch 22, batch 1900, loss[loss=0.2366, simple_loss=0.2895, pruned_loss=0.06875, ctc_loss=0.144, cr_loss=0.4378, over 34369.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2747, pruned_loss=0.06577, ctc_loss=0.136, cr_loss=0.4116, over 6773938.54 frames. ], batch size: 103, lr: 5.46e-03, grad_scale: 16.0 2024-09-18 09:03:40,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. 
limit=6.0 2024-09-18 09:04:41,363 INFO [train.py:1198] (1/2) Epoch 22, batch 1950, loss[loss=0.2101, simple_loss=0.2588, pruned_loss=0.05986, ctc_loss=0.1262, cr_loss=0.4117, over 34349.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2762, pruned_loss=0.06617, ctc_loss=0.1368, cr_loss=0.4143, over 6790847.50 frames. ], batch size: 91, lr: 5.46e-03, grad_scale: 16.0 2024-09-18 09:04:45,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=388985.3333333333, ans=0.025 2024-09-18 09:04:50,175 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:05:01,512 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 2.452e+02 2.750e+02 3.573e+02 5.455e+02, threshold=5.499e+02, percent-clipped=0.0 2024-09-18 09:05:26,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.33 vs. limit=22.5 2024-09-18 09:05:44,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=389125.3333333333, ans=0.125 2024-09-18 09:05:49,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=389172.0, ans=0.5 2024-09-18 09:05:51,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=389172.0, ans=0.125 2024-09-18 09:05:56,262 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:06:07,879 INFO [train.py:1198] (1/2) Epoch 22, batch 2000, loss[loss=0.1999, simple_loss=0.2465, pruned_loss=0.05701, ctc_loss=0.1183, cr_loss=0.3894, over 34143.00 frames. ], tot_loss[loss=0.2267, simple_loss=0.2767, pruned_loss=0.06637, ctc_loss=0.1373, cr_loss=0.4152, over 6766501.27 frames. ], batch size: 78, lr: 5.46e-03, grad_scale: 32.0 2024-09-18 09:06:29,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=389265.3333333333, ans=0.0 2024-09-18 09:06:34,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=389265.3333333333, ans=0.1 2024-09-18 09:06:38,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2024-09-18 09:06:51,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=389312.0, ans=0.125 2024-09-18 09:06:56,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.09 vs. limit=10.0 2024-09-18 09:07:04,400 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:07:07,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=389358.6666666667, ans=0.125 2024-09-18 09:07:16,788 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.37 vs. 
limit=15.0 2024-09-18 09:07:30,341 INFO [train.py:1198] (1/2) Epoch 22, batch 2050, loss[loss=0.2034, simple_loss=0.2518, pruned_loss=0.05767, ctc_loss=0.1232, cr_loss=0.3756, over 34507.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2757, pruned_loss=0.06616, ctc_loss=0.1369, cr_loss=0.4138, over 6757769.91 frames. ], batch size: 82, lr: 5.46e-03, grad_scale: 32.0 2024-09-18 09:07:48,332 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.512e+02 3.054e+02 3.640e+02 7.597e+02, threshold=6.108e+02, percent-clipped=2.0 2024-09-18 09:08:06,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=389545.3333333333, ans=0.2 2024-09-18 09:08:26,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=389592.0, ans=0.125 2024-09-18 09:08:33,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=389592.0, ans=0.125 2024-09-18 09:08:43,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=389638.6666666667, ans=0.1 2024-09-18 09:08:54,738 INFO [train.py:1198] (1/2) Epoch 22, batch 2100, loss[loss=0.2246, simple_loss=0.2791, pruned_loss=0.06407, ctc_loss=0.1294, cr_loss=0.4039, over 34537.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2749, pruned_loss=0.06569, ctc_loss=0.136, cr_loss=0.4118, over 6771266.33 frames. ], batch size: 94, lr: 5.46e-03, grad_scale: 32.0 2024-09-18 09:09:01,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=389685.3333333333, ans=0.125 2024-09-18 09:09:03,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=389685.3333333333, ans=0.5 2024-09-18 09:09:21,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.87 vs. limit=15.0 2024-09-18 09:09:24,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=389732.0, ans=0.0 2024-09-18 09:09:37,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=389778.6666666667, ans=0.0 2024-09-18 09:09:38,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=389778.6666666667, ans=0.0 2024-09-18 09:10:09,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=389872.0, ans=10.0 2024-09-18 09:10:10,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=389872.0, ans=0.0 2024-09-18 09:10:15,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=389872.0, ans=0.125 2024-09-18 09:10:18,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.53 vs. limit=12.0 2024-09-18 09:10:18,756 INFO [train.py:1198] (1/2) Epoch 22, batch 2150, loss[loss=0.2376, simple_loss=0.2809, pruned_loss=0.07341, ctc_loss=0.1478, cr_loss=0.4505, over 34714.00 frames. 
], tot_loss[loss=0.2238, simple_loss=0.2739, pruned_loss=0.06511, ctc_loss=0.1349, cr_loss=0.4102, over 6790163.41 frames. ], batch size: 92, lr: 5.46e-03, grad_scale: 32.0 2024-09-18 09:10:27,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=389918.6666666667, ans=0.125 2024-09-18 09:10:32,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=389918.6666666667, ans=0.2 2024-09-18 09:10:37,019 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.441e+02 2.809e+02 3.463e+02 5.868e+02, threshold=5.618e+02, percent-clipped=0.0 2024-09-18 09:10:52,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.02 vs. limit=15.0 2024-09-18 09:10:54,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=18.88 vs. limit=15.0 2024-09-18 09:11:41,522 INFO [train.py:1198] (1/2) Epoch 22, batch 2200, loss[loss=0.2231, simple_loss=0.2805, pruned_loss=0.06212, ctc_loss=0.1297, cr_loss=0.3887, over 34459.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.274, pruned_loss=0.06522, ctc_loss=0.135, cr_loss=0.4103, over 6783952.74 frames. ], batch size: 100, lr: 5.45e-03, grad_scale: 32.0 2024-09-18 09:11:46,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=390152.0, ans=0.0 2024-09-18 09:12:09,937 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:12:34,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=390292.0, ans=0.0 2024-09-18 09:12:46,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=390292.0, ans=0.125 2024-09-18 09:13:05,976 INFO [train.py:1198] (1/2) Epoch 22, batch 2250, loss[loss=0.2292, simple_loss=0.2774, pruned_loss=0.06848, ctc_loss=0.1378, cr_loss=0.4113, over 34397.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.274, pruned_loss=0.06522, ctc_loss=0.1351, cr_loss=0.41, over 6780998.93 frames. ], batch size: 95, lr: 5.45e-03, grad_scale: 32.0 2024-09-18 09:13:16,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=390385.3333333333, ans=0.0 2024-09-18 09:13:18,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.29 vs. 
limit=15.0 2024-09-18 09:13:24,332 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.730e+02 3.521e+02 4.197e+02 7.476e+02, threshold=7.042e+02, percent-clipped=10.0 2024-09-18 09:13:41,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=390478.6666666667, ans=0.0 2024-09-18 09:14:02,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=390525.3333333333, ans=0.125 2024-09-18 09:14:09,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=390525.3333333333, ans=0.125 2024-09-18 09:14:30,595 INFO [train.py:1198] (1/2) Epoch 22, batch 2300, loss[loss=0.1971, simple_loss=0.2499, pruned_loss=0.05368, ctc_loss=0.1142, cr_loss=0.3528, over 34260.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2732, pruned_loss=0.06508, ctc_loss=0.1346, cr_loss=0.4087, over 6766345.71 frames. ], batch size: 83, lr: 5.45e-03, grad_scale: 32.0 2024-09-18 09:14:37,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=390618.6666666667, ans=0.2 2024-09-18 09:14:37,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=390618.6666666667, ans=0.125 2024-09-18 09:15:05,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=390712.0, ans=0.09899494936611666 2024-09-18 09:15:24,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=390758.6666666667, ans=0.2 2024-09-18 09:15:36,077 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.80 vs. limit=15.0 2024-09-18 09:15:52,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2024-09-18 09:15:53,280 INFO [train.py:1198] (1/2) Epoch 22, batch 2350, loss[loss=0.2262, simple_loss=0.2786, pruned_loss=0.06502, ctc_loss=0.1355, cr_loss=0.4156, over 34689.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2737, pruned_loss=0.06531, ctc_loss=0.1351, cr_loss=0.41, over 6772234.84 frames. ], batch size: 97, lr: 5.45e-03, grad_scale: 32.0 2024-09-18 09:16:00,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.96 vs. 
limit=15.0 2024-09-18 09:16:06,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=390852.0, ans=0.125 2024-09-18 09:16:11,502 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.028e+02 2.503e+02 3.048e+02 3.896e+02 6.411e+02, threshold=6.095e+02, percent-clipped=0.0 2024-09-18 09:16:13,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=390898.6666666667, ans=0.125 2024-09-18 09:16:13,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=390898.6666666667, ans=0.0 2024-09-18 09:16:54,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=390992.0, ans=0.0 2024-09-18 09:16:58,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=390992.0, ans=0.07 2024-09-18 09:17:01,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.41 vs. limit=6.0 2024-09-18 09:17:19,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=391085.3333333333, ans=0.0 2024-09-18 09:17:20,489 INFO [train.py:1198] (1/2) Epoch 22, batch 2400, loss[loss=0.2175, simple_loss=0.2678, pruned_loss=0.06305, ctc_loss=0.1282, cr_loss=0.3865, over 34590.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2745, pruned_loss=0.06573, ctc_loss=0.1358, cr_loss=0.4119, over 6776398.26 frames. ], batch size: 89, lr: 5.45e-03, grad_scale: 32.0 2024-09-18 09:17:20,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=391085.3333333333, ans=0.0 2024-09-18 09:17:37,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391132.0, ans=0.1 2024-09-18 09:17:38,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=391132.0, ans=0.125 2024-09-18 09:17:42,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.32 vs. limit=15.0 2024-09-18 09:17:44,092 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:17:49,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=391132.0, ans=0.0 2024-09-18 09:17:57,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391178.6666666667, ans=0.1 2024-09-18 09:18:02,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=391178.6666666667, ans=0.0 2024-09-18 09:18:15,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=391225.3333333333, ans=0.125 2024-09-18 09:18:37,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.01 vs. 
limit=15.0 2024-09-18 09:18:43,396 INFO [train.py:1198] (1/2) Epoch 22, batch 2450, loss[loss=0.2326, simple_loss=0.2861, pruned_loss=0.06768, ctc_loss=0.1376, cr_loss=0.4054, over 34432.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2755, pruned_loss=0.06616, ctc_loss=0.1366, cr_loss=0.4128, over 6750776.09 frames. ], batch size: 95, lr: 5.45e-03, grad_scale: 32.0 2024-09-18 09:18:43,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=391318.6666666667, ans=0.125 2024-09-18 09:18:45,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=391318.6666666667, ans=0.0 2024-09-18 09:18:58,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=391365.3333333333, ans=0.125 2024-09-18 09:19:01,393 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.649e+02 3.082e+02 3.725e+02 6.977e+02, threshold=6.165e+02, percent-clipped=1.0 2024-09-18 09:19:01,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=391365.3333333333, ans=0.125 2024-09-18 09:19:02,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.06 vs. limit=10.0 2024-09-18 09:19:33,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=391458.6666666667, ans=0.1 2024-09-18 09:19:50,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.28 vs. limit=15.0 2024-09-18 09:20:07,737 INFO [train.py:1198] (1/2) Epoch 22, batch 2500, loss[loss=0.2307, simple_loss=0.2842, pruned_loss=0.06632, ctc_loss=0.1365, cr_loss=0.4294, over 34452.00 frames. ], tot_loss[loss=0.2256, simple_loss=0.2752, pruned_loss=0.0661, ctc_loss=0.1366, cr_loss=0.4131, over 6762548.82 frames. ], batch size: 100, lr: 5.44e-03, grad_scale: 32.0 2024-09-18 09:20:33,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.94 vs. limit=15.0 2024-09-18 09:21:03,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=391692.0, ans=0.125 2024-09-18 09:21:08,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=391692.0, ans=0.125 2024-09-18 09:21:08,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=391692.0, ans=0.04949747468305833 2024-09-18 09:21:32,850 INFO [train.py:1198] (1/2) Epoch 22, batch 2550, loss[loss=0.1954, simple_loss=0.2452, pruned_loss=0.05384, ctc_loss=0.1163, cr_loss=0.3657, over 34156.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2749, pruned_loss=0.06574, ctc_loss=0.136, cr_loss=0.4121, over 6764452.45 frames. 
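The optim.py warnings give five order statistics of recent gradient norms (they read as min / 25% / median / 75% / max), and in every entry the reported threshold is exactly Clipping_scale times the median, e.g. 2.0 * 3.054e+02 = 6.108e+02 in the first warning of this stretch; percent-clipped is then the share of recent batches whose norm exceeded the threshold. A sketch of that rule (an assumed reconstruction, not the verbatim optimizer code):

    import torch

    def clipping_threshold(recent_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
        # The five "grad-norm quartiles" read as min/25%/median/75%/max of
        # recently seen gradient norms; the logged threshold equals
        # clipping_scale * median in every warning above.
        median = torch.quantile(recent_norms, 0.5)
        return float(clipping_scale * median)

    def clip_and_count(parameters, threshold: float) -> bool:
        # Rescale gradients whose total norm exceeds the threshold; the
        # fraction of batches for which this returns True is what gets
        # reported as "percent-clipped".
        total_norm = torch.nn.utils.clip_grad_norm_(parameters, max_norm=threshold)
        return float(total_norm) > threshold
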
], batch size: 78, lr: 5.44e-03, grad_scale: 32.0 2024-09-18 09:21:50,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.021e+02 2.386e+02 2.969e+02 3.390e+02 1.135e+03, threshold=5.939e+02, percent-clipped=6.0 2024-09-18 09:21:52,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=391832.0, ans=0.025 2024-09-18 09:22:04,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=391878.6666666667, ans=0.0 2024-09-18 09:22:09,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=391878.6666666667, ans=0.125 2024-09-18 09:22:19,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=391878.6666666667, ans=0.1 2024-09-18 09:22:39,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=391972.0, ans=0.025 2024-09-18 09:23:01,717 INFO [train.py:1198] (1/2) Epoch 22, batch 2600, loss[loss=0.2053, simple_loss=0.2566, pruned_loss=0.05722, ctc_loss=0.1229, cr_loss=0.3742, over 34331.00 frames. ], tot_loss[loss=0.2254, simple_loss=0.2754, pruned_loss=0.06588, ctc_loss=0.1363, cr_loss=0.4127, over 6760266.18 frames. ], batch size: 91, lr: 5.44e-03, grad_scale: 32.0 2024-09-18 09:23:21,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2024-09-18 09:23:59,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=392158.6666666667, ans=0.125 2024-09-18 09:24:09,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=392205.3333333333, ans=0.1 2024-09-18 09:24:17,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=392205.3333333333, ans=0.125 2024-09-18 09:24:20,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=392205.3333333333, ans=0.125 2024-09-18 09:24:25,201 INFO [train.py:1198] (1/2) Epoch 22, batch 2650, loss[loss=0.2396, simple_loss=0.2948, pruned_loss=0.06908, ctc_loss=0.1434, cr_loss=0.4417, over 34291.00 frames. ], tot_loss[loss=0.2252, simple_loss=0.2754, pruned_loss=0.06564, ctc_loss=0.1359, cr_loss=0.4125, over 6768434.66 frames. ], batch size: 117, lr: 5.44e-03, grad_scale: 32.0 2024-09-18 09:24:43,161 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.400e+02 2.781e+02 3.655e+02 6.325e+02, threshold=5.563e+02, percent-clipped=1.0 2024-09-18 09:24:50,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=392298.6666666667, ans=0.0 2024-09-18 09:25:49,680 INFO [train.py:1198] (1/2) Epoch 22, batch 2700, loss[loss=0.2318, simple_loss=0.2845, pruned_loss=0.06739, ctc_loss=0.1398, cr_loss=0.4092, over 34600.00 frames. ], tot_loss[loss=0.2253, simple_loss=0.2754, pruned_loss=0.06572, ctc_loss=0.1361, cr_loss=0.4127, over 6763622.26 frames. 
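The ScheduledFloat entries dump regularization constants (skip rates, balancer probabilities, dropout) whose current value "ans" is a function of the global batch_count. A plausible minimal version of such a schedule is piecewise-linear interpolation over (batch_count, value) breakpoints, sketched below; the breakpoints in the example are made up, and the real class in scaling.py carries more machinery:

    class PiecewiseLinearFloat:
        """Value that anneals with the global batch count.

        Defined by (batch_count, value) breakpoints; between breakpoints
        the value is linearly interpolated, outside them it is clamped.
        """

        def __init__(self, *points: tuple[float, float]):
            self.points = sorted(points)  # [(batch_count, value), ...]

        def __call__(self, batch_count: float) -> float:
            (x0, y0) = self.points[0]
            if batch_count <= x0:
                return y0
            for (x1, y1) in self.points[1:]:
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
                x0, y0 = x1, y1
            return y0

    # e.g. a skip rate decaying from 0.2 to 0.0 over the first 4000 batches,
    # which would print ans=0.0 this late in training (batch_count ~390k)
    conv_skip_rate = PiecewiseLinearFloat((0.0, 0.2), (4000.0, 0.0))
    print(conv_skip_rate(389778.67))  # -> 0.0
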
], batch size: 102, lr: 5.44e-03, grad_scale: 32.0 2024-09-18 09:26:08,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=392532.0, ans=0.0 2024-09-18 09:26:08,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=392532.0, ans=0.0 2024-09-18 09:26:38,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=392625.3333333333, ans=0.07 2024-09-18 09:26:39,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=392625.3333333333, ans=0.125 2024-09-18 09:26:41,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=392625.3333333333, ans=0.125 2024-09-18 09:26:56,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=392672.0, ans=0.125 2024-09-18 09:27:01,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=392672.0, ans=0.0 2024-09-18 09:27:04,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=392672.0, ans=0.125 2024-09-18 09:27:12,288 INFO [train.py:1198] (1/2) Epoch 22, batch 2750, loss[loss=0.1979, simple_loss=0.2489, pruned_loss=0.05536, ctc_loss=0.1119, cr_loss=0.3461, over 34607.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.2739, pruned_loss=0.0651, ctc_loss=0.135, cr_loss=0.4105, over 6763138.26 frames. ], batch size: 88, lr: 5.44e-03, grad_scale: 32.0 2024-09-18 09:27:29,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0 2024-09-18 09:27:32,055 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.452e+02 2.938e+02 3.558e+02 6.331e+02, threshold=5.876e+02, percent-clipped=2.0 2024-09-18 09:27:37,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=392765.3333333333, ans=0.125 2024-09-18 09:28:05,237 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.319e-02 2024-09-18 09:28:22,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=392905.3333333333, ans=0.0 2024-09-18 09:28:39,197 INFO [train.py:1198] (1/2) Epoch 22, batch 2800, loss[loss=0.2675, simple_loss=0.3081, pruned_loss=0.08679, ctc_loss=0.1757, cr_loss=0.4539, over 23133.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2743, pruned_loss=0.06544, ctc_loss=0.1356, cr_loss=0.411, over 6740845.96 frames. 
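The Whitening lines compare a covariance statistic of a module's activations ("metric") against a scheduled "limit"; when the metric is too high, a corrective gradient pushes the activations toward a whiter (more isotropic) covariance. One way to define such a metric, assumed here rather than lifted from scaling.py, is the mean squared eigenvalue of the feature covariance divided by its squared mean eigenvalue, which is 1.0 exactly when the covariance is proportional to the identity and grows when a few directions dominate:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Returns a value >= 1.0 that equals
        1.0 when the within-group covariance is fully isotropic ("white").
        This is a guess at the statistic behind "metric=... vs. limit=...".
        """
        (num_frames, num_channels) = x.shape
        assert num_channels % num_groups == 0
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x - x.mean(dim=0, keepdim=True)
        # per-group covariance: (num_groups, c, c)
        cov = torch.einsum("ngi,ngj->gij", x, x) / num_frames
        eigs = torch.linalg.eigvalsh(cov)  # (num_groups, c)
        ratio = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1) ** 2
        return ratio.mean().item()

    x = torch.randn(1000, 512)
    # roughly 1 + channels/frames (~1.5 here) from sampling noise alone;
    # much larger values indicate strongly non-white activations
    print(whitening_metric(x))
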
], batch size: 244, lr: 5.43e-03, grad_scale: 32.0 2024-09-18 09:28:46,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=392952.0, ans=0.025 2024-09-18 09:28:51,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=392952.0, ans=0.0 2024-09-18 09:29:01,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=392998.6666666667, ans=0.05 2024-09-18 09:29:02,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=392998.6666666667, ans=0.2 2024-09-18 09:29:09,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=392998.6666666667, ans=0.125 2024-09-18 09:29:44,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.62 vs. limit=10.0 2024-09-18 09:30:01,528 INFO [train.py:1198] (1/2) Epoch 22, batch 2850, loss[loss=0.2145, simple_loss=0.2625, pruned_loss=0.06272, ctc_loss=0.1286, cr_loss=0.3862, over 34490.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.2748, pruned_loss=0.06579, ctc_loss=0.1362, cr_loss=0.4123, over 6726494.46 frames. ], batch size: 90, lr: 5.43e-03, grad_scale: 32.0 2024-09-18 09:30:21,344 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.084e+02 2.524e+02 2.925e+02 3.620e+02 4.885e+02, threshold=5.850e+02, percent-clipped=0.0 2024-09-18 09:30:26,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=393232.0, ans=0.0 2024-09-18 09:30:46,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=393278.6666666667, ans=0.95 2024-09-18 09:30:52,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=393325.3333333333, ans=0.125 2024-09-18 09:31:04,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.95 vs. limit=15.0 2024-09-18 09:31:10,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=393372.0, ans=0.2 2024-09-18 09:31:20,716 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:31:25,334 INFO [train.py:1198] (1/2) Epoch 22, batch 2900, loss[loss=0.217, simple_loss=0.2709, pruned_loss=0.06124, ctc_loss=0.1261, cr_loss=0.3828, over 34519.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2761, pruned_loss=0.06619, ctc_loss=0.1368, cr_loss=0.4138, over 6756357.18 frames. 
], batch size: 94, lr: 5.43e-03, grad_scale: 32.0 2024-09-18 09:31:27,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=393418.6666666667, ans=0.0 2024-09-18 09:32:32,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=393605.3333333333, ans=0.125 2024-09-18 09:32:45,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=393605.3333333333, ans=0.0 2024-09-18 09:32:49,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=393652.0, ans=0.1 2024-09-18 09:32:50,446 INFO [train.py:1198] (1/2) Epoch 22, batch 2950, loss[loss=0.2212, simple_loss=0.2737, pruned_loss=0.06283, ctc_loss=0.1317, cr_loss=0.4175, over 34612.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2746, pruned_loss=0.0655, ctc_loss=0.1356, cr_loss=0.411, over 6750926.74 frames. ], batch size: 88, lr: 5.43e-03, grad_scale: 32.0 2024-09-18 09:33:05,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=393698.6666666667, ans=0.2 2024-09-18 09:33:07,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=393698.6666666667, ans=0.2 2024-09-18 09:33:10,061 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.384e+02 2.808e+02 3.625e+02 6.827e+02, threshold=5.615e+02, percent-clipped=3.0 2024-09-18 09:33:10,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=393698.6666666667, ans=0.125 2024-09-18 09:33:40,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=393792.0, ans=0.0 2024-09-18 09:33:55,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=393838.6666666667, ans=0.0 2024-09-18 09:33:55,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=393838.6666666667, ans=0.1 2024-09-18 09:34:07,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=393838.6666666667, ans=0.125 2024-09-18 09:34:07,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.81 vs. limit=22.5 2024-09-18 09:34:13,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2024-09-18 09:34:13,658 INFO [train.py:1198] (1/2) Epoch 22, batch 3000, loss[loss=0.2094, simple_loss=0.2643, pruned_loss=0.05811, ctc_loss=0.1192, cr_loss=0.3624, over 34556.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2742, pruned_loss=0.06511, ctc_loss=0.1349, cr_loss=0.4098, over 6750187.60 frames. ], batch size: 94, lr: 5.43e-03, grad_scale: 32.0 2024-09-18 09:34:13,658 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 09:34:30,548 INFO [train.py:1230] (1/2) Epoch 22, validation: loss=0.1476, simple_loss=0.2445, pruned_loss=0.02132, ctc_loss=0.04027, cr_loss=1.806e-14, over 944034.00 frames. 
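At each validation interval the loop logs "Computing validation loss", the per-frame validation losses, and then the peak device memory. Note that cr_loss collapses to ~1e-14 on the validation set, presumably because only a single un-masked view is scored there, so the consistency term has nothing to compare. A hedged sketch of this loop, where the model(batch) interface returning (loss, num_frames) is an assumption:

    import torch

    def compute_validation_loss(model, valid_loader, device) -> float:
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = model(batch)   # assumed interface
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        # matches the "Maximum memory allocated so far is ...MB" entries
        peak_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4f}; "
              f"maximum memory allocated so far is {peak_mb}MB")
        return tot_loss / tot_frames
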
2024-09-18 09:34:30,549 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 09:34:39,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=393885.3333333333, ans=0.0 2024-09-18 09:34:44,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.52 vs. limit=15.0 2024-09-18 09:34:54,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=393932.0, ans=0.0 2024-09-18 09:35:07,244 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:35:09,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.09 vs. limit=15.0 2024-09-18 09:35:21,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=394025.3333333333, ans=0.05 2024-09-18 09:35:28,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=394025.3333333333, ans=0.0 2024-09-18 09:35:34,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=394025.3333333333, ans=0.2 2024-09-18 09:35:49,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=394072.0, ans=0.0 2024-09-18 09:35:54,238 INFO [train.py:1198] (1/2) Epoch 22, batch 3050, loss[loss=0.2129, simple_loss=0.2623, pruned_loss=0.06159, ctc_loss=0.1257, cr_loss=0.3823, over 34590.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2749, pruned_loss=0.06548, ctc_loss=0.1355, cr_loss=0.4109, over 6742538.64 frames. ], batch size: 89, lr: 5.43e-03, grad_scale: 32.0 2024-09-18 09:36:07,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=394118.6666666667, ans=0.125 2024-09-18 09:36:13,686 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.425e+02 2.695e+02 3.402e+02 6.877e+02, threshold=5.390e+02, percent-clipped=4.0 2024-09-18 09:36:19,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.31 vs. limit=15.0 2024-09-18 09:36:24,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.34 vs. limit=6.0 2024-09-18 09:36:41,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=394258.6666666667, ans=0.0 2024-09-18 09:36:47,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=394258.6666666667, ans=0.09899494936611666 2024-09-18 09:36:49,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=394258.6666666667, ans=0.1 2024-09-18 09:37:08,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. 
limit=6.0 2024-09-18 09:37:16,738 INFO [train.py:1198] (1/2) Epoch 22, batch 3100, loss[loss=0.241, simple_loss=0.2897, pruned_loss=0.07248, ctc_loss=0.149, cr_loss=0.4394, over 34183.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2747, pruned_loss=0.0655, ctc_loss=0.1356, cr_loss=0.4104, over 6741889.56 frames. ], batch size: 117, lr: 5.42e-03, grad_scale: 32.0 2024-09-18 09:37:16,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=394352.0, ans=0.5 2024-09-18 09:37:21,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0 2024-09-18 09:37:26,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=394352.0, ans=0.1 2024-09-18 09:37:26,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=394352.0, ans=0.0 2024-09-18 09:37:33,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=394398.6666666667, ans=0.1 2024-09-18 09:38:02,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=394445.3333333333, ans=0.1 2024-09-18 09:38:08,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=394492.0, ans=0.125 2024-09-18 09:38:25,455 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:38:37,890 INFO [train.py:1198] (1/2) Epoch 22, batch 3150, loss[loss=0.2382, simple_loss=0.2876, pruned_loss=0.07034, ctc_loss=0.1504, cr_loss=0.4511, over 33914.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2747, pruned_loss=0.06558, ctc_loss=0.1358, cr_loss=0.4115, over 6747355.67 frames. ], batch size: 122, lr: 5.42e-03, grad_scale: 16.0 2024-09-18 09:38:41,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=394585.3333333333, ans=0.025 2024-09-18 09:38:43,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=394585.3333333333, ans=0.0 2024-09-18 09:38:52,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=394632.0, ans=0.125 2024-09-18 09:38:58,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.453e+02 2.897e+02 3.664e+02 8.039e+02, threshold=5.793e+02, percent-clipped=8.0 2024-09-18 09:39:39,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=394725.3333333333, ans=0.0 2024-09-18 09:39:57,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=394818.6666666667, ans=0.2 2024-09-18 09:39:59,181 INFO [train.py:1198] (1/2) Epoch 22, batch 3200, loss[loss=0.2228, simple_loss=0.2749, pruned_loss=0.06431, ctc_loss=0.1322, cr_loss=0.3911, over 34531.00 frames. ], tot_loss[loss=0.2241, simple_loss=0.274, pruned_loss=0.06531, ctc_loss=0.1353, cr_loss=0.4102, over 6760960.79 frames. 
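The grad_scale field in the batch lines is the float16 AMP loss scale: it is halved when a step produces inf/nan gradients (32.0 -> 16.0 around batch 3150 above) and grown back after a stretch of clean steps, which is why it oscillates between 16.0 and 32.0 through this section. The standard torch.cuda.amp pattern that produces this behavior is sketched below; the training script may manage its scaler somewhat differently:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, backoff_factor=0.5,
                                       growth_factor=2.0, growth_interval=2000)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)                   # assumed interface
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skips the step and halves the scale on inf/nan
        scaler.update()          # grows the scale again after clean stretches
        return scaler.get_scale()  # the "grad_scale" figure in the log
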
], batch size: 94, lr: 5.42e-03, grad_scale: 32.0 2024-09-18 09:40:07,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=394818.6666666667, ans=0.125 2024-09-18 09:40:07,666 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:40:22,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=394865.3333333333, ans=0.2 2024-09-18 09:40:35,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=394912.0, ans=0.1 2024-09-18 09:40:45,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=394912.0, ans=0.125 2024-09-18 09:40:46,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=394912.0, ans=0.0 2024-09-18 09:40:50,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=394958.6666666667, ans=0.125 2024-09-18 09:41:08,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.29 vs. limit=15.0 2024-09-18 09:41:17,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=395005.3333333333, ans=0.125 2024-09-18 09:41:17,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=395005.3333333333, ans=0.02 2024-09-18 09:41:22,109 INFO [train.py:1198] (1/2) Epoch 22, batch 3250, loss[loss=0.2488, simple_loss=0.2968, pruned_loss=0.07567, ctc_loss=0.1539, cr_loss=0.4647, over 34662.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2747, pruned_loss=0.06567, ctc_loss=0.1359, cr_loss=0.4112, over 6769907.81 frames. ], batch size: 98, lr: 5.42e-03, grad_scale: 32.0 2024-09-18 09:41:36,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=395098.6666666667, ans=0.0 2024-09-18 09:41:38,542 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:41:42,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.496e+02 3.070e+02 3.613e+02 6.799e+02, threshold=6.141e+02, percent-clipped=3.0 2024-09-18 09:42:12,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=395192.0, ans=0.0 2024-09-18 09:42:16,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=395192.0, ans=0.125 2024-09-18 09:42:20,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=395192.0, ans=0.0 2024-09-18 09:42:42,324 INFO [train.py:1198] (1/2) Epoch 22, batch 3300, loss[loss=0.2223, simple_loss=0.2789, pruned_loss=0.06167, ctc_loss=0.1339, cr_loss=0.3918, over 33128.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2734, pruned_loss=0.06506, ctc_loss=0.1348, cr_loss=0.4089, over 6767538.93 frames. 
], batch size: 130, lr: 5.42e-03, grad_scale: 32.0 2024-09-18 09:43:03,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=395332.0, ans=0.125 2024-09-18 09:43:24,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=395378.6666666667, ans=0.125 2024-09-18 09:43:46,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=395472.0, ans=0.1 2024-09-18 09:43:48,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=395472.0, ans=0.2 2024-09-18 09:44:04,485 INFO [train.py:1198] (1/2) Epoch 22, batch 3350, loss[loss=0.2466, simple_loss=0.2994, pruned_loss=0.07342, ctc_loss=0.1498, cr_loss=0.4263, over 33774.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2745, pruned_loss=0.06566, ctc_loss=0.136, cr_loss=0.4114, over 6742464.51 frames. ], batch size: 122, lr: 5.42e-03, grad_scale: 32.0 2024-09-18 09:44:08,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=395518.6666666667, ans=0.05 2024-09-18 09:44:25,598 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.400e+02 3.005e+02 3.712e+02 7.380e+02, threshold=6.010e+02, percent-clipped=2.0 2024-09-18 09:44:28,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.94 vs. limit=15.0 2024-09-18 09:44:38,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=395612.0, ans=0.125 2024-09-18 09:44:48,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=395612.0, ans=0.125 2024-09-18 09:44:59,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=395658.6666666667, ans=0.2 2024-09-18 09:45:08,143 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 09:45:08,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.59 vs. limit=15.0 2024-09-18 09:45:25,695 INFO [train.py:1198] (1/2) Epoch 22, batch 3400, loss[loss=0.1922, simple_loss=0.239, pruned_loss=0.05402, ctc_loss=0.1152, cr_loss=0.3605, over 34098.00 frames. ], tot_loss[loss=0.2247, simple_loss=0.2745, pruned_loss=0.06568, ctc_loss=0.136, cr_loss=0.4111, over 6733040.26 frames. ], batch size: 78, lr: 5.42e-03, grad_scale: 32.0 2024-09-18 09:45:27,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=395752.0, ans=0.025 2024-09-18 09:45:32,722 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=22.5 2024-09-18 09:45:35,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. 
limit=6.0 2024-09-18 09:45:59,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-09-18 09:46:02,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=395845.3333333333, ans=0.2 2024-09-18 09:46:15,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=395892.0, ans=0.0 2024-09-18 09:46:26,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=395892.0, ans=0.09899494936611666 2024-09-18 09:46:33,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=395938.6666666667, ans=0.125 2024-09-18 09:46:41,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=395938.6666666667, ans=0.125 2024-09-18 09:46:47,526 INFO [train.py:1198] (1/2) Epoch 22, batch 3450, loss[loss=0.2272, simple_loss=0.2844, pruned_loss=0.06399, ctc_loss=0.1308, cr_loss=0.3967, over 33098.00 frames. ], tot_loss[loss=0.2251, simple_loss=0.275, pruned_loss=0.06577, ctc_loss=0.1362, cr_loss=0.4117, over 6745221.83 frames. ], batch size: 130, lr: 5.41e-03, grad_scale: 16.0 2024-09-18 09:46:54,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=395985.3333333333, ans=0.0 2024-09-18 09:47:00,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=395985.3333333333, ans=0.125 2024-09-18 09:47:07,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=396032.0, ans=0.125 2024-09-18 09:47:10,063 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.532e+02 3.050e+02 4.037e+02 5.692e+02, threshold=6.099e+02, percent-clipped=0.0 2024-09-18 09:47:13,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=396032.0, ans=0.2 2024-09-18 09:47:46,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=396125.3333333333, ans=0.0 2024-09-18 09:47:48,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=396125.3333333333, ans=0.0 2024-09-18 09:48:09,320 INFO [train.py:1198] (1/2) Epoch 22, batch 3500, loss[loss=0.1939, simple_loss=0.2481, pruned_loss=0.05207, ctc_loss=0.1108, cr_loss=0.3376, over 34491.00 frames. ], tot_loss[loss=0.2245, simple_loss=0.2744, pruned_loss=0.06556, ctc_loss=0.1357, cr_loss=0.4103, over 6747560.66 frames. ], batch size: 85, lr: 5.41e-03, grad_scale: 16.0 2024-09-18 09:48:19,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=396218.6666666667, ans=0.1 2024-09-18 09:48:48,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396312.0, ans=0.1 2024-09-18 09:49:07,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.41 vs. 
limit=15.0 2024-09-18 09:49:23,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=396405.3333333333, ans=0.125 2024-09-18 09:49:29,348 INFO [train.py:1198] (1/2) Epoch 22, batch 3550, loss[loss=0.2388, simple_loss=0.2924, pruned_loss=0.06977, ctc_loss=0.1421, cr_loss=0.4322, over 34391.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2743, pruned_loss=0.06537, ctc_loss=0.1352, cr_loss=0.4098, over 6756696.63 frames. ], batch size: 103, lr: 5.41e-03, grad_scale: 16.0 2024-09-18 09:49:44,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=396498.6666666667, ans=0.125 2024-09-18 09:49:52,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.463e+02 2.859e+02 3.685e+02 6.925e+02, threshold=5.719e+02, percent-clipped=1.0 2024-09-18 09:50:10,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=396545.3333333333, ans=0.125 2024-09-18 09:50:51,024 INFO [train.py:1198] (1/2) Epoch 22, batch 3600, loss[loss=0.2094, simple_loss=0.26, pruned_loss=0.05934, ctc_loss=0.1223, cr_loss=0.3917, over 34518.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2749, pruned_loss=0.06565, ctc_loss=0.1357, cr_loss=0.4108, over 6766587.99 frames. ], batch size: 90, lr: 5.41e-03, grad_scale: 32.0 2024-09-18 09:50:56,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=396685.3333333333, ans=0.125 2024-09-18 09:51:04,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0 2024-09-18 09:51:05,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=396732.0, ans=0.125 2024-09-18 09:51:05,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=396732.0, ans=0.025 2024-09-18 09:51:07,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=396732.0, ans=0.09899494936611666 2024-09-18 09:51:20,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=396732.0, ans=0.0 2024-09-18 09:51:55,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=396872.0, ans=0.2 2024-09-18 09:51:58,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=396872.0, ans=0.0 2024-09-18 09:52:05,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396872.0, ans=0.1 2024-09-18 09:52:09,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=396872.0, ans=0.025 2024-09-18 09:52:10,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2024-09-18 09:52:12,871 INFO [train.py:1198] (1/2) Epoch 22, batch 3650, loss[loss=0.2267, simple_loss=0.2786, pruned_loss=0.06564, ctc_loss=0.1364, cr_loss=0.4066, over 34466.00 frames. 
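The cr_loss tracked in every batch entry is the consistency-regularization term of CR-CTC: each utterance is forwarded twice under different time-masking and the two CTC posterior sequences are pulled toward each other. A common formulation is a symmetric KL between the two frame-level distributions with a stop-gradient on the target branch, sketched below; the exact loss in this recipe may differ in details such as masking-aware weighting:

    import torch
    import torch.nn.functional as F

    def consistency_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
        """logits_*: (N, T, V) CTC logits from two differently-masked views.

        Symmetric frame-level KL with detached targets, so each branch is
        pulled toward the other's (fixed) prediction, encouraging
        mask-invariant posteriors.
        """
        log_p_a = F.log_softmax(logits_a, dim=-1)
        log_p_b = F.log_softmax(logits_b, dim=-1)
        kl_ab = F.kl_div(log_p_a, log_p_b.detach(),
                         log_target=True, reduction="batchmean")
        kl_ba = F.kl_div(log_p_b, log_p_a.detach(),
                         log_target=True, reduction="batchmean")
        return 0.5 * (kl_ab + kl_ba)
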
], tot_loss[loss=0.2242, simple_loss=0.2742, pruned_loss=0.06537, ctc_loss=0.1353, cr_loss=0.4099, over 6769542.20 frames. ], batch size: 110, lr: 5.41e-03, grad_scale: 16.0 2024-09-18 09:52:25,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=396918.6666666667, ans=0.125 2024-09-18 09:52:36,815 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.494e+02 3.154e+02 4.472e+02 1.004e+03, threshold=6.309e+02, percent-clipped=13.0 2024-09-18 09:52:45,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=397012.0, ans=0.1 2024-09-18 09:53:13,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=397058.6666666667, ans=0.125 2024-09-18 09:53:15,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=397105.3333333333, ans=0.0 2024-09-18 09:53:33,300 INFO [train.py:1198] (1/2) Epoch 22, batch 3700, loss[loss=0.2272, simple_loss=0.282, pruned_loss=0.06393, ctc_loss=0.1373, cr_loss=0.4245, over 34631.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2742, pruned_loss=0.06519, ctc_loss=0.135, cr_loss=0.4098, over 6784761.65 frames. ], batch size: 102, lr: 5.41e-03, grad_scale: 16.0 2024-09-18 09:53:51,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=397198.6666666667, ans=0.0 2024-09-18 09:54:19,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=397245.3333333333, ans=0.1 2024-09-18 09:54:25,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=397292.0, ans=0.95 2024-09-18 09:54:37,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=12.0 2024-09-18 09:54:40,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=397338.6666666667, ans=0.125 2024-09-18 09:54:54,863 INFO [train.py:1198] (1/2) Epoch 22, batch 3750, loss[loss=0.2314, simple_loss=0.2818, pruned_loss=0.06785, ctc_loss=0.1413, cr_loss=0.426, over 34301.00 frames. ], tot_loss[loss=0.2271, simple_loss=0.2774, pruned_loss=0.0664, ctc_loss=0.1373, cr_loss=0.415, over 6785374.75 frames. ], batch size: 113, lr: 5.40e-03, grad_scale: 16.0 2024-09-18 09:55:12,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=397432.0, ans=0.0 2024-09-18 09:55:18,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.091e+02 2.320e+02 2.521e+02 2.848e+02 4.458e+02, threshold=5.043e+02, percent-clipped=0.0 2024-09-18 09:55:27,606 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.16 vs. 
limit=15.0 2024-09-18 09:55:32,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=397478.6666666667, ans=0.2 2024-09-18 09:56:00,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=397572.0, ans=0.0 2024-09-18 09:56:16,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.33 vs. limit=15.0 2024-09-18 09:56:16,653 INFO [train.py:1198] (1/2) Epoch 22, batch 3800, loss[loss=0.2495, simple_loss=0.2869, pruned_loss=0.08054, ctc_loss=0.1625, cr_loss=0.4609, over 29756.00 frames. ], tot_loss[loss=0.2304, simple_loss=0.28, pruned_loss=0.068, ctc_loss=0.1403, cr_loss=0.42, over 6675036.10 frames. ], batch size: 175, lr: 5.40e-03, grad_scale: 16.0 2024-09-18 09:56:33,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=397665.3333333333, ans=0.05 2024-09-18 09:56:40,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2024-09-18 09:57:10,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=12.75 vs. limit=15.0 2024-09-18 09:57:16,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=397758.6666666667, ans=10.0 2024-09-18 09:57:40,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=11.15 vs. limit=12.0 2024-09-18 09:57:40,897 INFO [train.py:1198] (1/2) Epoch 22, batch 3850, loss[loss=0.2562, simple_loss=0.2949, pruned_loss=0.08233, ctc_loss=0.1737, cr_loss=0.4535, over 23279.00 frames. ], tot_loss[loss=0.2349, simple_loss=0.2827, pruned_loss=0.07053, ctc_loss=0.1455, cr_loss=0.424, over 6247194.94 frames. ], batch size: 245, lr: 5.40e-03, grad_scale: 16.0 2024-09-18 09:57:41,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=397852.0, ans=0.1 2024-09-18 09:57:42,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=397852.0, ans=0.125 2024-09-18 09:57:44,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=397852.0, ans=0.125 2024-09-18 09:57:44,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=397852.0, ans=0.0 2024-09-18 09:58:05,786 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.270e+02 2.680e+02 2.909e+02 3.281e+02 7.479e+02, threshold=5.818e+02, percent-clipped=1.0 2024-09-18 09:58:07,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=397898.6666666667, ans=0.2 2024-09-18 09:59:17,383 INFO [train.py:1198] (1/2) Epoch 23, batch 0, loss[loss=0.195, simple_loss=0.247, pruned_loss=0.05268, ctc_loss=0.1136, cr_loss=0.3702, over 34496.00 frames. ], tot_loss[loss=0.195, simple_loss=0.247, pruned_loss=0.05268, ctc_loss=0.1136, cr_loss=0.3702, over 34496.00 frames. 
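The lr field decays smoothly within an epoch (5.46e-03 -> 5.40e-03 across Epoch 22) and steps down at each epoch boundary (Epoch 23 opens at 5.28e-03, as the entries just below show). That pattern is consistent with an Eden-style schedule in which the learning rate is a product of inverse-power factors in the batch and epoch counts; a sketch with illustrative constants follows (the defaults are assumptions and will not reproduce these exact values):

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # lr falls off as a -0.25 power in both the batch and the epoch
        # count, giving the gradual within-epoch decay plus the visible
        # drop at each epoch boundary.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor
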
], batch size: 85, lr: 5.28e-03, grad_scale: 32.0 2024-09-18 09:59:17,384 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 09:59:34,174 INFO [train.py:1230] (1/2) Epoch 23, validation: loss=0.1487, simple_loss=0.2462, pruned_loss=0.02161, ctc_loss=0.04021, cr_loss=1.888e-14, over 944034.00 frames. 2024-09-18 09:59:34,174 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 09:59:36,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=397973.3333333333, ans=0.125 2024-09-18 10:00:31,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=398113.3333333333, ans=0.0 2024-09-18 10:00:39,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=398113.3333333333, ans=0.2 2024-09-18 10:00:46,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=398160.0, ans=0.125 2024-09-18 10:01:00,887 INFO [train.py:1198] (1/2) Epoch 23, batch 50, loss[loss=0.2026, simple_loss=0.2528, pruned_loss=0.05664, ctc_loss=0.1189, cr_loss=0.3833, over 34485.00 frames. ], tot_loss[loss=0.2257, simple_loss=0.2757, pruned_loss=0.06591, ctc_loss=0.1364, cr_loss=0.4135, over 1479905.54 frames. ], batch size: 82, lr: 5.28e-03, grad_scale: 32.0 2024-09-18 10:01:15,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=398253.3333333333, ans=0.2 2024-09-18 10:02:05,029 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.427e+02 2.778e+02 3.549e+02 6.348e+02, threshold=5.555e+02, percent-clipped=2.0 2024-09-18 10:02:23,105 INFO [train.py:1198] (1/2) Epoch 23, batch 100, loss[loss=0.2101, simple_loss=0.2619, pruned_loss=0.05898, ctc_loss=0.1243, cr_loss=0.3845, over 34582.00 frames. ], tot_loss[loss=0.2273, simple_loss=0.2771, pruned_loss=0.06664, ctc_loss=0.1377, cr_loss=0.4164, over 2627700.67 frames. ], batch size: 89, lr: 5.28e-03, grad_scale: 32.0 2024-09-18 10:02:40,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.39 vs. limit=15.0 2024-09-18 10:03:08,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=398533.3333333333, ans=0.025 2024-09-18 10:03:14,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=398580.0, ans=0.125 2024-09-18 10:03:45,304 INFO [train.py:1198] (1/2) Epoch 23, batch 150, loss[loss=0.1938, simple_loss=0.2465, pruned_loss=0.05217, ctc_loss=0.112, cr_loss=0.3581, over 34506.00 frames. ], tot_loss[loss=0.2246, simple_loss=0.2749, pruned_loss=0.06538, ctc_loss=0.1355, cr_loss=0.4125, over 3555893.43 frames. 
], batch size: 82, lr: 5.27e-03, grad_scale: 32.0 2024-09-18 10:03:45,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=398673.3333333333, ans=0.0 2024-09-18 10:04:02,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=398673.3333333333, ans=0.2 2024-09-18 10:04:11,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=398720.0, ans=0.125 2024-09-18 10:04:29,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.10 vs. limit=10.0 2024-09-18 10:04:39,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=398813.3333333333, ans=0.2 2024-09-18 10:04:40,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.95 vs. limit=15.0 2024-09-18 10:04:41,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=398813.3333333333, ans=22.5 2024-09-18 10:04:50,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398813.3333333333, ans=0.1 2024-09-18 10:04:53,394 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.891e+02 2.441e+02 3.047e+02 3.705e+02 5.466e+02, threshold=6.095e+02, percent-clipped=0.0 2024-09-18 10:05:02,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=398860.0, ans=0.95 2024-09-18 10:05:05,330 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:05:11,681 INFO [train.py:1198] (1/2) Epoch 23, batch 200, loss[loss=0.2377, simple_loss=0.2859, pruned_loss=0.07109, ctc_loss=0.1495, cr_loss=0.4361, over 31865.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2742, pruned_loss=0.06518, ctc_loss=0.1351, cr_loss=0.4117, over 4271223.59 frames. ], batch size: 145, lr: 5.27e-03, grad_scale: 32.0 2024-09-18 10:05:46,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=399000.0, ans=0.0 2024-09-18 10:05:54,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=399000.0, ans=0.2 2024-09-18 10:05:56,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=399000.0, ans=0.125 2024-09-18 10:06:12,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=399046.6666666667, ans=0.125 2024-09-18 10:06:31,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=399093.3333333333, ans=0.2 2024-09-18 10:06:32,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=399140.0, ans=0.125 2024-09-18 10:06:34,075 INFO [train.py:1198] (1/2) Epoch 23, batch 250, loss[loss=0.254, simple_loss=0.2959, pruned_loss=0.07989, ctc_loss=0.1629, cr_loss=0.4925, over 34233.00 frames. 
], tot_loss[loss=0.224, simple_loss=0.2743, pruned_loss=0.06516, ctc_loss=0.1351, cr_loss=0.4121, over 4832262.18 frames. ], batch size: 117, lr: 5.27e-03, grad_scale: 32.0 2024-09-18 10:06:48,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0 2024-09-18 10:06:49,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=399186.6666666667, ans=0.95 2024-09-18 10:06:51,023 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:07:33,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=399280.0, ans=0.07 2024-09-18 10:07:37,895 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.502e+02 2.867e+02 3.455e+02 6.252e+02, threshold=5.734e+02, percent-clipped=2.0 2024-09-18 10:07:38,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=399326.6666666667, ans=0.125 2024-09-18 10:07:39,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.12 vs. limit=15.0 2024-09-18 10:07:48,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.40 vs. limit=10.0 2024-09-18 10:08:00,468 INFO [train.py:1198] (1/2) Epoch 23, batch 300, loss[loss=0.2491, simple_loss=0.296, pruned_loss=0.07602, ctc_loss=0.1559, cr_loss=0.4729, over 34335.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2736, pruned_loss=0.06489, ctc_loss=0.1346, cr_loss=0.411, over 5261071.63 frames. ], batch size: 107, lr: 5.27e-03, grad_scale: 32.0 2024-09-18 10:08:06,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2024-09-18 10:08:19,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=399420.0, ans=0.07 2024-09-18 10:08:27,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399420.0, ans=0.1 2024-09-18 10:08:27,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=399420.0, ans=0.2 2024-09-18 10:08:57,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=399513.3333333333, ans=0.0 2024-09-18 10:09:15,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=399560.0, ans=0.07 2024-09-18 10:09:23,207 INFO [train.py:1198] (1/2) Epoch 23, batch 350, loss[loss=0.1909, simple_loss=0.2434, pruned_loss=0.05143, ctc_loss=0.1081, cr_loss=0.3493, over 34300.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2738, pruned_loss=0.06467, ctc_loss=0.1343, cr_loss=0.4102, over 5598312.78 frames. 
], batch size: 83, lr: 5.27e-03, grad_scale: 32.0 2024-09-18 10:09:28,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=399606.6666666667, ans=0.125 2024-09-18 10:09:53,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=399653.3333333333, ans=0.125 2024-09-18 10:09:58,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=399700.0, ans=0.2 2024-09-18 10:09:58,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399700.0, ans=0.1 2024-09-18 10:10:01,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=399700.0, ans=0.125 2024-09-18 10:10:01,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=399700.0, ans=0.0 2024-09-18 10:10:09,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=399700.0, ans=0.2 2024-09-18 10:10:09,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=399700.0, ans=0.025 2024-09-18 10:10:11,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=399746.6666666667, ans=0.2 2024-09-18 10:10:17,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.65 vs. limit=15.0 2024-09-18 10:10:22,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.45 vs. limit=6.0 2024-09-18 10:10:26,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=399746.6666666667, ans=0.125 2024-09-18 10:10:27,602 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.444e+02 2.918e+02 3.757e+02 6.538e+02, threshold=5.835e+02, percent-clipped=6.0 2024-09-18 10:10:27,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=399793.3333333333, ans=0.0 2024-09-18 10:10:31,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=399793.3333333333, ans=0.1 2024-09-18 10:10:39,556 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:10:45,801 INFO [train.py:1198] (1/2) Epoch 23, batch 400, loss[loss=0.2248, simple_loss=0.275, pruned_loss=0.06546, ctc_loss=0.136, cr_loss=0.4128, over 34428.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.273, pruned_loss=0.06428, ctc_loss=0.1336, cr_loss=0.4087, over 5865836.40 frames. ], batch size: 95, lr: 5.27e-03, grad_scale: 32.0 2024-09-18 10:10:47,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=399840.0, ans=0.0 2024-09-18 10:11:41,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.67 vs. 
limit=6.0 2024-09-18 10:12:03,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=400026.6666666667, ans=0.0 2024-09-18 10:12:12,285 INFO [train.py:1198] (1/2) Epoch 23, batch 450, loss[loss=0.2315, simple_loss=0.2829, pruned_loss=0.0672, ctc_loss=0.142, cr_loss=0.4333, over 34697.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2736, pruned_loss=0.06468, ctc_loss=0.1342, cr_loss=0.41, over 6056207.61 frames. ], batch size: 97, lr: 5.27e-03, grad_scale: 32.0 2024-09-18 10:12:13,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.81 vs. limit=15.0 2024-09-18 10:12:18,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2024-09-18 10:12:20,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=400073.3333333333, ans=0.2 2024-09-18 10:12:24,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=400073.3333333333, ans=0.025 2024-09-18 10:12:42,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=400120.0, ans=0.2 2024-09-18 10:12:42,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=400120.0, ans=0.2 2024-09-18 10:13:17,097 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.949e+02 2.402e+02 2.812e+02 3.576e+02 7.134e+02, threshold=5.624e+02, percent-clipped=2.0 2024-09-18 10:13:35,086 INFO [train.py:1198] (1/2) Epoch 23, batch 500, loss[loss=0.2483, simple_loss=0.2983, pruned_loss=0.07519, ctc_loss=0.1509, cr_loss=0.4434, over 34487.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2732, pruned_loss=0.06455, ctc_loss=0.1339, cr_loss=0.4099, over 6221138.49 frames. ], batch size: 110, lr: 5.26e-03, grad_scale: 32.0 2024-09-18 10:13:35,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=400306.6666666667, ans=0.0 2024-09-18 10:13:57,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.22 vs. limit=15.0 2024-09-18 10:13:58,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=400353.3333333333, ans=0.125 2024-09-18 10:14:15,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=400400.0, ans=0.025 2024-09-18 10:14:57,861 INFO [train.py:1198] (1/2) Epoch 23, batch 550, loss[loss=0.231, simple_loss=0.2843, pruned_loss=0.06691, ctc_loss=0.1364, cr_loss=0.4125, over 33804.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2732, pruned_loss=0.06463, ctc_loss=0.134, cr_loss=0.4096, over 6330513.10 frames. ], batch size: 122, lr: 5.26e-03, grad_scale: 32.0 2024-09-18 10:15:02,094 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.20 vs. 
limit=15.0 2024-09-18 10:15:06,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=400540.0, ans=0.125 2024-09-18 10:15:11,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=400540.0, ans=0.0 2024-09-18 10:15:24,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=400586.6666666667, ans=0.0 2024-09-18 10:15:50,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=400680.0, ans=0.025 2024-09-18 10:16:06,456 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.969e+02 2.437e+02 2.767e+02 3.679e+02 6.421e+02, threshold=5.534e+02, percent-clipped=2.0 2024-09-18 10:16:16,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=400726.6666666667, ans=0.1 2024-09-18 10:16:24,495 INFO [train.py:1198] (1/2) Epoch 23, batch 600, loss[loss=0.2325, simple_loss=0.2864, pruned_loss=0.06714, ctc_loss=0.1372, cr_loss=0.4249, over 34230.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2735, pruned_loss=0.06451, ctc_loss=0.1339, cr_loss=0.4096, over 6433391.79 frames. ], batch size: 117, lr: 5.26e-03, grad_scale: 32.0 2024-09-18 10:16:32,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=400773.3333333333, ans=0.1 2024-09-18 10:16:34,513 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:16:36,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=400773.3333333333, ans=0.2 2024-09-18 10:16:55,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=400866.6666666667, ans=0.5 2024-09-18 10:17:36,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=400960.0, ans=0.125 2024-09-18 10:17:40,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2024-09-18 10:17:46,381 INFO [train.py:1198] (1/2) Epoch 23, batch 650, loss[loss=0.2456, simple_loss=0.2916, pruned_loss=0.07539, ctc_loss=0.1534, cr_loss=0.4553, over 34544.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2724, pruned_loss=0.06386, ctc_loss=0.1327, cr_loss=0.4073, over 6524782.89 frames. ], batch size: 94, lr: 5.26e-03, grad_scale: 32.0 2024-09-18 10:17:53,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=401006.6666666667, ans=0.125 2024-09-18 10:18:06,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401053.3333333333, ans=0.1 2024-09-18 10:18:23,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=401100.0, ans=0.0 2024-09-18 10:18:23,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.01 vs. 
limit=15.0 2024-09-18 10:18:39,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=401146.6666666667, ans=0.125 2024-09-18 10:18:50,477 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.439e+02 2.850e+02 3.924e+02 7.491e+02, threshold=5.700e+02, percent-clipped=6.0 2024-09-18 10:19:00,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=401193.3333333333, ans=0.2 2024-09-18 10:19:02,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=401193.3333333333, ans=0.0 2024-09-18 10:19:02,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=401193.3333333333, ans=0.0 2024-09-18 10:19:10,758 INFO [train.py:1198] (1/2) Epoch 23, batch 700, loss[loss=0.2244, simple_loss=0.2685, pruned_loss=0.06795, ctc_loss=0.1396, cr_loss=0.4117, over 34590.00 frames. ], tot_loss[loss=0.222, simple_loss=0.273, pruned_loss=0.06407, ctc_loss=0.1331, cr_loss=0.4081, over 6582405.00 frames. ], batch size: 89, lr: 5.26e-03, grad_scale: 32.0 2024-09-18 10:19:44,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=401333.3333333333, ans=0.09899494936611666 2024-09-18 10:19:44,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=401333.3333333333, ans=0.125 2024-09-18 10:20:01,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=401380.0, ans=0.125 2024-09-18 10:20:04,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=401380.0, ans=0.025 2024-09-18 10:20:11,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.77 vs. limit=22.5 2024-09-18 10:20:14,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=401380.0, ans=0.125 2024-09-18 10:20:16,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=401380.0, ans=0.2 2024-09-18 10:20:33,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=401473.3333333333, ans=0.025 2024-09-18 10:20:34,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=401473.3333333333, ans=0.125 2024-09-18 10:20:35,298 INFO [train.py:1198] (1/2) Epoch 23, batch 750, loss[loss=0.2305, simple_loss=0.2807, pruned_loss=0.06713, ctc_loss=0.1412, cr_loss=0.4448, over 34393.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2731, pruned_loss=0.0642, ctc_loss=0.1334, cr_loss=0.4088, over 6623878.62 frames. 
], batch size: 95, lr: 5.26e-03, grad_scale: 16.0 2024-09-18 10:20:48,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=401473.3333333333, ans=0.0 2024-09-18 10:20:56,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=401520.0, ans=0.0 2024-09-18 10:21:15,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=401566.6666666667, ans=0.1 2024-09-18 10:21:18,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401566.6666666667, ans=0.1 2024-09-18 10:21:26,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=401613.3333333333, ans=0.125 2024-09-18 10:21:38,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2024-09-18 10:21:41,211 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.457e+02 2.922e+02 3.964e+02 6.435e+02, threshold=5.845e+02, percent-clipped=3.0 2024-09-18 10:21:56,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=401706.6666666667, ans=0.0 2024-09-18 10:21:57,681 INFO [train.py:1198] (1/2) Epoch 23, batch 800, loss[loss=0.196, simple_loss=0.2481, pruned_loss=0.05349, ctc_loss=0.1134, cr_loss=0.3588, over 34494.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.273, pruned_loss=0.06436, ctc_loss=0.1336, cr_loss=0.4089, over 6660534.14 frames. ], batch size: 85, lr: 5.26e-03, grad_scale: 32.0 2024-09-18 10:22:07,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=401706.6666666667, ans=0.125 2024-09-18 10:22:12,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=401753.3333333333, ans=0.125 2024-09-18 10:22:24,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=401753.3333333333, ans=0.0 2024-09-18 10:22:28,984 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:22:29,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.53 vs. limit=10.0 2024-09-18 10:22:35,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=401800.0, ans=0.2 2024-09-18 10:22:43,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=401800.0, ans=0.0 2024-09-18 10:22:47,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401846.6666666667, ans=0.1 2024-09-18 10:22:48,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=401846.6666666667, ans=0.125 2024-09-18 10:22:56,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.27 vs. 
limit=15.0 2024-09-18 10:23:17,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=401893.3333333333, ans=0.125 2024-09-18 10:23:19,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.01 vs. limit=22.5 2024-09-18 10:23:23,340 INFO [train.py:1198] (1/2) Epoch 23, batch 850, loss[loss=0.2239, simple_loss=0.2822, pruned_loss=0.06152, ctc_loss=0.1317, cr_loss=0.4041, over 34378.00 frames. ], tot_loss[loss=0.222, simple_loss=0.2725, pruned_loss=0.06422, ctc_loss=0.1333, cr_loss=0.4077, over 6691413.13 frames. ], batch size: 103, lr: 5.25e-03, grad_scale: 32.0 2024-09-18 10:23:32,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2024-09-18 10:23:45,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.25 vs. limit=6.0 2024-09-18 10:23:53,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=401986.6666666667, ans=0.125 2024-09-18 10:24:13,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.99 vs. limit=22.5 2024-09-18 10:24:22,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=402080.0, ans=0.125 2024-09-18 10:24:29,311 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.426e+02 2.917e+02 3.557e+02 8.354e+02, threshold=5.835e+02, percent-clipped=2.0 2024-09-18 10:24:31,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=402126.6666666667, ans=0.05 2024-09-18 10:24:43,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2024-09-18 10:24:43,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.74 vs. limit=10.0 2024-09-18 10:24:45,716 INFO [train.py:1198] (1/2) Epoch 23, batch 900, loss[loss=0.1944, simple_loss=0.2447, pruned_loss=0.05308, ctc_loss=0.1147, cr_loss=0.377, over 34515.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.2729, pruned_loss=0.06443, ctc_loss=0.1336, cr_loss=0.4082, over 6695036.30 frames. ], batch size: 85, lr: 5.25e-03, grad_scale: 32.0 2024-09-18 10:24:47,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=402173.3333333333, ans=0.0 2024-09-18 10:24:57,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=402173.3333333333, ans=10.0 2024-09-18 10:25:00,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=402220.0, ans=0.2 2024-09-18 10:25:09,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.00 vs. 
limit=12.0 2024-09-18 10:25:36,142 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:25:37,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=402313.3333333333, ans=0.2 2024-09-18 10:25:39,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-09-18 10:25:44,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=402313.3333333333, ans=0.05 2024-09-18 10:25:50,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=402360.0, ans=0.2 2024-09-18 10:25:54,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.36 vs. limit=15.0 2024-09-18 10:26:01,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2024-09-18 10:26:02,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=402360.0, ans=0.125 2024-09-18 10:26:08,365 INFO [train.py:1198] (1/2) Epoch 23, batch 950, loss[loss=0.2035, simple_loss=0.2529, pruned_loss=0.05748, ctc_loss=0.1196, cr_loss=0.3812, over 34696.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2732, pruned_loss=0.06462, ctc_loss=0.1341, cr_loss=0.409, over 6699116.84 frames. ], batch size: 87, lr: 5.25e-03, grad_scale: 32.0 2024-09-18 10:26:08,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=402406.6666666667, ans=0.125 2024-09-18 10:26:10,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=402406.6666666667, ans=0.125 2024-09-18 10:26:21,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=402406.6666666667, ans=0.0 2024-09-18 10:26:30,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=402453.3333333333, ans=15.0 2024-09-18 10:26:40,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=402453.3333333333, ans=0.125 2024-09-18 10:26:41,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=402500.0, ans=0.125 2024-09-18 10:26:53,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=402500.0, ans=0.125 2024-09-18 10:27:18,193 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.524e+02 2.974e+02 3.689e+02 6.279e+02, threshold=5.948e+02, percent-clipped=1.0 2024-09-18 10:27:24,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.96 vs. limit=15.0 2024-09-18 10:27:34,476 INFO [train.py:1198] (1/2) Epoch 23, batch 1000, loss[loss=0.2156, simple_loss=0.2625, pruned_loss=0.06365, ctc_loss=0.1288, cr_loss=0.3905, over 34466.00 frames. 
], tot_loss[loss=0.224, simple_loss=0.2742, pruned_loss=0.06517, ctc_loss=0.1351, cr_loss=0.4113, over 6693447.24 frames. ], batch size: 90, lr: 5.25e-03, grad_scale: 32.0 2024-09-18 10:27:48,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=402640.0, ans=0.125 2024-09-18 10:28:09,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=402733.3333333333, ans=0.2 2024-09-18 10:28:30,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=402780.0, ans=0.125 2024-09-18 10:28:42,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=402826.6666666667, ans=0.125 2024-09-18 10:28:42,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=402826.6666666667, ans=0.025 2024-09-18 10:28:42,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.35 vs. limit=12.0 2024-09-18 10:28:51,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.74 vs. limit=15.0 2024-09-18 10:28:55,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=402873.3333333333, ans=0.09899494936611666 2024-09-18 10:28:57,088 INFO [train.py:1198] (1/2) Epoch 23, batch 1050, loss[loss=0.2277, simple_loss=0.2825, pruned_loss=0.06465, ctc_loss=0.135, cr_loss=0.4147, over 34537.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2732, pruned_loss=0.0647, ctc_loss=0.1342, cr_loss=0.4097, over 6702979.43 frames. ], batch size: 99, lr: 5.25e-03, grad_scale: 32.0 2024-09-18 10:29:02,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=402873.3333333333, ans=0.125 2024-09-18 10:29:29,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5 2024-09-18 10:30:03,459 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.416e+02 2.705e+02 3.329e+02 5.093e+02, threshold=5.409e+02, percent-clipped=0.0 2024-09-18 10:30:22,045 INFO [train.py:1198] (1/2) Epoch 23, batch 1100, loss[loss=0.2253, simple_loss=0.2773, pruned_loss=0.06491, ctc_loss=0.1356, cr_loss=0.4106, over 34703.00 frames. ], tot_loss[loss=0.2225, simple_loss=0.273, pruned_loss=0.06447, ctc_loss=0.1338, cr_loss=0.4091, over 6716870.99 frames. 
], batch size: 92, lr: 5.25e-03, grad_scale: 32.0 2024-09-18 10:30:30,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=403106.6666666667, ans=0.1 2024-09-18 10:30:35,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403106.6666666667, ans=0.1 2024-09-18 10:30:39,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff2.min_abs, batch_count=403153.3333333333, ans=0.1 2024-09-18 10:31:14,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=403246.6666666667, ans=0.125 2024-09-18 10:31:26,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=403246.6666666667, ans=0.125 2024-09-18 10:31:35,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.69 vs. limit=10.0 2024-09-18 10:31:47,549 INFO [train.py:1198] (1/2) Epoch 23, batch 1150, loss[loss=0.2175, simple_loss=0.2635, pruned_loss=0.06431, ctc_loss=0.1319, cr_loss=0.4125, over 34382.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2727, pruned_loss=0.06441, ctc_loss=0.1337, cr_loss=0.4091, over 6714000.51 frames. ], batch size: 91, lr: 5.24e-03, grad_scale: 32.0 2024-09-18 10:31:52,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=15.0 2024-09-18 10:31:59,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=403340.0, ans=0.125 2024-09-18 10:32:11,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=403386.6666666667, ans=0.125 2024-09-18 10:32:22,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.69 vs. limit=22.5 2024-09-18 10:32:42,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=403480.0, ans=0.0 2024-09-18 10:32:53,723 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.054e+02 2.536e+02 2.868e+02 3.519e+02 5.124e+02, threshold=5.736e+02, percent-clipped=0.0 2024-09-18 10:32:54,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403526.6666666667, ans=0.1 2024-09-18 10:33:00,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=403526.6666666667, ans=0.0 2024-09-18 10:33:10,284 INFO [train.py:1198] (1/2) Epoch 23, batch 1200, loss[loss=0.2298, simple_loss=0.2832, pruned_loss=0.06603, ctc_loss=0.1387, cr_loss=0.4168, over 34550.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.274, pruned_loss=0.06492, ctc_loss=0.1347, cr_loss=0.4103, over 6707244.18 frames. 
], batch size: 99, lr: 5.24e-03, grad_scale: 32.0 2024-09-18 10:33:12,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=403573.3333333333, ans=0.125 2024-09-18 10:33:18,853 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 10:33:37,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403620.0, ans=0.1 2024-09-18 10:33:45,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=403666.6666666667, ans=0.125 2024-09-18 10:33:48,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403666.6666666667, ans=0.1 2024-09-18 10:34:06,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=403713.3333333333, ans=0.125 2024-09-18 10:34:12,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=12.0 2024-09-18 10:34:36,433 INFO [train.py:1198] (1/2) Epoch 23, batch 1250, loss[loss=0.2451, simple_loss=0.293, pruned_loss=0.07417, ctc_loss=0.1523, cr_loss=0.4588, over 34336.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2743, pruned_loss=0.06498, ctc_loss=0.1347, cr_loss=0.4114, over 6741057.94 frames. ], batch size: 107, lr: 5.24e-03, grad_scale: 32.0 2024-09-18 10:34:38,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=403806.6666666667, ans=0.125 2024-09-18 10:34:41,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=403806.6666666667, ans=0.125 2024-09-18 10:34:50,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=403806.6666666667, ans=0.0 2024-09-18 10:35:13,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=403900.0, ans=0.125 2024-09-18 10:35:18,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=403900.0, ans=0.125 2024-09-18 10:35:33,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=403946.6666666667, ans=0.125 2024-09-18 10:35:34,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=403946.6666666667, ans=0.0 2024-09-18 10:35:41,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=403993.3333333333, ans=0.125 2024-09-18 10:35:42,760 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.896e+02 2.513e+02 2.931e+02 3.433e+02 7.104e+02, threshold=5.863e+02, percent-clipped=3.0 2024-09-18 10:35:46,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=403993.3333333333, ans=0.125 2024-09-18 10:35:49,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, 
batch_count=403993.3333333333, ans=0.125 2024-09-18 10:35:59,130 INFO [train.py:1198] (1/2) Epoch 23, batch 1300, loss[loss=0.2281, simple_loss=0.2844, pruned_loss=0.06408, ctc_loss=0.1363, cr_loss=0.4069, over 33157.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2736, pruned_loss=0.06469, ctc_loss=0.1342, cr_loss=0.41, over 6745718.94 frames. ], batch size: 130, lr: 5.24e-03, grad_scale: 32.0 2024-09-18 10:36:29,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=15.0 2024-09-18 10:36:46,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=404133.3333333333, ans=0.125 2024-09-18 10:36:54,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=404180.0, ans=0.07 2024-09-18 10:36:59,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=404180.0, ans=0.125 2024-09-18 10:37:07,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=404226.6666666667, ans=0.125 2024-09-18 10:37:22,194 INFO [train.py:1198] (1/2) Epoch 23, batch 1350, loss[loss=0.2312, simple_loss=0.2756, pruned_loss=0.07037, ctc_loss=0.1424, cr_loss=0.4404, over 34515.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2734, pruned_loss=0.06458, ctc_loss=0.134, cr_loss=0.4104, over 6765036.82 frames. ], batch size: 94, lr: 5.24e-03, grad_scale: 32.0 2024-09-18 10:37:29,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=404273.3333333333, ans=0.125 2024-09-18 10:37:32,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404273.3333333333, ans=0.1 2024-09-18 10:37:53,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=404320.0, ans=0.125 2024-09-18 10:38:11,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=404413.3333333333, ans=0.125 2024-09-18 10:38:16,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=404413.3333333333, ans=0.125 2024-09-18 10:38:20,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.08 vs. limit=10.0 2024-09-18 10:38:33,103 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.955e+02 2.529e+02 2.938e+02 3.948e+02 7.890e+02, threshold=5.877e+02, percent-clipped=3.0 2024-09-18 10:38:43,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=404460.0, ans=0.1 2024-09-18 10:38:48,018 INFO [train.py:1198] (1/2) Epoch 23, batch 1400, loss[loss=0.195, simple_loss=0.2411, pruned_loss=0.05553, ctc_loss=0.1162, cr_loss=0.367, over 34306.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.273, pruned_loss=0.0646, ctc_loss=0.134, cr_loss=0.4103, over 6777212.08 frames. 
], batch size: 80, lr: 5.24e-03, grad_scale: 16.0 2024-09-18 10:38:53,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=404506.6666666667, ans=0.025 2024-09-18 10:38:55,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=404506.6666666667, ans=0.125 2024-09-18 10:38:56,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=404506.6666666667, ans=0.125 2024-09-18 10:39:04,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=404553.3333333333, ans=0.125 2024-09-18 10:39:10,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2024-09-18 10:40:08,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=22.5 2024-09-18 10:40:10,551 INFO [train.py:1198] (1/2) Epoch 23, batch 1450, loss[loss=0.2298, simple_loss=0.2818, pruned_loss=0.06646, ctc_loss=0.1374, cr_loss=0.4383, over 34453.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2735, pruned_loss=0.06461, ctc_loss=0.134, cr_loss=0.4107, over 6774191.58 frames. ], batch size: 110, lr: 5.24e-03, grad_scale: 16.0 2024-09-18 10:40:40,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=404786.6666666667, ans=0.2 2024-09-18 10:40:42,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=404833.3333333333, ans=0.035 2024-09-18 10:40:44,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=404833.3333333333, ans=0.125 2024-09-18 10:41:04,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=404880.0, ans=10.0 2024-09-18 10:41:08,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=404880.0, ans=0.0 2024-09-18 10:41:13,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=404880.0, ans=0.125 2024-09-18 10:41:17,788 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.158e+02 2.624e+02 3.031e+02 4.062e+02 6.877e+02, threshold=6.062e+02, percent-clipped=4.0 2024-09-18 10:41:19,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=404926.6666666667, ans=0.125 2024-09-18 10:41:34,427 INFO [train.py:1198] (1/2) Epoch 23, batch 1500, loss[loss=0.2371, simple_loss=0.2879, pruned_loss=0.07018, ctc_loss=0.1463, cr_loss=0.4203, over 34455.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.274, pruned_loss=0.0648, ctc_loss=0.1344, cr_loss=0.411, over 6774268.93 frames. ], batch size: 100, lr: 5.23e-03, grad_scale: 16.0 2024-09-18 10:42:02,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=405020.0, ans=0.1 2024-09-18 10:42:47,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.81 vs. 
limit=22.5 2024-09-18 10:42:59,612 INFO [train.py:1198] (1/2) Epoch 23, batch 1550, loss[loss=0.2445, simple_loss=0.2927, pruned_loss=0.0745, ctc_loss=0.1468, cr_loss=0.4487, over 34404.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2739, pruned_loss=0.06495, ctc_loss=0.1347, cr_loss=0.4115, over 6746092.92 frames. ], batch size: 105, lr: 5.23e-03, grad_scale: 16.0 2024-09-18 10:43:03,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=405206.6666666667, ans=0.0 2024-09-18 10:43:09,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.12 vs. limit=12.0 2024-09-18 10:43:15,084 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.65 vs. limit=15.0 2024-09-18 10:43:26,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=405253.3333333333, ans=0.07 2024-09-18 10:44:07,008 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.579e+02 2.897e+02 3.710e+02 7.414e+02, threshold=5.793e+02, percent-clipped=4.0 2024-09-18 10:44:21,937 INFO [train.py:1198] (1/2) Epoch 23, batch 1600, loss[loss=0.23, simple_loss=0.2838, pruned_loss=0.06613, ctc_loss=0.1373, cr_loss=0.41, over 34569.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.274, pruned_loss=0.06502, ctc_loss=0.1348, cr_loss=0.411, over 6725849.17 frames. ], batch size: 99, lr: 5.23e-03, grad_scale: 32.0 2024-09-18 10:44:53,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=405533.3333333333, ans=0.125 2024-09-18 10:45:27,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.31 vs. limit=6.0 2024-09-18 10:45:31,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=405626.6666666667, ans=0.125 2024-09-18 10:45:46,040 INFO [train.py:1198] (1/2) Epoch 23, batch 1650, loss[loss=0.2301, simple_loss=0.2857, pruned_loss=0.065, ctc_loss=0.1353, cr_loss=0.435, over 34380.00 frames. ], tot_loss[loss=0.2237, simple_loss=0.274, pruned_loss=0.06498, ctc_loss=0.1348, cr_loss=0.4107, over 6718440.18 frames. 
], batch size: 103, lr: 5.23e-03, grad_scale: 32.0 2024-09-18 10:46:07,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=405720.0, ans=15.0 2024-09-18 10:46:26,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405766.6666666667, ans=0.1 2024-09-18 10:46:36,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=405813.3333333333, ans=0.035 2024-09-18 10:46:39,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=405813.3333333333, ans=0.125 2024-09-18 10:46:39,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=405813.3333333333, ans=0.2 2024-09-18 10:46:54,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=405860.0, ans=0.0 2024-09-18 10:46:56,021 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.486e+02 2.992e+02 3.688e+02 9.145e+02, threshold=5.984e+02, percent-clipped=7.0 2024-09-18 10:46:59,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=405860.0, ans=0.125 2024-09-18 10:47:10,650 INFO [train.py:1198] (1/2) Epoch 23, batch 1700, loss[loss=0.1942, simple_loss=0.2495, pruned_loss=0.05116, ctc_loss=0.1112, cr_loss=0.3587, over 34306.00 frames. ], tot_loss[loss=0.2233, simple_loss=0.2737, pruned_loss=0.06479, ctc_loss=0.1344, cr_loss=0.4103, over 6744618.54 frames. ], batch size: 80, lr: 5.23e-03, grad_scale: 32.0 2024-09-18 10:47:52,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=406000.0, ans=0.09899494936611666 2024-09-18 10:48:03,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=406046.6666666667, ans=0.0 2024-09-18 10:48:05,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=406046.6666666667, ans=0.95 2024-09-18 10:48:08,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=406046.6666666667, ans=0.125 2024-09-18 10:48:10,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.99 vs. limit=22.5 2024-09-18 10:48:11,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.32 vs. limit=10.0 2024-09-18 10:48:27,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=406093.3333333333, ans=0.0 2024-09-18 10:48:33,430 INFO [train.py:1198] (1/2) Epoch 23, batch 1750, loss[loss=0.1955, simple_loss=0.2479, pruned_loss=0.05334, ctc_loss=0.1117, cr_loss=0.3535, over 34196.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2731, pruned_loss=0.06448, ctc_loss=0.1338, cr_loss=0.4088, over 6754097.59 frames. 
], batch size: 78, lr: 5.23e-03, grad_scale: 32.0 2024-09-18 10:48:41,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=406140.0, ans=0.0 2024-09-18 10:48:42,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.30 vs. limit=15.0 2024-09-18 10:48:52,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=406186.6666666667, ans=0.125 2024-09-18 10:49:19,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.14 vs. limit=15.0 2024-09-18 10:49:38,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=406280.0, ans=0.125 2024-09-18 10:49:44,265 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.914e+02 2.434e+02 2.938e+02 3.546e+02 5.114e+02, threshold=5.875e+02, percent-clipped=0.0 2024-09-18 10:49:56,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=406326.6666666667, ans=0.125 2024-09-18 10:49:59,556 INFO [train.py:1198] (1/2) Epoch 23, batch 1800, loss[loss=0.2161, simple_loss=0.269, pruned_loss=0.06061, ctc_loss=0.1297, cr_loss=0.4034, over 34703.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2731, pruned_loss=0.06455, ctc_loss=0.134, cr_loss=0.4096, over 6757941.46 frames. ], batch size: 97, lr: 5.23e-03, grad_scale: 16.0 2024-09-18 10:50:09,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=406373.3333333333, ans=0.2 2024-09-18 10:50:14,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=406420.0, ans=0.015 2024-09-18 10:50:21,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406420.0, ans=0.1 2024-09-18 10:50:26,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=406420.0, ans=0.1 2024-09-18 10:50:42,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=406466.6666666667, ans=0.0 2024-09-18 10:50:53,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.96 vs. limit=12.0 2024-09-18 10:51:22,185 INFO [train.py:1198] (1/2) Epoch 23, batch 1850, loss[loss=0.2293, simple_loss=0.2851, pruned_loss=0.0647, ctc_loss=0.1371, cr_loss=0.4177, over 34461.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.273, pruned_loss=0.06451, ctc_loss=0.1339, cr_loss=0.4093, over 6765952.15 frames. ], batch size: 100, lr: 5.22e-03, grad_scale: 16.0 2024-09-18 10:51:47,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=406653.3333333333, ans=0.0 2024-09-18 10:52:04,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.39 vs. 
limit=15.0 2024-09-18 10:52:05,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=406700.0, ans=0.025 2024-09-18 10:52:31,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.079e+02 2.703e+02 3.319e+02 4.044e+02 6.577e+02, threshold=6.638e+02, percent-clipped=4.0 2024-09-18 10:52:46,294 INFO [train.py:1198] (1/2) Epoch 23, batch 1900, loss[loss=0.2324, simple_loss=0.2857, pruned_loss=0.06616, ctc_loss=0.1443, cr_loss=0.4485, over 34384.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2737, pruned_loss=0.06466, ctc_loss=0.1343, cr_loss=0.4101, over 6774352.03 frames. ], batch size: 103, lr: 5.22e-03, grad_scale: 16.0 2024-09-18 10:52:47,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.46 vs. limit=15.0 2024-09-18 10:52:48,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=406840.0, ans=0.125 2024-09-18 10:52:56,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=406840.0, ans=0.0 2024-09-18 10:53:09,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=406886.6666666667, ans=0.125 2024-09-18 10:53:38,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=406980.0, ans=0.0 2024-09-18 10:53:39,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=22.5 2024-09-18 10:53:50,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.07 vs. limit=22.5 2024-09-18 10:53:58,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.84 vs. limit=15.0 2024-09-18 10:54:11,298 INFO [train.py:1198] (1/2) Epoch 23, batch 1950, loss[loss=0.2219, simple_loss=0.2762, pruned_loss=0.06249, ctc_loss=0.1303, cr_loss=0.4143, over 34354.00 frames. ], tot_loss[loss=0.224, simple_loss=0.2749, pruned_loss=0.06484, ctc_loss=0.1347, cr_loss=0.4118, over 6790857.27 frames. ], batch size: 91, lr: 5.22e-03, grad_scale: 16.0 2024-09-18 10:54:25,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.83 vs. limit=10.0 2024-09-18 10:55:02,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407213.3333333333, ans=0.1 2024-09-18 10:55:10,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. 
limit=6.0
2024-09-18 10:55:17,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=407260.0, ans=0.125
2024-09-18 10:55:20,600 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.429e+02 2.804e+02 3.518e+02 5.036e+02, threshold=5.608e+02, percent-clipped=0.0
2024-09-18 10:55:33,876 INFO [train.py:1198] (1/2) Epoch 23, batch 2000, loss[loss=0.1953, simple_loss=0.2448, pruned_loss=0.05369, ctc_loss=0.1165, cr_loss=0.3795, over 34180.00 frames. ], tot_loss[loss=0.2244, simple_loss=0.2753, pruned_loss=0.06501, ctc_loss=0.1352, cr_loss=0.413, over 6764740.61 frames. ], batch size: 78, lr: 5.22e-03, grad_scale: 32.0
2024-09-18 10:55:42,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=407306.6666666667, ans=0.125
2024-09-18 10:55:42,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=407306.6666666667, ans=0.125
2024-09-18 10:55:56,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=407353.3333333333, ans=0.2
2024-09-18 10:56:03,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.81 vs. limit=15.0
2024-09-18 10:56:09,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=407400.0, ans=0.0
2024-09-18 10:56:22,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=407400.0, ans=0.0
2024-09-18 10:56:24,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.95 vs. limit=15.0
2024-09-18 10:56:25,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=407446.6666666667, ans=0.0
2024-09-18 10:56:35,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=407446.6666666667, ans=0.125
2024-09-18 10:56:58,280 INFO [train.py:1198] (1/2) Epoch 23, batch 2050, loss[loss=0.209, simple_loss=0.2566, pruned_loss=0.06062, ctc_loss=0.1255, cr_loss=0.3765, over 34493.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.274, pruned_loss=0.06452, ctc_loss=0.1343, cr_loss=0.4109, over 6756050.05 frames. ], batch size: 82, lr: 5.22e-03, grad_scale: 32.0
2024-09-18 10:57:03,731 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 10:57:10,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=407540.0, ans=0.125
2024-09-18 10:57:40,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=407633.3333333333, ans=0.0
2024-09-18 10:57:48,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=407680.0, ans=0.125
2024-09-18 10:58:00,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=407680.0, ans=0.125
2024-09-18 10:58:06,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407726.6666666667, ans=0.1
2024-09-18 10:58:09,653 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.503e+02 3.183e+02 3.889e+02 5.835e+02, threshold=6.365e+02, percent-clipped=2.0
2024-09-18 10:58:09,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407726.6666666667, ans=0.1
2024-09-18 10:58:10,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=407726.6666666667, ans=0.1
2024-09-18 10:58:22,977 INFO [train.py:1198] (1/2) Epoch 23, batch 2100, loss[loss=0.2131, simple_loss=0.2653, pruned_loss=0.05982, ctc_loss=0.1261, cr_loss=0.3995, over 34507.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2735, pruned_loss=0.06443, ctc_loss=0.1341, cr_loss=0.4105, over 6770994.72 frames. ], batch size: 94, lr: 5.22e-03, grad_scale: 32.0
2024-09-18 10:58:28,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=407773.3333333333, ans=0.125
2024-09-18 10:58:34,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=407773.3333333333, ans=0.125
2024-09-18 10:58:51,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=407820.0, ans=0.125
2024-09-18 10:59:24,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.79 vs. limit=6.0
2024-09-18 10:59:27,602 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-18 10:59:35,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=407960.0, ans=0.0
2024-09-18 10:59:45,379 INFO [train.py:1198] (1/2) Epoch 23, batch 2150, loss[loss=0.2289, simple_loss=0.2726, pruned_loss=0.06983, ctc_loss=0.1401, cr_loss=0.4351, over 34717.00 frames. ], tot_loss[loss=0.222, simple_loss=0.2728, pruned_loss=0.06406, ctc_loss=0.1333, cr_loss=0.409, over 6788503.10 frames. ], batch size: 92, lr: 5.21e-03, grad_scale: 32.0
2024-09-18 11:00:12,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=408053.3333333333, ans=0.07
2024-09-18 11:00:14,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.28 vs. limit=15.0
2024-09-18 11:00:31,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=408100.0, ans=0.1
2024-09-18 11:00:33,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=408100.0, ans=0.0
2024-09-18 11:00:37,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=22.5
2024-09-18 11:00:50,133 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:00:50,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=408146.6666666667, ans=0.125
2024-09-18 11:00:57,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.73 vs. limit=22.5
2024-09-18 11:00:58,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.519e+02 2.866e+02 4.106e+02 7.387e+02, threshold=5.733e+02, percent-clipped=2.0
2024-09-18 11:00:58,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=408193.3333333333, ans=0.125
2024-09-18 11:01:01,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408193.3333333333, ans=0.1
2024-09-18 11:01:09,572 INFO [train.py:1198] (1/2) Epoch 23, batch 2200, loss[loss=0.2298, simple_loss=0.2839, pruned_loss=0.06558, ctc_loss=0.1391, cr_loss=0.4205, over 34435.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.273, pruned_loss=0.06421, ctc_loss=0.1335, cr_loss=0.4096, over 6783483.68 frames. ], batch size: 100, lr: 5.21e-03, grad_scale: 16.0
2024-09-18 11:01:40,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=408286.6666666667, ans=0.2
2024-09-18 11:01:56,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=408333.3333333333, ans=0.0
2024-09-18 11:02:14,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=408380.0, ans=0.125
2024-09-18 11:02:19,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=408426.6666666667, ans=0.1
2024-09-18 11:02:24,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=408426.6666666667, ans=0.125
2024-09-18 11:02:29,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=408426.6666666667, ans=0.0
2024-09-18 11:02:34,108 INFO [train.py:1198] (1/2) Epoch 23, batch 2250, loss[loss=0.2266, simple_loss=0.2812, pruned_loss=0.06426, ctc_loss=0.1349, cr_loss=0.413, over 34399.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2725, pruned_loss=0.06387, ctc_loss=0.133, cr_loss=0.4081, over 6782298.93 frames. ], batch size: 95, lr: 5.21e-03, grad_scale: 16.0
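
The ScheduledFloat entries above record module hyperparameters (dropout rates, skip rates, balancer probabilities and so on) whose current value, logged as ans=..., is looked up from the global batch_count. The real implementation lives in scaling.py; purely as an illustrative sketch, with invented names and breakpoints, such a batch-count-indexed schedule can be a piecewise-linear interpolation:

    # Hypothetical sketch only; the class name and breakpoints are made up
    # for illustration, not taken from scaling.py.
    class PiecewiseLinearFloat:
        def __init__(self, *points):
            # points: (batch_count, value) pairs.
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            b0, v0 = self.points[0]
            if batch_count <= b0:
                return v0
            for b1, v1 in self.points[1:]:
                if batch_count <= b1:
                    # Linear interpolation between the bracketing breakpoints.
                    t = (batch_count - b0) / (b1 - b0)
                    return v0 + t * (v1 - v0)
                b0, v0 = b1, v1
            return v0  # constant after the last breakpoint

    # A dropout_p decaying from 0.3 to 0.1 over the first 20k batches would
    # long since have reached its final value at batch_count ~407k:
    dropout_p = PiecewiseLinearFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(407306.6666666667))  # -> 0.1
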
2024-09-18 11:02:36,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=408473.3333333333, ans=0.125
2024-09-18 11:02:46,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=408473.3333333333, ans=0.125
2024-09-18 11:02:54,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=408520.0, ans=0.07
2024-09-18 11:03:02,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408520.0, ans=0.1
2024-09-18 11:03:15,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=408566.6666666667, ans=0.0
2024-09-18 11:03:22,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=408613.3333333333, ans=0.5
2024-09-18 11:03:23,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=408613.3333333333, ans=0.2
2024-09-18 11:03:37,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408613.3333333333, ans=0.1
2024-09-18 11:03:40,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=408660.0, ans=0.125
2024-09-18 11:03:40,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=408660.0, ans=0.125
2024-09-18 11:03:47,123 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.518e+02 3.178e+02 4.170e+02 1.638e+03, threshold=6.355e+02, percent-clipped=7.0
2024-09-18 11:03:58,743 INFO [train.py:1198] (1/2) Epoch 23, batch 2300, loss[loss=0.2024, simple_loss=0.2501, pruned_loss=0.05851, ctc_loss=0.1153, cr_loss=0.363, over 34286.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2717, pruned_loss=0.06376, ctc_loss=0.1326, cr_loss=0.4071, over 6766382.70 frames. ], batch size: 83, lr: 5.21e-03, grad_scale: 16.0
2024-09-18 11:04:01,386 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.87 vs. limit=15.0
2024-09-18 11:04:14,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=408753.3333333333, ans=0.125
2024-09-18 11:04:16,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.48 vs. limit=12.0
2024-09-18 11:04:30,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.54 vs. limit=15.0
2024-09-18 11:04:40,104 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=22.5
2024-09-18 11:05:04,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=408893.3333333333, ans=0.125
2024-09-18 11:05:09,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=408893.3333333333, ans=0.02
2024-09-18 11:05:10,851 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:05:16,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408893.3333333333, ans=0.1
2024-09-18 11:05:24,298 INFO [train.py:1198] (1/2) Epoch 23, batch 2350, loss[loss=0.2371, simple_loss=0.29, pruned_loss=0.06914, ctc_loss=0.1427, cr_loss=0.4333, over 34710.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.272, pruned_loss=0.06387, ctc_loss=0.1328, cr_loss=0.4079, over 6773178.17 frames. ], batch size: 97, lr: 5.21e-03, grad_scale: 16.0
2024-09-18 11:06:34,743 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.392e+02 2.959e+02 3.587e+02 5.819e+02, threshold=5.917e+02, percent-clipped=0.0
2024-09-18 11:06:39,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=409126.6666666667, ans=0.125
2024-09-18 11:06:43,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=409126.6666666667, ans=0.125
2024-09-18 11:06:46,336 INFO [train.py:1198] (1/2) Epoch 23, batch 2400, loss[loss=0.2152, simple_loss=0.2641, pruned_loss=0.06214, ctc_loss=0.1274, cr_loss=0.4127, over 34578.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2726, pruned_loss=0.06431, ctc_loss=0.1336, cr_loss=0.4095, over 6777361.31 frames. ], batch size: 89, lr: 5.21e-03, grad_scale: 32.0
2024-09-18 11:07:06,458 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.554e-03
2024-09-18 11:07:14,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=409220.0, ans=0.125
2024-09-18 11:07:56,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.83 vs. limit=22.5
2024-09-18 11:08:10,573 INFO [train.py:1198] (1/2) Epoch 23, batch 2450, loss[loss=0.2213, simple_loss=0.2776, pruned_loss=0.06161, ctc_loss=0.132, cr_loss=0.3867, over 34400.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2737, pruned_loss=0.06471, ctc_loss=0.1344, cr_loss=0.411, over 6751322.22 frames. ], batch size: 95, lr: 5.21e-03, grad_scale: 32.0
2024-09-18 11:08:17,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=409406.6666666667, ans=0.125
2024-09-18 11:08:32,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=409453.3333333333, ans=0.125
2024-09-18 11:08:33,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=409453.3333333333, ans=0.125
2024-09-18 11:08:59,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.34 vs. limit=22.5
2024-09-18 11:09:02,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=409546.6666666667, ans=0.125
2024-09-18 11:09:15,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=409546.6666666667, ans=0.025
2024-09-18 11:09:23,547 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.524e+02 2.901e+02 3.484e+02 6.341e+02, threshold=5.803e+02, percent-clipped=3.0
2024-09-18 11:09:35,181 INFO [train.py:1198] (1/2) Epoch 23, batch 2500, loss[loss=0.2306, simple_loss=0.2791, pruned_loss=0.06813, ctc_loss=0.1401, cr_loss=0.4451, over 34442.00 frames. ], tot_loss[loss=0.223, simple_loss=0.2735, pruned_loss=0.0646, ctc_loss=0.1342, cr_loss=0.4104, over 6762957.23 frames. ], batch size: 100, lr: 5.20e-03, grad_scale: 32.0
2024-09-18 11:09:37,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=409640.0, ans=0.125
2024-09-18 11:09:48,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=409640.0, ans=0.2
2024-09-18 11:10:03,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=409686.6666666667, ans=0.0
2024-09-18 11:10:17,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.45 vs. limit=10.0
2024-09-18 11:10:23,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=409780.0, ans=0.025
2024-09-18 11:10:58,291 INFO [train.py:1198] (1/2) Epoch 23, batch 2550, loss[loss=0.1863, simple_loss=0.2373, pruned_loss=0.04995, ctc_loss=0.1072, cr_loss=0.3479, over 34196.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2733, pruned_loss=0.06463, ctc_loss=0.1342, cr_loss=0.4106, over 6765425.05 frames. ], batch size: 78, lr: 5.20e-03, grad_scale: 32.0
2024-09-18 11:11:00,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=409873.3333333333, ans=0.5
2024-09-18 11:11:24,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=409920.0, ans=0.125
2024-09-18 11:11:28,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=409920.0, ans=0.025
2024-09-18 11:11:29,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff2.min_abs, batch_count=409920.0, ans=0.1
2024-09-18 11:11:43,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=409966.6666666667, ans=0.125
2024-09-18 11:11:44,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=409966.6666666667, ans=0.2
2024-09-18 11:12:06,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=410060.0, ans=0.125
2024-09-18 11:12:08,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=15.0
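
Each optim.py warning above summarizes the recent distribution of gradient norms (the five numbers run from minimum to maximum through the quartiles), the clipping threshold in force, and the percentage of recently clipped batches. The exact rule is defined in optim.py; the following is only a hedged sketch of producing such a report, under the assumption that the threshold is Clipping_scale times the median norm:

    import torch

    def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # recent_norms: 1-D float tensor of gradient norms from recent steps.
        qs = torch.quantile(recent_norms,
                            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = (clipping_scale * qs[2]).item()  # assumed: scale x median
        pct = 100.0 * (recent_norms > threshold).float().mean().item()
        print(f"Clipping_scale={clipping_scale}, grad-norm quartiles "
              + " ".join(f"{q:.3e}" for q in qs.tolist())
              + f", threshold={threshold:.3e}, percent-clipped={pct:.1f}")

    clipping_report(torch.rand(200) * 400.0 + 100.0)
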
2024-09-18 11:12:10,797 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.477e+02 2.918e+02 3.711e+02 7.995e+02, threshold=5.835e+02, percent-clipped=7.0
2024-09-18 11:12:14,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=410060.0, ans=0.125
2024-09-18 11:12:19,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.00 vs. limit=22.5
2024-09-18 11:12:22,222 INFO [train.py:1198] (1/2) Epoch 23, batch 2600, loss[loss=0.2223, simple_loss=0.2727, pruned_loss=0.06411, ctc_loss=0.1357, cr_loss=0.4138, over 34371.00 frames. ], tot_loss[loss=0.2234, simple_loss=0.2737, pruned_loss=0.06482, ctc_loss=0.1347, cr_loss=0.4112, over 6761935.96 frames. ], batch size: 91, lr: 5.20e-03, grad_scale: 32.0
2024-09-18 11:12:29,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.81 vs. limit=22.5
2024-09-18 11:12:35,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=410106.6666666667, ans=0.015
2024-09-18 11:12:39,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.37 vs. limit=15.0
2024-09-18 11:12:41,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.71 vs. limit=12.0
2024-09-18 11:12:50,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=410153.3333333333, ans=0.125
2024-09-18 11:13:02,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=410200.0, ans=0.0
2024-09-18 11:13:08,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=410200.0, ans=0.125
2024-09-18 11:13:22,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=410246.6666666667, ans=0.125
2024-09-18 11:13:33,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=410293.3333333333, ans=0.0
2024-09-18 11:13:46,373 INFO [train.py:1198] (1/2) Epoch 23, batch 2650, loss[loss=0.2501, simple_loss=0.2987, pruned_loss=0.07589, ctc_loss=0.1555, cr_loss=0.4675, over 34248.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2741, pruned_loss=0.06482, ctc_loss=0.1348, cr_loss=0.4112, over 6769396.06 frames. ], batch size: 117, lr: 5.20e-03, grad_scale: 32.0
2024-09-18 11:13:55,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=410340.0, ans=0.0
2024-09-18 11:14:11,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=410386.6666666667, ans=0.125
2024-09-18 11:14:20,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0
2024-09-18 11:14:55,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.44 vs. limit=15.0
2024-09-18 11:14:59,461 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.447e+02 2.948e+02 3.879e+02 6.824e+02, threshold=5.896e+02, percent-clipped=3.0
2024-09-18 11:15:10,921 INFO [train.py:1198] (1/2) Epoch 23, batch 2700, loss[loss=0.2253, simple_loss=0.2803, pruned_loss=0.06389, ctc_loss=0.1312, cr_loss=0.4054, over 34628.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2742, pruned_loss=0.06473, ctc_loss=0.1345, cr_loss=0.4105, over 6764250.87 frames. ], batch size: 102, lr: 5.20e-03, grad_scale: 32.0
2024-09-18 11:15:33,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=410620.0, ans=0.2
2024-09-18 11:15:41,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=410620.0, ans=0.125
2024-09-18 11:16:07,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=410713.3333333333, ans=0.2
2024-09-18 11:16:09,051 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:16:19,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.08 vs. limit=22.5
2024-09-18 11:16:31,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=410760.0, ans=10.0
2024-09-18 11:16:32,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=410760.0, ans=0.0
2024-09-18 11:16:35,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=410760.0, ans=0.2
2024-09-18 11:16:42,106 INFO [train.py:1198] (1/2) Epoch 23, batch 2750, loss[loss=0.2131, simple_loss=0.2593, pruned_loss=0.06263, ctc_loss=0.1288, cr_loss=0.3973, over 34646.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2728, pruned_loss=0.06424, ctc_loss=0.1336, cr_loss=0.4081, over 6761501.98 frames. ], batch size: 88, lr: 5.20e-03, grad_scale: 32.0
2024-09-18 11:16:56,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.17 vs. limit=15.0
2024-09-18 11:16:57,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=410853.3333333333, ans=0.09899494936611666
2024-09-18 11:17:00,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=410853.3333333333, ans=0.0
2024-09-18 11:17:13,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=410900.0, ans=0.125
2024-09-18 11:17:14,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.63 vs. limit=15.0
2024-09-18 11:17:26,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=410900.0, ans=0.025
2024-09-18 11:17:52,948 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.471e+02 2.930e+02 3.577e+02 6.576e+02, threshold=5.860e+02, percent-clipped=1.0
2024-09-18 11:18:04,559 INFO [train.py:1198] (1/2) Epoch 23, batch 2800, loss[loss=0.265, simple_loss=0.3062, pruned_loss=0.08541, ctc_loss=0.1719, cr_loss=0.4656, over 23999.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2734, pruned_loss=0.06477, ctc_loss=0.1344, cr_loss=0.4097, over 6741292.17 frames. ], batch size: 244, lr: 5.20e-03, grad_scale: 32.0
2024-09-18 11:18:08,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.81 vs. limit=22.5
2024-09-18 11:18:41,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=411133.3333333333, ans=0.04949747468305833
2024-09-18 11:18:58,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.87 vs. limit=15.0
2024-09-18 11:19:16,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.54 vs. limit=10.0
2024-09-18 11:19:22,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=411226.6666666667, ans=0.125
2024-09-18 11:19:28,707 INFO [train.py:1198] (1/2) Epoch 23, batch 2850, loss[loss=0.2158, simple_loss=0.2618, pruned_loss=0.06372, ctc_loss=0.1324, cr_loss=0.3979, over 34464.00 frames. ], tot_loss[loss=0.2238, simple_loss=0.274, pruned_loss=0.06505, ctc_loss=0.135, cr_loss=0.4112, over 6725786.88 frames. ], batch size: 90, lr: 5.19e-03, grad_scale: 16.0
2024-09-18 11:19:28,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=411273.3333333333, ans=0.125
2024-09-18 11:19:46,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.34 vs. limit=15.0
2024-09-18 11:20:07,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=411366.6666666667, ans=0.1
2024-09-18 11:20:07,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=411366.6666666667, ans=0.0
2024-09-18 11:20:10,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=411366.6666666667, ans=0.0
2024-09-18 11:20:11,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=411366.6666666667, ans=0.1
2024-09-18 11:20:14,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.37 vs. limit=12.0
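
The Whitening entries compare a statistic of a module's output covariance (metric=...) against a limit; the backward pass is only adjusted when the metric exceeds the limit. The precise statistic is defined in scaling.py; what follows is only one plausible covariance-spread measure with the same flavour, where 1.0 would mean perfectly white activations:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations. Illustrative measure:
        # largest eigenvalue of the channel covariance over the mean eigenvalue.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)  # ascending; real for symmetric cov
        return (eigs[-1] / eigs.mean()).item()

    x = torch.randn(1000, 512)
    print(f"metric={whitening_metric(x):.2f} vs. limit=15.0")
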
2024-09-18 11:20:43,215 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.445e+02 2.971e+02 3.673e+02 8.059e+02, threshold=5.941e+02, percent-clipped=3.0
2024-09-18 11:20:45,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=411460.0, ans=0.0
2024-09-18 11:20:45,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=411460.0, ans=0.125
2024-09-18 11:20:46,785 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:20:53,084 INFO [train.py:1198] (1/2) Epoch 23, batch 2900, loss[loss=0.2219, simple_loss=0.2732, pruned_loss=0.06366, ctc_loss=0.1326, cr_loss=0.4192, over 34528.00 frames. ], tot_loss[loss=0.225, simple_loss=0.2754, pruned_loss=0.06551, ctc_loss=0.1359, cr_loss=0.4131, over 6756467.80 frames. ], batch size: 94, lr: 5.19e-03, grad_scale: 16.0
2024-09-18 11:20:56,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff3.min_abs, batch_count=411506.6666666667, ans=0.2
2024-09-18 11:21:03,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=411506.6666666667, ans=0.0
2024-09-18 11:21:11,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0
2024-09-18 11:21:26,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=411600.0, ans=0.125
2024-09-18 11:21:30,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=411600.0, ans=0.125
2024-09-18 11:21:37,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.08 vs. limit=12.0
2024-09-18 11:21:48,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=411646.6666666667, ans=0.125
2024-09-18 11:21:58,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=12.0
2024-09-18 11:22:03,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411693.3333333333, ans=0.1
2024-09-18 11:22:15,953 INFO [train.py:1198] (1/2) Epoch 23, batch 2950, loss[loss=0.2171, simple_loss=0.2635, pruned_loss=0.06387, ctc_loss=0.13, cr_loss=0.4233, over 34618.00 frames. ], tot_loss[loss=0.2236, simple_loss=0.2739, pruned_loss=0.06492, ctc_loss=0.1347, cr_loss=0.4105, over 6751145.13 frames. ], batch size: 88, lr: 5.19e-03, grad_scale: 8.0
2024-09-18 11:22:34,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=411786.6666666667, ans=0.1
2024-09-18 11:22:39,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=411786.6666666667, ans=0.125
2024-09-18 11:22:46,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=411786.6666666667, ans=0.125
2024-09-18 11:23:21,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=411880.0, ans=0.125
2024-09-18 11:23:24,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=411926.6666666667, ans=0.035
2024-09-18 11:23:24,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=411926.6666666667, ans=0.125
2024-09-18 11:23:24,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=411926.6666666667, ans=0.125
2024-09-18 11:23:31,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=411926.6666666667, ans=0.0
2024-09-18 11:23:32,306 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.055e+02 2.656e+02 3.310e+02 4.609e+02 8.730e+02, threshold=6.621e+02, percent-clipped=12.0
2024-09-18 11:23:40,631 INFO [train.py:1198] (1/2) Epoch 23, batch 3000, loss[loss=0.2109, simple_loss=0.2664, pruned_loss=0.0575, ctc_loss=0.1216, cr_loss=0.4008, over 34550.00 frames. ], tot_loss[loss=0.2229, simple_loss=0.2736, pruned_loss=0.06459, ctc_loss=0.1339, cr_loss=0.4093, over 6752402.33 frames. ], batch size: 94, lr: 5.19e-03, grad_scale: 8.0
2024-09-18 11:23:40,632 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-18 11:23:57,575 INFO [train.py:1230] (1/2) Epoch 23, validation: loss=0.1486, simple_loss=0.2449, pruned_loss=0.02208, ctc_loss=0.04086, cr_loss=1.928e-14, over 944034.00 frames.
2024-09-18 11:23:57,575 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-18 11:24:05,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0
2024-09-18 11:24:06,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411973.3333333333, ans=0.1
2024-09-18 11:24:10,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=15.0
2024-09-18 11:24:24,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=412020.0, ans=0.125
2024-09-18 11:24:44,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=412066.6666666667, ans=0.0
2024-09-18 11:24:48,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.99 vs. limit=15.0
2024-09-18 11:24:50,785 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:24:54,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=412113.3333333333, ans=0.125
2024-09-18 11:25:02,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=412113.3333333333, ans=0.2
2024-09-18 11:25:21,844 INFO [train.py:1198] (1/2) Epoch 23, batch 3050, loss[loss=0.2091, simple_loss=0.2591, pruned_loss=0.05932, ctc_loss=0.1241, cr_loss=0.3911, over 34597.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2742, pruned_loss=0.06479, ctc_loss=0.1343, cr_loss=0.4101, over 6743772.90 frames. ], batch size: 89, lr: 5.19e-03, grad_scale: 8.0
2024-09-18 11:25:41,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.22 vs. limit=6.0
2024-09-18 11:26:09,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=412346.6666666667, ans=0.2
2024-09-18 11:26:12,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=412346.6666666667, ans=0.0
2024-09-18 11:26:23,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=412346.6666666667, ans=0.125
2024-09-18 11:26:30,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=412393.3333333333, ans=0.2
2024-09-18 11:26:34,696 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.440e+02 2.801e+02 3.420e+02 1.177e+03, threshold=5.601e+02, percent-clipped=1.0
2024-09-18 11:26:35,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=412393.3333333333, ans=0.0
2024-09-18 11:26:41,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=412440.0, ans=0.125
2024-09-18 11:26:42,780 INFO [train.py:1198] (1/2) Epoch 23, batch 3100, loss[loss=0.2354, simple_loss=0.2836, pruned_loss=0.07042, ctc_loss=0.1466, cr_loss=0.4258, over 34219.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2737, pruned_loss=0.06467, ctc_loss=0.1342, cr_loss=0.4097, over 6743692.33 frames. ], batch size: 117, lr: 5.19e-03, grad_scale: 8.0
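
In the train.py lines, loss[...] is measured on the single batch being logged while tot_loss[...] is a running frame-weighted average, which is why it is reported "over" several million frames. A minimal sketch of that bookkeeping, with illustrative names rather than the actual tracker class used by the script:

    from collections import defaultdict

    class RunningLoss:
        def __init__(self):
            self.sums = defaultdict(float)  # frame-weighted sums per component
            self.frames = 0.0

        def update(self, batch_losses: dict, num_frames: float):
            for name, value in batch_losses.items():
                self.sums[name] += value * num_frames
            self.frames += num_frames

        def averages(self) -> dict:
            return {k: v / self.frames for k, v in self.sums.items()}

    tracker = RunningLoss()
    tracker.update({"loss": 0.2091, "ctc_loss": 0.1241}, num_frames=34597.0)
    print(tracker.averages(), f"over {tracker.frames:.2f} frames")
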
2024-09-18 11:26:52,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=412440.0, ans=0.1
2024-09-18 11:26:59,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=412486.6666666667, ans=0.09899494936611666
2024-09-18 11:26:59,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=412486.6666666667, ans=0.125
2024-09-18 11:27:08,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=412486.6666666667, ans=0.025
2024-09-18 11:27:24,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=412533.3333333333, ans=0.125
2024-09-18 11:27:27,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=412533.3333333333, ans=0.95
2024-09-18 11:27:32,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff3.min_abs, batch_count=412580.0, ans=0.2
2024-09-18 11:27:40,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=412580.0, ans=0.125
2024-09-18 11:28:06,053 INFO [train.py:1198] (1/2) Epoch 23, batch 3150, loss[loss=0.238, simple_loss=0.2898, pruned_loss=0.07005, ctc_loss=0.145, cr_loss=0.4261, over 33768.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2737, pruned_loss=0.06464, ctc_loss=0.1342, cr_loss=0.4098, over 6748577.77 frames. ], batch size: 122, lr: 5.19e-03, grad_scale: 8.0
2024-09-18 11:28:20,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=412720.0, ans=0.125
2024-09-18 11:28:21,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=412720.0, ans=0.1
2024-09-18 11:28:22,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=412720.0, ans=0.125
2024-09-18 11:28:33,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=412720.0, ans=0.2
2024-09-18 11:28:58,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.08 vs. limit=15.0
2024-09-18 11:29:09,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=412860.0, ans=0.0
2024-09-18 11:29:15,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=412860.0, ans=0.05
2024-09-18 11:29:18,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.570e+02 3.084e+02 3.737e+02 6.530e+02, threshold=6.168e+02, percent-clipped=2.0
2024-09-18 11:29:26,762 INFO [train.py:1198] (1/2) Epoch 23, batch 3200, loss[loss=0.2278, simple_loss=0.279, pruned_loss=0.06628, ctc_loss=0.1377, cr_loss=0.4154, over 34545.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.2729, pruned_loss=0.06441, ctc_loss=0.1337, cr_loss=0.4086, over 6759914.49 frames. ], batch size: 94, lr: 5.18e-03, grad_scale: 16.0
2024-09-18 11:29:42,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs. limit=6.0
2024-09-18 11:30:02,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413000.0, ans=0.1
2024-09-18 11:30:23,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=413046.6666666667, ans=0.0
2024-09-18 11:30:26,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=413046.6666666667, ans=0.2
2024-09-18 11:30:49,132 INFO [train.py:1198] (1/2) Epoch 23, batch 3250, loss[loss=0.2353, simple_loss=0.2863, pruned_loss=0.06914, ctc_loss=0.1444, cr_loss=0.4278, over 34647.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2734, pruned_loss=0.06441, ctc_loss=0.1338, cr_loss=0.4089, over 6769695.63 frames. ], batch size: 98, lr: 5.18e-03, grad_scale: 16.0
2024-09-18 11:30:52,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0
2024-09-18 11:31:47,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0
2024-09-18 11:31:53,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=413326.6666666667, ans=0.1
2024-09-18 11:32:01,163 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.111e+02 2.451e+02 2.851e+02 3.580e+02 5.163e+02, threshold=5.702e+02, percent-clipped=0.0
2024-09-18 11:32:07,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=413373.3333333333, ans=0.2
2024-09-18 11:32:09,344 INFO [train.py:1198] (1/2) Epoch 23, batch 3300, loss[loss=0.2187, simple_loss=0.2777, pruned_loss=0.05885, ctc_loss=0.1297, cr_loss=0.4001, over 33031.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.272, pruned_loss=0.06382, ctc_loss=0.1327, cr_loss=0.4068, over 6769066.28 frames. ], batch size: 130, lr: 5.18e-03, grad_scale: 16.0
2024-09-18 11:32:16,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=413373.3333333333, ans=0.0
2024-09-18 11:32:21,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=413373.3333333333, ans=0.0
2024-09-18 11:32:29,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=413420.0, ans=0.2
2024-09-18 11:33:20,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=413560.0, ans=0.125
2024-09-18 11:33:31,719 INFO [train.py:1198] (1/2) Epoch 23, batch 3350, loss[loss=0.2271, simple_loss=0.283, pruned_loss=0.06477, ctc_loss=0.1296, cr_loss=0.3915, over 33913.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.2725, pruned_loss=0.06402, ctc_loss=0.133, cr_loss=0.408, over 6742772.87 frames. ], batch size: 122, lr: 5.18e-03, grad_scale: 16.0
2024-09-18 11:34:06,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=413700.0, ans=0.025
2024-09-18 11:34:06,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=413700.0, ans=0.0
2024-09-18 11:34:18,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=413700.0, ans=0.125
2024-09-18 11:34:45,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 2.426e+02 2.836e+02 3.512e+02 7.575e+02, threshold=5.673e+02, percent-clipped=2.0
2024-09-18 11:34:46,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=413793.3333333333, ans=0.125
2024-09-18 11:34:49,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.66 vs. limit=10.0
2024-09-18 11:34:52,989 INFO [train.py:1198] (1/2) Epoch 23, batch 3400, loss[loss=0.1912, simple_loss=0.2387, pruned_loss=0.05393, ctc_loss=0.1099, cr_loss=0.3468, over 34161.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2726, pruned_loss=0.06429, ctc_loss=0.1335, cr_loss=0.4084, over 6732925.24 frames. ], batch size: 78, lr: 5.18e-03, grad_scale: 16.0
2024-09-18 11:34:54,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=413840.0, ans=0.0
2024-09-18 11:34:55,111 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:34:58,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=15.0
2024-09-18 11:35:12,915 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:35:30,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=413933.3333333333, ans=0.025
2024-09-18 11:35:33,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=413933.3333333333, ans=0.07
2024-09-18 11:35:33,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=413933.3333333333, ans=0.5
2024-09-18 11:35:39,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=413933.3333333333, ans=0.125
2024-09-18 11:35:56,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=413980.0, ans=0.0
2024-09-18 11:36:15,230 INFO [train.py:1198] (1/2) Epoch 23, batch 3450, loss[loss=0.233, simple_loss=0.2853, pruned_loss=0.06792, ctc_loss=0.1395, cr_loss=0.4256, over 33256.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.273, pruned_loss=0.06435, ctc_loss=0.1337, cr_loss=0.4091, over 6743918.91 frames. ], batch size: 130, lr: 5.18e-03, grad_scale: 16.0
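
The grad_scale field in the train.py lines is the dynamic loss scale of mixed-precision training: it is halved when a step produces non-finite gradients and periodically grows back once steps are stable, which is why it drifts between 8.0, 16.0 and 32.0 across the batches above. This mirrors the standard torch.cuda.amp pattern (a sketch; requires a CUDA device):

    import torch

    model = torch.nn.Linear(80, 500).cuda()
    optimizer = torch.optim.AdamW(model.parameters())
    scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling

    x = torch.randn(8, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).square().mean()

    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skipped if the gradients overflowed
    scaler.update()                # halve on overflow, otherwise slowly grow
    print(scaler.get_scale())      # the value that shows up as grad_scale
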
2024-09-18 11:36:17,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=414073.3333333333, ans=0.125
2024-09-18 11:36:17,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.01 vs. limit=15.0
2024-09-18 11:36:41,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=414120.0, ans=0.125
2024-09-18 11:36:46,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=414166.6666666667, ans=0.125
2024-09-18 11:36:59,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=12.0
2024-09-18 11:37:16,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=414213.3333333333, ans=0.2
2024-09-18 11:37:28,770 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.401e+02 2.704e+02 3.763e+02 5.684e+02, threshold=5.408e+02, percent-clipped=2.0
2024-09-18 11:37:36,812 INFO [train.py:1198] (1/2) Epoch 23, batch 3500, loss[loss=0.1949, simple_loss=0.2511, pruned_loss=0.05144, ctc_loss=0.1108, cr_loss=0.3396, over 34464.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.2723, pruned_loss=0.06419, ctc_loss=0.1334, cr_loss=0.4085, over 6747017.92 frames. ], batch size: 85, lr: 5.18e-03, grad_scale: 16.0
2024-09-18 11:37:37,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=414306.6666666667, ans=10.0
2024-09-18 11:37:45,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=414306.6666666667, ans=0.0
2024-09-18 11:38:37,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=414446.6666666667, ans=0.125
2024-09-18 11:38:48,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=414493.3333333333, ans=0.125
2024-09-18 11:38:57,856 INFO [train.py:1198] (1/2) Epoch 23, batch 3550, loss[loss=0.2237, simple_loss=0.2828, pruned_loss=0.06177, ctc_loss=0.1288, cr_loss=0.3852, over 34372.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2728, pruned_loss=0.06435, ctc_loss=0.1337, cr_loss=0.4094, over 6756000.32 frames. ], batch size: 103, lr: 5.17e-03, grad_scale: 16.0
2024-09-18 11:39:12,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=414586.6666666667, ans=0.0
2024-09-18 11:39:17,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=414586.6666666667, ans=0.0
2024-09-18 11:39:19,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.59 vs. limit=12.0
2024-09-18 11:39:29,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=414633.3333333333, ans=0.125
2024-09-18 11:39:33,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.28 vs. limit=15.0
2024-09-18 11:39:51,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=414680.0, ans=0.125
2024-09-18 11:39:56,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=414680.0, ans=10.0
2024-09-18 11:40:10,750 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.505e+02 2.885e+02 3.789e+02 6.341e+02, threshold=5.771e+02, percent-clipped=2.0
2024-09-18 11:40:13,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0
2024-09-18 11:40:18,778 INFO [train.py:1198] (1/2) Epoch 23, batch 3600, loss[loss=0.2154, simple_loss=0.2674, pruned_loss=0.06096, ctc_loss=0.1294, cr_loss=0.3893, over 34469.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2733, pruned_loss=0.06455, ctc_loss=0.134, cr_loss=0.4099, over 6766656.72 frames. ], batch size: 90, lr: 5.17e-03, grad_scale: 32.0
2024-09-18 11:40:30,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414773.3333333333, ans=0.1
2024-09-18 11:40:46,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414820.0, ans=0.1
2024-09-18 11:41:06,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=414913.3333333333, ans=0.0
2024-09-18 11:41:08,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=414913.3333333333, ans=0.0
2024-09-18 11:41:38,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=415006.6666666667, ans=0.125
2024-09-18 11:41:39,842 INFO [train.py:1198] (1/2) Epoch 23, batch 3650, loss[loss=0.2319, simple_loss=0.2873, pruned_loss=0.06592, ctc_loss=0.139, cr_loss=0.4227, over 34486.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.2726, pruned_loss=0.06407, ctc_loss=0.1331, cr_loss=0.4082, over 6769660.11 frames. ], batch size: 110, lr: 5.17e-03, grad_scale: 32.0
2024-09-18 11:41:43,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=415006.6666666667, ans=0.125
2024-09-18 11:42:20,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=415100.0, ans=0.125
2024-09-18 11:42:51,911 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.574e+02 2.995e+02 3.988e+02 9.314e+02, threshold=5.991e+02, percent-clipped=7.0
2024-09-18 11:43:00,022 INFO [train.py:1198] (1/2) Epoch 23, batch 3700, loss[loss=0.2393, simple_loss=0.2959, pruned_loss=0.06842, ctc_loss=0.1434, cr_loss=0.4299, over 34601.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2723, pruned_loss=0.0637, ctc_loss=0.1326, cr_loss=0.4075, over 6783082.80 frames. ], batch size: 102, lr: 5.17e-03, grad_scale: 32.0
2024-09-18 11:43:01,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=415240.0, ans=0.07
2024-09-18 11:43:08,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.47 vs. limit=12.0
2024-09-18 11:43:28,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=415286.6666666667, ans=0.0
2024-09-18 11:43:28,919 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:43:46,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=415333.3333333333, ans=0.1
2024-09-18 11:43:50,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=415380.0, ans=0.0
2024-09-18 11:43:55,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=415380.0, ans=0.2
2024-09-18 11:44:05,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.00 vs. limit=22.5
2024-09-18 11:44:15,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=415426.6666666667, ans=0.2
2024-09-18 11:44:18,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=415426.6666666667, ans=0.125
2024-09-18 11:44:22,991 INFO [train.py:1198] (1/2) Epoch 23, batch 3750, loss[loss=0.2268, simple_loss=0.281, pruned_loss=0.06397, ctc_loss=0.1347, cr_loss=0.4417, over 34400.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2755, pruned_loss=0.06488, ctc_loss=0.1348, cr_loss=0.4124, over 6784582.00 frames. ], batch size: 113, lr: 5.17e-03, grad_scale: 32.0
2024-09-18 11:44:33,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0
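
Each loss[...] block reports the jointly trained objectives separately: the simple and pruned losses of the pruned transducer, the CTC loss, and the consistency-regularization (cr) loss, alongside the weighted combination logged as loss. The weights are set by training flags; the defaults below are placeholders, so this sketch is schematic and will not reproduce the logged totals exactly:

    # Placeholder weights, not the run's actual flag values.
    def combine(simple_loss, pruned_loss, ctc_loss, cr_loss,
                simple_scale=0.5, ctc_scale=0.1, cr_scale=0.02):
        # One plausible weighting: blend the two transducer losses,
        # then add the scaled auxiliary losses.
        return (simple_scale * simple_loss
                + (1.0 - simple_scale) * pruned_loss
                + ctc_scale * ctc_loss
                + cr_scale * cr_loss)

    print(combine(simple_loss=0.2810, pruned_loss=0.06397,
                  ctc_loss=0.1347, cr_loss=0.4417))
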
2024-09-18 11:44:55,784 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:44:58,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=415566.6666666667, ans=0.0
2024-09-18 11:45:00,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=415566.6666666667, ans=0.1
2024-09-18 11:45:01,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=415566.6666666667, ans=0.125
2024-09-18 11:45:08,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=415566.6666666667, ans=0.2
2024-09-18 11:45:24,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=415613.3333333333, ans=0.125
2024-09-18 11:45:24,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=415613.3333333333, ans=0.125
2024-09-18 11:45:35,421 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.313e+02 2.457e+02 2.719e+02 6.086e+02, threshold=4.914e+02, percent-clipped=1.0
2024-09-18 11:45:36,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.48 vs. limit=22.5
2024-09-18 11:45:44,067 INFO [train.py:1198] (1/2) Epoch 23, batch 3800, loss[loss=0.2492, simple_loss=0.29, pruned_loss=0.07901, ctc_loss=0.1602, cr_loss=0.461, over 29933.00 frames. ], tot_loss[loss=0.228, simple_loss=0.2783, pruned_loss=0.06664, ctc_loss=0.138, cr_loss=0.4176, over 6674346.21 frames. ], batch size: 175, lr: 5.17e-03, grad_scale: 32.0
2024-09-18 11:46:17,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.77 vs. limit=6.0
2024-09-18 11:46:31,378 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-18 11:46:41,713 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.51 vs. limit=22.5
2024-09-18 11:46:47,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=415846.6666666667, ans=0.0
2024-09-18 11:46:51,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=415893.3333333333, ans=0.125
2024-09-18 11:47:00,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=415893.3333333333, ans=0.1
2024-09-18 11:47:06,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.11 vs. limit=15.0
2024-09-18 11:47:07,330 INFO [train.py:1198] (1/2) Epoch 23, batch 3850, loss[loss=0.242, simple_loss=0.286, pruned_loss=0.07477, ctc_loss=0.1556, cr_loss=0.4313, over 23921.00 frames. ], tot_loss[loss=0.2325, simple_loss=0.2812, pruned_loss=0.06916, ctc_loss=0.143, cr_loss=0.4219, over 6250020.65 frames. ], batch size: 246, lr: 5.17e-03, grad_scale: 32.0
2024-09-18 11:47:20,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=415940.0, ans=0.125
2024-09-18 11:47:24,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=415986.6666666667, ans=0.025
2024-09-18 11:47:37,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.98 vs. limit=15.0
2024-09-18 11:47:44,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=416033.3333333333, ans=0.125
2024-09-18 11:48:41,407 INFO [train.py:1198] (1/2) Epoch 24, batch 0, loss[loss=0.2002, simple_loss=0.2521, pruned_loss=0.05527, ctc_loss=0.1167, cr_loss=0.3587, over 34451.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2521, pruned_loss=0.05527, ctc_loss=0.1167, cr_loss=0.3587, over 34451.00 frames. ], batch size: 85, lr: 5.05e-03, grad_scale: 32.0
2024-09-18 11:48:41,407 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-18 11:48:58,334 INFO [train.py:1230] (1/2) Epoch 24, validation: loss=0.1491, simple_loss=0.2462, pruned_loss=0.02188, ctc_loss=0.0409, cr_loss=1.895e-14, over 944034.00 frames.
2024-09-18 11:48:58,334 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-18 11:49:12,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0
2024-09-18 11:49:13,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=416112.6666666667, ans=0.125
2024-09-18 11:49:17,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=416112.6666666667, ans=0.2
2024-09-18 11:49:28,462 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.185e+02 2.704e+02 2.886e+02 3.400e+02 5.981e+02, threshold=5.773e+02, percent-clipped=4.0
2024-09-18 11:49:33,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=416159.3333333333, ans=0.125
2024-09-18 11:49:38,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=416159.3333333333, ans=0.125
2024-09-18 11:49:44,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.32 vs. limit=15.0
2024-09-18 11:49:52,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.91 vs. limit=22.5
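
As at epoch 23, batch 3000, epoch 24 opens with "Computing validation loss": the model is evaluated without gradients over the fixed dev set, which is why the validation lines always report the same 944034.00 frames. A hedged outline of that loop; compute_loss is an illustrative stand-in, not the actual train.py function:

    import torch

    def validate(model, dev_loader, compute_loss):
        # compute_loss(model, batch) -> (loss_tensor, num_frames)
        model.eval()
        total, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in dev_loader:
                loss, num_frames = compute_loss(model, batch)
                total += loss.item() * num_frames
                frames += num_frames
        model.train()
        return total / frames  # reported as 'validation: loss=...'
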
limit=22.5 2024-09-18 11:49:55,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=416206.0, ans=0.125 2024-09-18 11:50:02,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=416206.0, ans=0.125 2024-09-18 11:50:02,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=416206.0, ans=0.125 2024-09-18 11:50:02,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.86 vs. limit=10.0 2024-09-18 11:50:21,980 INFO [train.py:1198] (1/2) Epoch 24, batch 50, loss[loss=0.1991, simple_loss=0.2494, pruned_loss=0.05488, ctc_loss=0.1186, cr_loss=0.3819, over 34468.00 frames. ], tot_loss[loss=0.2258, simple_loss=0.2761, pruned_loss=0.06585, ctc_loss=0.1367, cr_loss=0.4134, over 1479589.60 frames. ], batch size: 82, lr: 5.05e-03, grad_scale: 16.0 2024-09-18 11:50:27,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=416299.3333333333, ans=0.07 2024-09-18 11:51:15,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=416439.3333333333, ans=0.1 2024-09-18 11:51:30,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=416486.0, ans=0.2 2024-09-18 11:51:46,284 INFO [train.py:1198] (1/2) Epoch 24, batch 100, loss[loss=0.2175, simple_loss=0.2678, pruned_loss=0.06307, ctc_loss=0.1275, cr_loss=0.3901, over 34579.00 frames. ], tot_loss[loss=0.226, simple_loss=0.2769, pruned_loss=0.06562, ctc_loss=0.1361, cr_loss=0.4146, over 2628978.10 frames. ], batch size: 89, lr: 5.05e-03, grad_scale: 16.0 2024-09-18 11:51:47,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=15.0 2024-09-18 11:52:08,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=416579.3333333333, ans=0.0 2024-09-18 11:52:19,408 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 2.394e+02 2.795e+02 3.314e+02 6.566e+02, threshold=5.591e+02, percent-clipped=2.0 2024-09-18 11:52:21,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.27 vs. limit=22.5 2024-09-18 11:52:29,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=416626.0, ans=0.2 2024-09-18 11:52:31,735 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2024-09-18 11:52:32,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. 
limit=15.0 2024-09-18 11:52:34,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=416626.0, ans=0.125 2024-09-18 11:52:34,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=416626.0, ans=0.1 2024-09-18 11:52:51,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0 2024-09-18 11:53:03,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=416719.3333333333, ans=0.1 2024-09-18 11:53:10,072 INFO [train.py:1198] (1/2) Epoch 24, batch 150, loss[loss=0.2114, simple_loss=0.2564, pruned_loss=0.06192, ctc_loss=0.1303, cr_loss=0.413, over 34466.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2745, pruned_loss=0.06456, ctc_loss=0.1344, cr_loss=0.4109, over 3556726.69 frames. ], batch size: 82, lr: 5.05e-03, grad_scale: 16.0 2024-09-18 11:53:25,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=416812.6666666667, ans=0.07 2024-09-18 11:54:00,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.71 vs. limit=15.0 2024-09-18 11:54:06,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=416906.0, ans=0.0 2024-09-18 11:54:22,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416952.6666666667, ans=0.1 2024-09-18 11:54:31,989 INFO [train.py:1198] (1/2) Epoch 24, batch 200, loss[loss=0.2523, simple_loss=0.3005, pruned_loss=0.0777, ctc_loss=0.1568, cr_loss=0.433, over 31798.00 frames. ], tot_loss[loss=0.2223, simple_loss=0.2732, pruned_loss=0.06415, ctc_loss=0.1336, cr_loss=0.41, over 4272123.53 frames. ], batch size: 145, lr: 5.05e-03, grad_scale: 16.0 2024-09-18 11:54:45,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=416999.3333333333, ans=0.0 2024-09-18 11:54:45,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=416999.3333333333, ans=0.125 2024-09-18 11:54:46,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.89 vs. 
limit=15.0 2024-09-18 11:54:55,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=417046.0, ans=0.0 2024-09-18 11:55:03,392 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.448e+02 3.047e+02 4.568e+02 9.221e+02, threshold=6.095e+02, percent-clipped=11.0 2024-09-18 11:55:05,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=417092.6666666667, ans=0.02 2024-09-18 11:55:15,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=417092.6666666667, ans=0.2 2024-09-18 11:55:25,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=417139.3333333333, ans=0.125 2024-09-18 11:55:28,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=417139.3333333333, ans=0.125 2024-09-18 11:55:46,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417186.0, ans=0.1 2024-09-18 11:55:59,631 INFO [train.py:1198] (1/2) Epoch 24, batch 250, loss[loss=0.2358, simple_loss=0.2863, pruned_loss=0.0691, ctc_loss=0.1445, cr_loss=0.4549, over 34282.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.2728, pruned_loss=0.06391, ctc_loss=0.1331, cr_loss=0.4093, over 4834976.81 frames. ], batch size: 117, lr: 5.05e-03, grad_scale: 16.0 2024-09-18 11:56:13,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=417232.6666666667, ans=0.0 2024-09-18 11:56:19,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=417279.3333333333, ans=0.0 2024-09-18 11:56:42,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=417326.0, ans=0.1 2024-09-18 11:56:50,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=417372.6666666667, ans=0.025 2024-09-18 11:57:07,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=417419.3333333333, ans=0.125 2024-09-18 11:57:21,730 INFO [train.py:1198] (1/2) Epoch 24, batch 300, loss[loss=0.2481, simple_loss=0.2939, pruned_loss=0.07605, ctc_loss=0.1578, cr_loss=0.4677, over 34356.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2723, pruned_loss=0.06395, ctc_loss=0.1331, cr_loss=0.4095, over 5264754.95 frames. ], batch size: 107, lr: 5.04e-03, grad_scale: 16.0 2024-09-18 11:57:25,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=12.0 2024-09-18 11:57:25,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.80 vs. 
limit=15.0 2024-09-18 11:57:28,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=417466.0, ans=0.1 2024-09-18 11:57:45,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417512.6666666667, ans=0.1 2024-09-18 11:57:51,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=417512.6666666667, ans=0.2 2024-09-18 11:57:53,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.158e+02 2.490e+02 2.962e+02 3.627e+02 6.463e+02, threshold=5.925e+02, percent-clipped=4.0 2024-09-18 11:57:54,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.49 vs. limit=10.0 2024-09-18 11:57:55,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=417559.3333333333, ans=0.2 2024-09-18 11:58:00,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=417559.3333333333, ans=0.2 2024-09-18 11:58:33,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417652.6666666667, ans=0.1 2024-09-18 11:58:34,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=417652.6666666667, ans=0.1 2024-09-18 11:58:43,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=417699.3333333333, ans=0.2 2024-09-18 11:58:44,435 INFO [train.py:1198] (1/2) Epoch 24, batch 350, loss[loss=0.2033, simple_loss=0.2507, pruned_loss=0.05846, ctc_loss=0.1183, cr_loss=0.3832, over 34283.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.2729, pruned_loss=0.06414, ctc_loss=0.1334, cr_loss=0.41, over 5599909.35 frames. ], batch size: 83, lr: 5.04e-03, grad_scale: 16.0 2024-09-18 11:58:47,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=417699.3333333333, ans=0.125 2024-09-18 11:59:36,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=417839.3333333333, ans=0.025 2024-09-18 11:59:39,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=417839.3333333333, ans=0.125 2024-09-18 11:59:41,704 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 11:59:43,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.69 vs. limit=10.0 2024-09-18 11:59:46,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=417839.3333333333, ans=0.5 2024-09-18 12:00:04,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=417886.0, ans=0.125 2024-09-18 12:00:10,520 INFO [train.py:1198] (1/2) Epoch 24, batch 400, loss[loss=0.2296, simple_loss=0.2798, pruned_loss=0.06712, ctc_loss=0.1392, cr_loss=0.4306, over 34406.00 frames. 
], tot_loss[loss=0.2215, simple_loss=0.2724, pruned_loss=0.06384, ctc_loss=0.133, cr_loss=0.4087, over 5865870.14 frames. ], batch size: 95, lr: 5.04e-03, grad_scale: 32.0 2024-09-18 12:00:41,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.432e+02 2.779e+02 3.414e+02 6.535e+02, threshold=5.558e+02, percent-clipped=3.0 2024-09-18 12:01:01,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=418072.6666666667, ans=0.09899494936611666 2024-09-18 12:01:07,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=418072.6666666667, ans=0.025 2024-09-18 12:01:08,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=418072.6666666667, ans=0.125 2024-09-18 12:01:33,740 INFO [train.py:1198] (1/2) Epoch 24, batch 450, loss[loss=0.2269, simple_loss=0.2796, pruned_loss=0.06533, ctc_loss=0.1357, cr_loss=0.4114, over 34704.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2724, pruned_loss=0.06384, ctc_loss=0.1328, cr_loss=0.4083, over 6053949.56 frames. ], batch size: 97, lr: 5.04e-03, grad_scale: 32.0 2024-09-18 12:01:45,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=418166.0, ans=0.125 2024-09-18 12:01:52,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=418212.6666666667, ans=0.125 2024-09-18 12:02:03,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=418212.6666666667, ans=0.0 2024-09-18 12:02:05,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=418259.3333333333, ans=0.125 2024-09-18 12:02:11,003 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2024-09-18 12:02:11,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5 2024-09-18 12:02:38,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=418352.6666666667, ans=0.0 2024-09-18 12:02:46,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=418352.6666666667, ans=0.0 2024-09-18 12:02:58,229 INFO [train.py:1198] (1/2) Epoch 24, batch 500, loss[loss=0.2464, simple_loss=0.2935, pruned_loss=0.07485, ctc_loss=0.1548, cr_loss=0.4636, over 34445.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2717, pruned_loss=0.06363, ctc_loss=0.1325, cr_loss=0.4073, over 6220691.78 frames. ], batch size: 110, lr: 5.04e-03, grad_scale: 16.0 2024-09-18 12:03:00,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=418399.3333333333, ans=0.025 2024-09-18 12:03:11,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. 
limit=10.0 2024-09-18 12:03:16,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=418446.0, ans=0.0 2024-09-18 12:03:21,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=418446.0, ans=0.0 2024-09-18 12:03:33,480 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.390e+02 2.662e+02 3.375e+02 5.501e+02, threshold=5.324e+02, percent-clipped=0.0 2024-09-18 12:03:33,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=418492.6666666667, ans=0.125 2024-09-18 12:03:36,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.93 vs. limit=15.0 2024-09-18 12:03:39,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.18 vs. limit=22.5 2024-09-18 12:03:40,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=418492.6666666667, ans=0.1 2024-09-18 12:03:43,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=418492.6666666667, ans=0.125 2024-09-18 12:03:50,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=418539.3333333333, ans=0.0 2024-09-18 12:03:58,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=418539.3333333333, ans=0.0 2024-09-18 12:04:00,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418539.3333333333, ans=0.1 2024-09-18 12:04:05,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=418586.0, ans=0.2 2024-09-18 12:04:06,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=418586.0, ans=0.0 2024-09-18 12:04:12,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.whiten.whitening_limit, batch_count=418586.0, ans=12.0 2024-09-18 12:04:16,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=418586.0, ans=0.125 2024-09-18 12:04:22,859 INFO [train.py:1198] (1/2) Epoch 24, batch 550, loss[loss=0.2245, simple_loss=0.2795, pruned_loss=0.06301, ctc_loss=0.1336, cr_loss=0.4172, over 33830.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2712, pruned_loss=0.06326, ctc_loss=0.1318, cr_loss=0.4061, over 6330189.59 frames. ], batch size: 122, lr: 5.04e-03, grad_scale: 16.0 2024-09-18 12:04:23,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=418632.6666666667, ans=0.125 2024-09-18 12:04:33,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.77 vs. 
limit=15.0 2024-09-18 12:04:37,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=418679.3333333333, ans=0.025 2024-09-18 12:04:44,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0 2024-09-18 12:04:55,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=418726.0, ans=0.125 2024-09-18 12:05:22,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=418772.6666666667, ans=0.0 2024-09-18 12:05:45,130 INFO [train.py:1198] (1/2) Epoch 24, batch 600, loss[loss=0.2343, simple_loss=0.2833, pruned_loss=0.06969, ctc_loss=0.1448, cr_loss=0.4234, over 34224.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2717, pruned_loss=0.06361, ctc_loss=0.1324, cr_loss=0.407, over 6431108.26 frames. ], batch size: 117, lr: 5.04e-03, grad_scale: 16.0 2024-09-18 12:05:45,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=418866.0, ans=0.2 2024-09-18 12:06:00,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.77 vs. limit=22.5 2024-09-18 12:06:02,887 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.40 vs. limit=22.5 2024-09-18 12:06:08,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=418912.6666666667, ans=0.0 2024-09-18 12:06:14,051 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5 2024-09-18 12:06:17,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.936e+02 2.450e+02 2.779e+02 3.715e+02 7.204e+02, threshold=5.557e+02, percent-clipped=3.0 2024-09-18 12:06:26,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=418959.3333333333, ans=0.125 2024-09-18 12:07:03,735 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.43 vs. limit=10.0 2024-09-18 12:07:09,076 INFO [train.py:1198] (1/2) Epoch 24, batch 650, loss[loss=0.2183, simple_loss=0.2672, pruned_loss=0.06329, ctc_loss=0.1321, cr_loss=0.4088, over 34530.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2711, pruned_loss=0.06327, ctc_loss=0.1319, cr_loss=0.4064, over 6523010.30 frames. ], batch size: 94, lr: 5.04e-03, grad_scale: 16.0 2024-09-18 12:07:41,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=419146.0, ans=0.0 2024-09-18 12:07:43,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=419192.6666666667, ans=0.125 2024-09-18 12:07:49,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=419192.6666666667, ans=0.05 2024-09-18 12:07:50,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.06 vs. 
limit=15.0 2024-09-18 12:07:51,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=419192.6666666667, ans=0.125 2024-09-18 12:07:51,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=419192.6666666667, ans=0.125 2024-09-18 12:07:53,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=419192.6666666667, ans=0.1 2024-09-18 12:07:58,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=419192.6666666667, ans=0.125 2024-09-18 12:07:59,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=419239.3333333333, ans=0.2 2024-09-18 12:08:04,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=419239.3333333333, ans=0.1 2024-09-18 12:08:12,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2024-09-18 12:08:34,008 INFO [train.py:1198] (1/2) Epoch 24, batch 700, loss[loss=0.2049, simple_loss=0.2567, pruned_loss=0.05718, ctc_loss=0.1205, cr_loss=0.3644, over 34566.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2718, pruned_loss=0.0635, ctc_loss=0.1323, cr_loss=0.408, over 6579735.01 frames. ], batch size: 89, lr: 5.03e-03, grad_scale: 16.0 2024-09-18 12:08:34,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=419332.6666666667, ans=0.125 2024-09-18 12:08:58,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=419379.3333333333, ans=0.125 2024-09-18 12:09:02,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=419379.3333333333, ans=0.125 2024-09-18 12:09:06,874 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.489e+02 3.171e+02 4.157e+02 7.022e+02, threshold=6.343e+02, percent-clipped=9.0 2024-09-18 12:09:14,003 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:09:18,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=419426.0, ans=0.1 2024-09-18 12:09:23,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.84 vs. limit=6.0 2024-09-18 12:09:27,255 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:09:43,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=419519.3333333333, ans=0.125 2024-09-18 12:09:56,571 INFO [train.py:1198] (1/2) Epoch 24, batch 750, loss[loss=0.2189, simple_loss=0.272, pruned_loss=0.06171, ctc_loss=0.1312, cr_loss=0.4053, over 34396.00 frames. ], tot_loss[loss=0.2205, simple_loss=0.2715, pruned_loss=0.06341, ctc_loss=0.1321, cr_loss=0.4072, over 6623025.09 frames. 
], batch size: 95, lr: 5.03e-03, grad_scale: 16.0 2024-09-18 12:10:13,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0 2024-09-18 12:10:39,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419659.3333333333, ans=0.125 2024-09-18 12:10:58,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=419706.0, ans=0.125 2024-09-18 12:11:23,050 INFO [train.py:1198] (1/2) Epoch 24, batch 800, loss[loss=0.2027, simple_loss=0.2576, pruned_loss=0.05469, ctc_loss=0.1167, cr_loss=0.3789, over 34512.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2717, pruned_loss=0.06345, ctc_loss=0.1323, cr_loss=0.4073, over 6659781.13 frames. ], batch size: 85, lr: 5.03e-03, grad_scale: 32.0 2024-09-18 12:11:31,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419799.3333333333, ans=0.1 2024-09-18 12:11:42,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.79 vs. limit=15.0 2024-09-18 12:11:54,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=419892.6666666667, ans=0.0 2024-09-18 12:11:55,920 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.402e+02 2.805e+02 3.427e+02 5.563e+02, threshold=5.609e+02, percent-clipped=0.0 2024-09-18 12:12:11,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=419939.3333333333, ans=0.0 2024-09-18 12:12:17,707 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:12:18,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.77 vs. limit=10.0 2024-09-18 12:12:40,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=419986.0, ans=0.125 2024-09-18 12:12:45,282 INFO [train.py:1198] (1/2) Epoch 24, batch 850, loss[loss=0.2288, simple_loss=0.2857, pruned_loss=0.06403, ctc_loss=0.1348, cr_loss=0.4232, over 34373.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.2714, pruned_loss=0.06324, ctc_loss=0.1318, cr_loss=0.4067, over 6693562.69 frames. ], batch size: 103, lr: 5.03e-03, grad_scale: 32.0 2024-09-18 12:12:53,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=420032.6666666667, ans=0.2 2024-09-18 12:12:53,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=420032.6666666667, ans=0.125 2024-09-18 12:13:12,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.35 vs. 
limit=15.0 2024-09-18 12:13:15,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=420079.3333333333, ans=0.0 2024-09-18 12:13:18,437 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:13:23,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=420126.0, ans=0.2 2024-09-18 12:13:33,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420172.6666666667, ans=0.1 2024-09-18 12:13:53,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=420219.3333333333, ans=0.125 2024-09-18 12:13:53,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2024-09-18 12:14:09,797 INFO [train.py:1198] (1/2) Epoch 24, batch 900, loss[loss=0.1955, simple_loss=0.249, pruned_loss=0.05201, ctc_loss=0.115, cr_loss=0.3731, over 34479.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.2718, pruned_loss=0.06342, ctc_loss=0.1322, cr_loss=0.4074, over 6698880.62 frames. ], batch size: 85, lr: 5.03e-03, grad_scale: 32.0 2024-09-18 12:14:26,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=420312.6666666667, ans=0.0 2024-09-18 12:14:42,578 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.617e+02 3.117e+02 3.925e+02 5.688e+02, threshold=6.233e+02, percent-clipped=2.0 2024-09-18 12:14:48,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=420359.3333333333, ans=0.035 2024-09-18 12:14:59,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=420406.0, ans=0.0 2024-09-18 12:15:06,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=420406.0, ans=0.1 2024-09-18 12:15:17,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=420452.6666666667, ans=0.035 2024-09-18 12:15:32,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=420499.3333333333, ans=0.1 2024-09-18 12:15:33,789 INFO [train.py:1198] (1/2) Epoch 24, batch 950, loss[loss=0.1907, simple_loss=0.242, pruned_loss=0.05188, ctc_loss=0.108, cr_loss=0.3519, over 34722.00 frames. ], tot_loss[loss=0.221, simple_loss=0.272, pruned_loss=0.06357, ctc_loss=0.1323, cr_loss=0.4079, over 6703996.69 frames. 
], batch size: 87, lr: 5.03e-03, grad_scale: 32.0 2024-09-18 12:15:42,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=420499.3333333333, ans=0.025 2024-09-18 12:15:45,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=420499.3333333333, ans=0.2 2024-09-18 12:16:15,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=420592.6666666667, ans=0.07 2024-09-18 12:16:15,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=420592.6666666667, ans=0.125 2024-09-18 12:16:56,234 INFO [train.py:1198] (1/2) Epoch 24, batch 1000, loss[loss=0.209, simple_loss=0.2594, pruned_loss=0.05926, ctc_loss=0.1242, cr_loss=0.3792, over 34487.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.2726, pruned_loss=0.06396, ctc_loss=0.1331, cr_loss=0.4096, over 6695764.20 frames. ], batch size: 90, lr: 5.03e-03, grad_scale: 16.0 2024-09-18 12:17:21,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=420779.3333333333, ans=0.125 2024-09-18 12:17:30,698 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.702e+02 3.130e+02 3.952e+02 6.996e+02, threshold=6.260e+02, percent-clipped=2.0 2024-09-18 12:17:44,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=420826.0, ans=0.1 2024-09-18 12:17:54,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=420872.6666666667, ans=0.125 2024-09-18 12:18:14,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=420919.3333333333, ans=0.125 2024-09-18 12:18:20,631 INFO [train.py:1198] (1/2) Epoch 24, batch 1050, loss[loss=0.2192, simple_loss=0.2731, pruned_loss=0.06172, ctc_loss=0.1295, cr_loss=0.4002, over 34572.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.272, pruned_loss=0.06384, ctc_loss=0.1329, cr_loss=0.4088, over 6705438.28 frames. ], batch size: 99, lr: 5.02e-03, grad_scale: 16.0 2024-09-18 12:19:45,153 INFO [train.py:1198] (1/2) Epoch 24, batch 1100, loss[loss=0.2236, simple_loss=0.2711, pruned_loss=0.06643, ctc_loss=0.1342, cr_loss=0.4086, over 34364.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2719, pruned_loss=0.06365, ctc_loss=0.1326, cr_loss=0.4077, over 6716963.74 frames. 
], batch size: 91, lr: 5.02e-03, grad_scale: 16.0 2024-09-18 12:20:10,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=421246.0, ans=0.0 2024-09-18 12:20:15,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=421246.0, ans=0.0 2024-09-18 12:20:15,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=421246.0, ans=0.125 2024-09-18 12:20:19,839 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.408e+02 2.788e+02 3.449e+02 7.994e+02, threshold=5.575e+02, percent-clipped=2.0 2024-09-18 12:20:40,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=421339.3333333333, ans=0.5 2024-09-18 12:20:53,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=421386.0, ans=0.0 2024-09-18 12:21:08,298 INFO [train.py:1198] (1/2) Epoch 24, batch 1150, loss[loss=0.2171, simple_loss=0.2702, pruned_loss=0.06139, ctc_loss=0.1254, cr_loss=0.4045, over 34372.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2718, pruned_loss=0.0636, ctc_loss=0.1324, cr_loss=0.4077, over 6715862.89 frames. ], batch size: 91, lr: 5.02e-03, grad_scale: 16.0 2024-09-18 12:21:10,228 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:21:25,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=421479.3333333333, ans=0.1 2024-09-18 12:21:30,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=421479.3333333333, ans=0.025 2024-09-18 12:21:38,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=421479.3333333333, ans=0.07 2024-09-18 12:22:00,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=421572.6666666667, ans=0.125 2024-09-18 12:22:00,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=421572.6666666667, ans=0.0 2024-09-18 12:22:26,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=421619.3333333333, ans=0.0 2024-09-18 12:22:30,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.41 vs. limit=10.0 2024-09-18 12:22:35,144 INFO [train.py:1198] (1/2) Epoch 24, batch 1200, loss[loss=0.2194, simple_loss=0.273, pruned_loss=0.0615, ctc_loss=0.1326, cr_loss=0.4078, over 34551.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.2729, pruned_loss=0.06382, ctc_loss=0.1328, cr_loss=0.4084, over 6708524.71 frames. 
], batch size: 99, lr: 5.02e-03, grad_scale: 32.0 2024-09-18 12:22:37,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=421666.0, ans=0.125 2024-09-18 12:22:40,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=421666.0, ans=0.0 2024-09-18 12:22:48,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=421666.0, ans=0.125 2024-09-18 12:22:58,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=421712.6666666667, ans=0.0 2024-09-18 12:22:58,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=421712.6666666667, ans=0.125 2024-09-18 12:23:00,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=421712.6666666667, ans=0.1 2024-09-18 12:23:09,849 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.409e+02 2.702e+02 3.074e+02 4.658e+02, threshold=5.405e+02, percent-clipped=0.0 2024-09-18 12:23:11,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=421759.3333333333, ans=0.025 2024-09-18 12:23:18,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=421759.3333333333, ans=0.2 2024-09-18 12:23:53,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=421852.6666666667, ans=0.125 2024-09-18 12:23:53,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=421852.6666666667, ans=0.125 2024-09-18 12:23:58,170 INFO [train.py:1198] (1/2) Epoch 24, batch 1250, loss[loss=0.2312, simple_loss=0.2846, pruned_loss=0.06713, ctc_loss=0.1353, cr_loss=0.4146, over 34331.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2734, pruned_loss=0.06397, ctc_loss=0.1331, cr_loss=0.4089, over 6742421.26 frames. ], batch size: 107, lr: 5.02e-03, grad_scale: 32.0 2024-09-18 12:23:58,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=421899.3333333333, ans=0.0 2024-09-18 12:24:03,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=421899.3333333333, ans=0.025 2024-09-18 12:24:44,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=421992.6666666667, ans=0.0 2024-09-18 12:25:00,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=422039.3333333333, ans=0.125 2024-09-18 12:25:05,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.36 vs. limit=10.0 2024-09-18 12:25:06,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=422086.0, ans=0.025 2024-09-18 12:25:07,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.14 vs. 
limit=15.0 2024-09-18 12:25:11,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=422086.0, ans=0.125 2024-09-18 12:25:22,628 INFO [train.py:1198] (1/2) Epoch 24, batch 1300, loss[loss=0.2326, simple_loss=0.2874, pruned_loss=0.06678, ctc_loss=0.1395, cr_loss=0.4063, over 32950.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2725, pruned_loss=0.06365, ctc_loss=0.1324, cr_loss=0.4073, over 6743615.94 frames. ], batch size: 130, lr: 5.02e-03, grad_scale: 32.0 2024-09-18 12:25:40,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=422179.3333333333, ans=0.125 2024-09-18 12:25:50,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=422179.3333333333, ans=0.125 2024-09-18 12:25:54,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=422226.0, ans=0.125 2024-09-18 12:25:57,158 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.962e+02 2.461e+02 2.889e+02 3.578e+02 7.022e+02, threshold=5.777e+02, percent-clipped=3.0 2024-09-18 12:26:10,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=422272.6666666667, ans=0.125 2024-09-18 12:26:42,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=422319.3333333333, ans=0.0 2024-09-18 12:26:44,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=422319.3333333333, ans=0.125 2024-09-18 12:26:47,346 INFO [train.py:1198] (1/2) Epoch 24, batch 1350, loss[loss=0.2133, simple_loss=0.2651, pruned_loss=0.06002, ctc_loss=0.1267, cr_loss=0.4002, over 34522.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.272, pruned_loss=0.06332, ctc_loss=0.1319, cr_loss=0.4066, over 6762347.76 frames. ], batch size: 94, lr: 5.02e-03, grad_scale: 32.0 2024-09-18 12:26:59,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=422366.0, ans=0.2 2024-09-18 12:27:24,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=422459.3333333333, ans=0.125 2024-09-18 12:27:33,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=422459.3333333333, ans=0.125 2024-09-18 12:28:09,074 INFO [train.py:1198] (1/2) Epoch 24, batch 1400, loss[loss=0.1892, simple_loss=0.2434, pruned_loss=0.04946, ctc_loss=0.1072, cr_loss=0.365, over 34254.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.272, pruned_loss=0.06335, ctc_loss=0.1318, cr_loss=0.4069, over 6775607.39 frames. 
], batch size: 80, lr: 5.01e-03, grad_scale: 32.0 2024-09-18 12:28:11,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=422599.3333333333, ans=0.1 2024-09-18 12:28:11,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=422599.3333333333, ans=0.0 2024-09-18 12:28:12,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=422599.3333333333, ans=0.125 2024-09-18 12:28:30,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=422646.0, ans=0.07 2024-09-18 12:28:43,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.478e+02 3.040e+02 3.729e+02 6.770e+02, threshold=6.080e+02, percent-clipped=2.0 2024-09-18 12:28:48,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=422692.6666666667, ans=0.125 2024-09-18 12:28:53,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=422692.6666666667, ans=0.0 2024-09-18 12:28:54,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.26 vs. limit=15.0 2024-09-18 12:28:59,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.16 vs. limit=15.0 2024-09-18 12:29:21,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=422786.0, ans=0.0 2024-09-18 12:29:33,410 INFO [train.py:1198] (1/2) Epoch 24, batch 1450, loss[loss=0.2307, simple_loss=0.283, pruned_loss=0.06612, ctc_loss=0.1428, cr_loss=0.442, over 34453.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2728, pruned_loss=0.06364, ctc_loss=0.1325, cr_loss=0.4085, over 6772974.26 frames. ], batch size: 110, lr: 5.01e-03, grad_scale: 32.0 2024-09-18 12:29:47,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=422832.6666666667, ans=0.125 2024-09-18 12:30:13,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=422926.0, ans=0.125 2024-09-18 12:30:13,658 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.891e-03 2024-09-18 12:30:16,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=422926.0, ans=0.0 2024-09-18 12:30:22,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=422926.0, ans=0.0 2024-09-18 12:30:39,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=423019.3333333333, ans=0.1 2024-09-18 12:30:57,825 INFO [train.py:1198] (1/2) Epoch 24, batch 1500, loss[loss=0.2221, simple_loss=0.2789, pruned_loss=0.06181, ctc_loss=0.1278, cr_loss=0.4044, over 34449.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.2733, pruned_loss=0.064, ctc_loss=0.133, cr_loss=0.4099, over 6772398.98 frames. 
], batch size: 100, lr: 5.01e-03, grad_scale: 32.0 2024-09-18 12:31:16,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=423112.6666666667, ans=0.125 2024-09-18 12:31:18,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.47 vs. limit=12.0 2024-09-18 12:31:19,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=423112.6666666667, ans=0.125 2024-09-18 12:31:32,508 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.912e+02 2.348e+02 2.707e+02 3.205e+02 5.592e+02, threshold=5.413e+02, percent-clipped=0.0 2024-09-18 12:32:01,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=423206.0, ans=0.0 2024-09-18 12:32:20,396 INFO [train.py:1198] (1/2) Epoch 24, batch 1550, loss[loss=0.2397, simple_loss=0.2935, pruned_loss=0.06956, ctc_loss=0.1452, cr_loss=0.4441, over 34460.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.273, pruned_loss=0.0641, ctc_loss=0.1332, cr_loss=0.4098, over 6743874.70 frames. ], batch size: 105, lr: 5.01e-03, grad_scale: 16.0 2024-09-18 12:32:39,016 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:32:44,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=423346.0, ans=0.0 2024-09-18 12:33:01,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0 2024-09-18 12:33:07,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-09-18 12:33:18,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=423439.3333333333, ans=0.0 2024-09-18 12:33:31,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=423486.0, ans=0.125 2024-09-18 12:33:44,675 INFO [train.py:1198] (1/2) Epoch 24, batch 1600, loss[loss=0.2347, simple_loss=0.2843, pruned_loss=0.06955, ctc_loss=0.1445, cr_loss=0.4278, over 34551.00 frames. ], tot_loss[loss=0.222, simple_loss=0.2728, pruned_loss=0.06412, ctc_loss=0.1333, cr_loss=0.4094, over 6724256.13 frames. ], batch size: 99, lr: 5.01e-03, grad_scale: 32.0 2024-09-18 12:34:00,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=423532.6666666667, ans=0.025 2024-09-18 12:34:01,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=423579.3333333333, ans=0.025 2024-09-18 12:34:03,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=423579.3333333333, ans=0.0 2024-09-18 12:34:04,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.22 vs. 
limit=22.5 2024-09-18 12:34:23,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.422e+02 2.987e+02 3.378e+02 6.643e+02, threshold=5.974e+02, percent-clipped=1.0 2024-09-18 12:34:23,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=423626.0, ans=0.0 2024-09-18 12:34:34,145 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.14 vs. limit=15.0 2024-09-18 12:34:40,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=423672.6666666667, ans=0.0 2024-09-18 12:35:00,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=423719.3333333333, ans=0.2 2024-09-18 12:35:07,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=423719.3333333333, ans=0.125 2024-09-18 12:35:09,981 INFO [train.py:1198] (1/2) Epoch 24, batch 1650, loss[loss=0.2322, simple_loss=0.284, pruned_loss=0.06788, ctc_loss=0.1379, cr_loss=0.4258, over 34365.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2724, pruned_loss=0.06385, ctc_loss=0.1327, cr_loss=0.4084, over 6716205.77 frames. ], batch size: 103, lr: 5.01e-03, grad_scale: 32.0 2024-09-18 12:35:35,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.55 vs. limit=22.5 2024-09-18 12:35:40,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=423812.6666666667, ans=0.2 2024-09-18 12:35:42,781 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.73 vs. limit=5.0 2024-09-18 12:35:50,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=423859.3333333333, ans=0.125 2024-09-18 12:35:51,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=423859.3333333333, ans=0.5 2024-09-18 12:35:51,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=423859.3333333333, ans=0.125 2024-09-18 12:36:08,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=423906.0, ans=0.125 2024-09-18 12:36:29,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=423952.6666666667, ans=0.125 2024-09-18 12:36:34,272 INFO [train.py:1198] (1/2) Epoch 24, batch 1700, loss[loss=0.1978, simple_loss=0.2485, pruned_loss=0.05474, ctc_loss=0.1132, cr_loss=0.3733, over 34293.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2721, pruned_loss=0.06359, ctc_loss=0.1323, cr_loss=0.4077, over 6742132.39 frames. 
], batch size: 80, lr: 5.01e-03, grad_scale: 32.0 2024-09-18 12:36:52,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=424046.0, ans=0.025 2024-09-18 12:37:04,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=424046.0, ans=0.125 2024-09-18 12:37:05,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424092.6666666667, ans=0.1 2024-09-18 12:37:05,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=424092.6666666667, ans=0.125 2024-09-18 12:37:08,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=424092.6666666667, ans=0.125 2024-09-18 12:37:10,241 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.516e+02 3.064e+02 3.793e+02 8.020e+02, threshold=6.128e+02, percent-clipped=4.0 2024-09-18 12:37:33,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=424139.3333333333, ans=0.2 2024-09-18 12:37:50,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=424186.0, ans=0.125 2024-09-18 12:37:58,839 INFO [train.py:1198] (1/2) Epoch 24, batch 1750, loss[loss=0.1853, simple_loss=0.2377, pruned_loss=0.04901, ctc_loss=0.1057, cr_loss=0.3424, over 34174.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.2717, pruned_loss=0.06343, ctc_loss=0.1319, cr_loss=0.4071, over 6751468.47 frames. ], batch size: 78, lr: 5.00e-03, grad_scale: 32.0 2024-09-18 12:38:22,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=15.0 2024-09-18 12:38:26,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.33 vs. limit=10.0 2024-09-18 12:39:00,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.58 vs. limit=15.0 2024-09-18 12:39:14,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424419.3333333333, ans=0.1 2024-09-18 12:39:19,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=424466.0, ans=0.025 2024-09-18 12:39:21,020 INFO [train.py:1198] (1/2) Epoch 24, batch 1800, loss[loss=0.2201, simple_loss=0.2731, pruned_loss=0.0617, ctc_loss=0.1378, cr_loss=0.4042, over 34719.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.2723, pruned_loss=0.0637, ctc_loss=0.1324, cr_loss=0.4082, over 6755586.22 frames. ], batch size: 97, lr: 5.00e-03, grad_scale: 32.0 2024-09-18 12:39:24,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424466.0, ans=0.1 2024-09-18 12:39:32,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.20 vs. 
limit=15.0 2024-09-18 12:39:34,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=424466.0, ans=0.125 2024-09-18 12:39:57,672 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.513e+02 3.062e+02 4.130e+02 7.691e+02, threshold=6.123e+02, percent-clipped=3.0 2024-09-18 12:40:13,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=424606.0, ans=15.0 2024-09-18 12:40:24,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=424606.0, ans=0.2 2024-09-18 12:40:31,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=424652.6666666667, ans=0.0 2024-09-18 12:40:35,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=424652.6666666667, ans=0.0 2024-09-18 12:40:46,383 INFO [train.py:1198] (1/2) Epoch 24, batch 1850, loss[loss=0.2232, simple_loss=0.2783, pruned_loss=0.06309, ctc_loss=0.1311, cr_loss=0.3916, over 34458.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2719, pruned_loss=0.06349, ctc_loss=0.1321, cr_loss=0.4077, over 6762886.55 frames. ], batch size: 100, lr: 5.00e-03, grad_scale: 32.0 2024-09-18 12:41:01,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=424746.0, ans=0.125 2024-09-18 12:41:15,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2024-09-18 12:41:21,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.48 vs. limit=15.0 2024-09-18 12:41:31,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424792.6666666667, ans=0.1 2024-09-18 12:41:34,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=424839.3333333333, ans=0.125 2024-09-18 12:41:48,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.50 vs. limit=15.0 2024-09-18 12:42:10,590 INFO [train.py:1198] (1/2) Epoch 24, batch 1900, loss[loss=0.2338, simple_loss=0.2849, pruned_loss=0.06868, ctc_loss=0.1421, cr_loss=0.423, over 34395.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2727, pruned_loss=0.06381, ctc_loss=0.1327, cr_loss=0.409, over 6772388.22 frames. ], batch size: 103, lr: 5.00e-03, grad_scale: 32.0 2024-09-18 12:42:16,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=424932.6666666667, ans=0.1 2024-09-18 12:42:36,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.69 vs. limit=15.0 2024-09-18 12:42:36,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.74 vs. 
limit=15.0 2024-09-18 12:42:47,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.546e+02 3.019e+02 3.645e+02 5.584e+02, threshold=6.037e+02, percent-clipped=0.0 2024-09-18 12:42:54,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=425026.0, ans=0.125 2024-09-18 12:42:59,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=425072.6666666667, ans=0.1 2024-09-18 12:42:59,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=425072.6666666667, ans=0.0 2024-09-18 12:43:01,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.26 vs. limit=22.5 2024-09-18 12:43:04,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.92 vs. limit=22.5 2024-09-18 12:43:05,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=425072.6666666667, ans=0.0 2024-09-18 12:43:10,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=425072.6666666667, ans=0.1 2024-09-18 12:43:12,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.77 vs. limit=15.0 2024-09-18 12:43:33,178 INFO [train.py:1198] (1/2) Epoch 24, batch 1950, loss[loss=0.2126, simple_loss=0.2636, pruned_loss=0.05993, ctc_loss=0.1267, cr_loss=0.4096, over 34366.00 frames. ], tot_loss[loss=0.2226, simple_loss=0.2737, pruned_loss=0.06413, ctc_loss=0.1334, cr_loss=0.4109, over 6789362.92 frames. ], batch size: 91, lr: 5.00e-03, grad_scale: 32.0 2024-09-18 12:43:41,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=22.5 2024-09-18 12:43:56,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=425212.6666666667, ans=0.125 2024-09-18 12:44:01,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=425212.6666666667, ans=0.125 2024-09-18 12:44:02,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.10 vs. limit=10.0 2024-09-18 12:44:47,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=425352.6666666667, ans=0.0 2024-09-18 12:44:50,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=425352.6666666667, ans=0.125 2024-09-18 12:44:54,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=425352.6666666667, ans=0.125 2024-09-18 12:44:57,079 INFO [train.py:1198] (1/2) Epoch 24, batch 2000, loss[loss=0.2048, simple_loss=0.2518, pruned_loss=0.05911, ctc_loss=0.1204, cr_loss=0.3884, over 34149.00 frames. ], tot_loss[loss=0.2231, simple_loss=0.2742, pruned_loss=0.06442, ctc_loss=0.134, cr_loss=0.4117, over 6766097.05 frames. 
], batch size: 78, lr: 5.00e-03, grad_scale: 32.0 2024-09-18 12:44:57,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=425399.3333333333, ans=0.125 2024-09-18 12:44:59,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=425399.3333333333, ans=0.125 2024-09-18 12:45:35,656 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.501e+02 3.059e+02 3.591e+02 6.482e+02, threshold=6.118e+02, percent-clipped=1.0 2024-09-18 12:45:41,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.60 vs. limit=22.5 2024-09-18 12:46:22,087 INFO [train.py:1198] (1/2) Epoch 24, batch 2050, loss[loss=0.1788, simple_loss=0.2335, pruned_loss=0.04562, ctc_loss=0.1002, cr_loss=0.32, over 34476.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.2733, pruned_loss=0.0642, ctc_loss=0.1335, cr_loss=0.4105, over 6757840.17 frames. ], batch size: 82, lr: 5.00e-03, grad_scale: 32.0 2024-09-18 12:46:35,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=425632.6666666667, ans=0.125 2024-09-18 12:46:36,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.31 vs. limit=22.5 2024-09-18 12:46:44,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.94 vs. limit=10.0 2024-09-18 12:46:45,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=425679.3333333333, ans=0.0 2024-09-18 12:46:49,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.02 vs. limit=12.0 2024-09-18 12:47:12,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5 2024-09-18 12:47:16,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=425772.6666666667, ans=0.1 2024-09-18 12:47:46,309 INFO [train.py:1198] (1/2) Epoch 24, batch 2100, loss[loss=0.2315, simple_loss=0.2851, pruned_loss=0.06666, ctc_loss=0.1389, cr_loss=0.4207, over 34545.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2728, pruned_loss=0.06399, ctc_loss=0.1331, cr_loss=0.4097, over 6770922.90 frames. ], batch size: 94, lr: 5.00e-03, grad_scale: 32.0 2024-09-18 12:48:06,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0 2024-09-18 12:48:08,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.45 vs. 
limit=10.0 2024-09-18 12:48:22,242 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.462e+02 2.835e+02 3.788e+02 6.501e+02, threshold=5.671e+02, percent-clipped=1.0 2024-09-18 12:48:25,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=425959.3333333333, ans=0.035 2024-09-18 12:48:29,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=425959.3333333333, ans=0.1 2024-09-18 12:48:45,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=426006.0, ans=0.2 2024-09-18 12:49:08,209 INFO [train.py:1198] (1/2) Epoch 24, batch 2150, loss[loss=0.2186, simple_loss=0.2679, pruned_loss=0.06349, ctc_loss=0.1319, cr_loss=0.3991, over 34340.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.2722, pruned_loss=0.06357, ctc_loss=0.1323, cr_loss=0.4083, over 6788388.87 frames. ], batch size: 91, lr: 4.99e-03, grad_scale: 32.0 2024-09-18 12:49:25,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=426146.0, ans=0.125 2024-09-18 12:49:30,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=426146.0, ans=0.0 2024-09-18 12:49:52,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2024-09-18 12:49:56,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=426192.6666666667, ans=0.035 2024-09-18 12:50:02,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2024-09-18 12:50:07,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.09 vs. limit=15.0 2024-09-18 12:50:30,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=426286.0, ans=0.125 2024-09-18 12:50:33,169 INFO [train.py:1198] (1/2) Epoch 24, batch 2200, loss[loss=0.2231, simple_loss=0.2811, pruned_loss=0.06117, ctc_loss=0.1326, cr_loss=0.4057, over 34461.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.272, pruned_loss=0.06348, ctc_loss=0.1322, cr_loss=0.408, over 6784543.98 frames. ], batch size: 100, lr: 4.99e-03, grad_scale: 32.0 2024-09-18 12:50:41,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=426332.6666666667, ans=0.125 2024-09-18 12:50:52,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2024-09-18 12:50:54,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.49 vs. 
limit=15.0 2024-09-18 12:50:55,142 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 12:51:10,915 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.070e+02 2.449e+02 2.797e+02 3.995e+02 6.257e+02, threshold=5.595e+02, percent-clipped=4.0 2024-09-18 12:51:17,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=426426.0, ans=0.2 2024-09-18 12:51:18,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.19 vs. limit=15.0 2024-09-18 12:51:57,911 INFO [train.py:1198] (1/2) Epoch 24, batch 2250, loss[loss=0.2331, simple_loss=0.2845, pruned_loss=0.06801, ctc_loss=0.1413, cr_loss=0.4387, over 34413.00 frames. ], tot_loss[loss=0.2207, simple_loss=0.272, pruned_loss=0.06335, ctc_loss=0.1321, cr_loss=0.4078, over 6779771.94 frames. ], batch size: 95, lr: 4.99e-03, grad_scale: 16.0 2024-09-18 12:52:09,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=426566.0, ans=0.1 2024-09-18 12:52:13,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426612.6666666667, ans=0.1 2024-09-18 12:52:14,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=426612.6666666667, ans=0.125 2024-09-18 12:52:57,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=426706.0, ans=0.2 2024-09-18 12:53:22,163 INFO [train.py:1198] (1/2) Epoch 24, batch 2300, loss[loss=0.2043, simple_loss=0.2563, pruned_loss=0.05633, ctc_loss=0.1233, cr_loss=0.3743, over 34281.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2712, pruned_loss=0.06299, ctc_loss=0.1315, cr_loss=0.406, over 6765367.29 frames. ], batch size: 83, lr: 4.99e-03, grad_scale: 16.0 2024-09-18 12:53:58,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=426892.6666666667, ans=0.125 2024-09-18 12:53:59,970 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.520e+02 2.972e+02 3.872e+02 6.768e+02, threshold=5.944e+02, percent-clipped=1.0 2024-09-18 12:54:17,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=426939.3333333333, ans=0.0 2024-09-18 12:54:18,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=426939.3333333333, ans=0.125 2024-09-18 12:54:18,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=426939.3333333333, ans=0.125 2024-09-18 12:54:44,805 INFO [train.py:1198] (1/2) Epoch 24, batch 2350, loss[loss=0.2264, simple_loss=0.278, pruned_loss=0.06535, ctc_loss=0.139, cr_loss=0.4084, over 34703.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2713, pruned_loss=0.06311, ctc_loss=0.1317, cr_loss=0.4067, over 6771217.98 frames. 
], batch size: 97, lr: 4.99e-03, grad_scale: 16.0 2024-09-18 12:55:11,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=427079.3333333333, ans=0.125 2024-09-18 12:55:49,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=427172.6666666667, ans=0.125 2024-09-18 12:55:49,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=427172.6666666667, ans=0.0 2024-09-18 12:55:59,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=427219.3333333333, ans=0.125 2024-09-18 12:56:09,075 INFO [train.py:1198] (1/2) Epoch 24, batch 2400, loss[loss=0.2137, simple_loss=0.2647, pruned_loss=0.06088, ctc_loss=0.1243, cr_loss=0.4, over 34587.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2721, pruned_loss=0.06341, ctc_loss=0.1322, cr_loss=0.4078, over 6775349.87 frames. ], batch size: 89, lr: 4.99e-03, grad_scale: 32.0 2024-09-18 12:56:12,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=427266.0, ans=0.07 2024-09-18 12:56:19,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=427266.0, ans=0.0 2024-09-18 12:56:22,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=427266.0, ans=0.125 2024-09-18 12:56:25,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=427312.6666666667, ans=0.2 2024-09-18 12:56:43,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.86 vs. limit=15.0 2024-09-18 12:56:48,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.098e+02 2.546e+02 2.922e+02 3.685e+02 5.675e+02, threshold=5.844e+02, percent-clipped=0.0 2024-09-18 12:57:06,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427406.0, ans=0.1 2024-09-18 12:57:26,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.70 vs. limit=10.0 2024-09-18 12:57:34,216 INFO [train.py:1198] (1/2) Epoch 24, batch 2450, loss[loss=0.2181, simple_loss=0.2721, pruned_loss=0.06107, ctc_loss=0.1289, cr_loss=0.403, over 34438.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.273, pruned_loss=0.06377, ctc_loss=0.133, cr_loss=0.409, over 6750359.48 frames. ], batch size: 95, lr: 4.99e-03, grad_scale: 16.0 2024-09-18 12:58:04,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.50 vs. limit=12.0 2024-09-18 12:58:24,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.01 vs. 
limit=15.0 2024-09-18 12:58:25,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=427639.3333333333, ans=0.125 2024-09-18 12:58:48,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=427686.0, ans=0.0 2024-09-18 12:58:48,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=427686.0, ans=0.0 2024-09-18 12:58:58,270 INFO [train.py:1198] (1/2) Epoch 24, batch 2500, loss[loss=0.2347, simple_loss=0.2864, pruned_loss=0.06817, ctc_loss=0.1453, cr_loss=0.44, over 34480.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.273, pruned_loss=0.06383, ctc_loss=0.133, cr_loss=0.409, over 6762888.81 frames. ], batch size: 100, lr: 4.98e-03, grad_scale: 16.0 2024-09-18 12:59:06,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2024-09-18 12:59:18,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=427779.3333333333, ans=0.125 2024-09-18 12:59:37,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.521e+02 2.960e+02 3.690e+02 5.972e+02, threshold=5.921e+02, percent-clipped=2.0 2024-09-18 13:00:11,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=427919.3333333333, ans=0.125 2024-09-18 13:00:20,841 INFO [train.py:1198] (1/2) Epoch 24, batch 2550, loss[loss=0.1921, simple_loss=0.2416, pruned_loss=0.05262, ctc_loss=0.1122, cr_loss=0.3702, over 34198.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2726, pruned_loss=0.06362, ctc_loss=0.1326, cr_loss=0.4087, over 6766257.18 frames. ], batch size: 78, lr: 4.98e-03, grad_scale: 16.0 2024-09-18 13:00:40,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=428012.6666666667, ans=0.04949747468305833 2024-09-18 13:00:44,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=428012.6666666667, ans=0.125 2024-09-18 13:00:55,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.09 vs. limit=15.0 2024-09-18 13:00:56,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=428059.3333333333, ans=0.125 2024-09-18 13:01:00,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=428059.3333333333, ans=0.125 2024-09-18 13:01:04,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=428059.3333333333, ans=0.125 2024-09-18 13:01:44,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=428199.3333333333, ans=0.125 2024-09-18 13:01:45,396 INFO [train.py:1198] (1/2) Epoch 24, batch 2600, loss[loss=0.2137, simple_loss=0.263, pruned_loss=0.06113, ctc_loss=0.1287, cr_loss=0.4081, over 34330.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.273, pruned_loss=0.06375, ctc_loss=0.1328, cr_loss=0.4089, over 6761195.45 frames. 
], batch size: 91, lr: 4.98e-03, grad_scale: 16.0 2024-09-18 13:01:57,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=428199.3333333333, ans=0.0 2024-09-18 13:02:07,696 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.25 vs. limit=15.0 2024-09-18 13:02:18,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=428292.6666666667, ans=0.025 2024-09-18 13:02:24,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.959e+02 2.488e+02 2.855e+02 3.579e+02 1.298e+03, threshold=5.711e+02, percent-clipped=2.0 2024-09-18 13:02:26,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=428292.6666666667, ans=0.125 2024-09-18 13:02:33,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=428292.6666666667, ans=0.125 2024-09-18 13:02:38,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=428339.3333333333, ans=0.125 2024-09-18 13:03:05,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.00 vs. limit=15.0 2024-09-18 13:03:09,877 INFO [train.py:1198] (1/2) Epoch 24, batch 2650, loss[loss=0.231, simple_loss=0.2806, pruned_loss=0.06792, ctc_loss=0.1409, cr_loss=0.4347, over 34248.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.2732, pruned_loss=0.06383, ctc_loss=0.1328, cr_loss=0.4094, over 6769620.74 frames. ], batch size: 117, lr: 4.98e-03, grad_scale: 16.0 2024-09-18 13:03:38,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=428479.3333333333, ans=0.125 2024-09-18 13:03:38,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=428479.3333333333, ans=0.125 2024-09-18 13:03:44,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428526.0, ans=0.1 2024-09-18 13:03:51,444 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:04:00,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=428572.6666666667, ans=0.0 2024-09-18 13:04:04,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=428572.6666666667, ans=0.125 2024-09-18 13:04:20,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428619.3333333333, ans=0.1 2024-09-18 13:04:22,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=428619.3333333333, ans=15.0 2024-09-18 13:04:31,940 INFO [train.py:1198] (1/2) Epoch 24, batch 2700, loss[loss=0.2249, simple_loss=0.281, pruned_loss=0.06311, ctc_loss=0.132, cr_loss=0.4051, over 34602.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.2737, pruned_loss=0.06405, ctc_loss=0.1332, cr_loss=0.4103, over 6765138.83 frames. 
], batch size: 102, lr: 4.98e-03, grad_scale: 16.0 2024-09-18 13:05:13,694 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.549e+02 2.816e+02 3.838e+02 7.005e+02, threshold=5.632e+02, percent-clipped=5.0 2024-09-18 13:05:36,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=428806.0, ans=0.0 2024-09-18 13:05:41,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=428852.6666666667, ans=0.125 2024-09-18 13:05:56,236 INFO [train.py:1198] (1/2) Epoch 24, batch 2750, loss[loss=0.2215, simple_loss=0.2704, pruned_loss=0.06479, ctc_loss=0.1337, cr_loss=0.4092, over 34625.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.2723, pruned_loss=0.06352, ctc_loss=0.1323, cr_loss=0.4082, over 6761867.34 frames. ], batch size: 88, lr: 4.98e-03, grad_scale: 8.0 2024-09-18 13:05:56,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=428899.3333333333, ans=0.2 2024-09-18 13:06:15,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=428946.0, ans=0.125 2024-09-18 13:06:18,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=428946.0, ans=0.0 2024-09-18 13:06:29,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=428992.6666666667, ans=0.125 2024-09-18 13:06:41,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=428992.6666666667, ans=0.025 2024-09-18 13:07:08,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=429086.0, ans=0.0 2024-09-18 13:07:11,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=429086.0, ans=0.0 2024-09-18 13:07:21,143 INFO [train.py:1198] (1/2) Epoch 24, batch 2800, loss[loss=0.2727, simple_loss=0.3015, pruned_loss=0.09412, ctc_loss=0.1849, cr_loss=0.4657, over 23145.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2723, pruned_loss=0.06369, ctc_loss=0.1326, cr_loss=0.4085, over 6739672.50 frames. ], batch size: 245, lr: 4.98e-03, grad_scale: 16.0 2024-09-18 13:07:54,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=429226.0, ans=0.125 2024-09-18 13:08:02,355 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.089e+02 2.478e+02 2.844e+02 3.293e+02 4.960e+02, threshold=5.688e+02, percent-clipped=0.0 2024-09-18 13:08:15,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. 
limit=10.0 2024-09-18 13:08:42,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=429319.3333333333, ans=0.125 2024-09-18 13:08:47,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=429319.3333333333, ans=10.0 2024-09-18 13:08:49,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=429319.3333333333, ans=0.0 2024-09-18 13:08:52,326 INFO [train.py:1198] (1/2) Epoch 24, batch 2850, loss[loss=0.2124, simple_loss=0.2653, pruned_loss=0.05933, ctc_loss=0.1251, cr_loss=0.3962, over 34496.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.273, pruned_loss=0.06394, ctc_loss=0.133, cr_loss=0.4088, over 6722689.05 frames. ], batch size: 90, lr: 4.98e-03, grad_scale: 16.0 2024-09-18 13:09:23,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=429459.3333333333, ans=0.0 2024-09-18 13:09:36,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.70 vs. limit=10.0 2024-09-18 13:10:06,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=429552.6666666667, ans=0.0 2024-09-18 13:10:16,329 INFO [train.py:1198] (1/2) Epoch 24, batch 2900, loss[loss=0.2097, simple_loss=0.2634, pruned_loss=0.0586, ctc_loss=0.1207, cr_loss=0.3672, over 34514.00 frames. ], tot_loss[loss=0.2228, simple_loss=0.2741, pruned_loss=0.06422, ctc_loss=0.1334, cr_loss=0.4106, over 6753582.89 frames. ], batch size: 94, lr: 4.97e-03, grad_scale: 16.0 2024-09-18 13:10:17,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.51 vs. limit=12.0 2024-09-18 13:10:26,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=429599.3333333333, ans=0.125 2024-09-18 13:10:30,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=12.0 2024-09-18 13:10:31,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=429646.0, ans=0.035 2024-09-18 13:10:41,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=429646.0, ans=0.0 2024-09-18 13:10:43,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=429646.0, ans=0.1 2024-09-18 13:10:49,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=429692.6666666667, ans=0.125 2024-09-18 13:10:53,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=429692.6666666667, ans=0.125 2024-09-18 13:10:57,044 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.21 vs. 
limit=22.5 2024-09-18 13:10:57,683 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.408e+02 2.855e+02 3.481e+02 6.692e+02, threshold=5.710e+02, percent-clipped=3.0 2024-09-18 13:11:24,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=429786.0, ans=0.125 2024-09-18 13:11:32,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=429786.0, ans=0.125 2024-09-18 13:11:38,876 INFO [train.py:1198] (1/2) Epoch 24, batch 2950, loss[loss=0.2091, simple_loss=0.2588, pruned_loss=0.05947, ctc_loss=0.1221, cr_loss=0.3994, over 34635.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2728, pruned_loss=0.06375, ctc_loss=0.1327, cr_loss=0.4081, over 6749010.01 frames. ], batch size: 88, lr: 4.97e-03, grad_scale: 16.0 2024-09-18 13:11:40,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=429832.6666666667, ans=0.0 2024-09-18 13:11:44,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=429832.6666666667, ans=15.0 2024-09-18 13:11:46,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2024-09-18 13:12:04,803 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=5.76 vs. limit=15.0 2024-09-18 13:12:13,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=429926.0, ans=0.125 2024-09-18 13:12:26,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.38 vs. limit=22.5 2024-09-18 13:12:37,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=429972.6666666667, ans=0.0 2024-09-18 13:13:00,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=430019.3333333333, ans=0.125 2024-09-18 13:13:03,336 INFO [train.py:1198] (1/2) Epoch 24, batch 3000, loss[loss=0.225, simple_loss=0.2773, pruned_loss=0.06444, ctc_loss=0.1359, cr_loss=0.4158, over 34548.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2727, pruned_loss=0.06382, ctc_loss=0.1327, cr_loss=0.4081, over 6749478.12 frames. ], batch size: 94, lr: 4.97e-03, grad_scale: 16.0 2024-09-18 13:13:03,337 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 13:13:20,177 INFO [train.py:1230] (1/2) Epoch 24, validation: loss=0.1486, simple_loss=0.2448, pruned_loss=0.02216, ctc_loss=0.04058, cr_loss=1.947e-14, over 944034.00 frames. 2024-09-18 13:13:20,177 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 13:13:20,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=430066.0, ans=0.1 2024-09-18 13:13:36,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.76 vs. 
limit=15.0 2024-09-18 13:13:42,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=430112.6666666667, ans=0.125 2024-09-18 13:13:50,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=430112.6666666667, ans=0.0 2024-09-18 13:14:02,954 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.478e+02 2.923e+02 3.709e+02 7.810e+02, threshold=5.847e+02, percent-clipped=5.0 2024-09-18 13:14:23,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-09-18 13:14:43,873 INFO [train.py:1198] (1/2) Epoch 24, batch 3050, loss[loss=0.2138, simple_loss=0.2583, pruned_loss=0.06353, ctc_loss=0.1314, cr_loss=0.3967, over 34607.00 frames. ], tot_loss[loss=0.2224, simple_loss=0.2734, pruned_loss=0.06416, ctc_loss=0.1334, cr_loss=0.4099, over 6742380.07 frames. ], batch size: 89, lr: 4.97e-03, grad_scale: 16.0 2024-09-18 13:14:51,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=430299.3333333333, ans=0.125 2024-09-18 13:15:41,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=430439.3333333333, ans=0.0 2024-09-18 13:16:02,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=430486.0, ans=0.2 2024-09-18 13:16:05,081 INFO [train.py:1198] (1/2) Epoch 24, batch 3100, loss[loss=0.2444, simple_loss=0.294, pruned_loss=0.07353, ctc_loss=0.1493, cr_loss=0.4467, over 34198.00 frames. ], tot_loss[loss=0.2219, simple_loss=0.273, pruned_loss=0.06391, ctc_loss=0.1329, cr_loss=0.4088, over 6742777.55 frames. ], batch size: 117, lr: 4.97e-03, grad_scale: 16.0 2024-09-18 13:16:05,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=430532.6666666667, ans=0.07 2024-09-18 13:16:08,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=430532.6666666667, ans=0.0 2024-09-18 13:16:16,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=430532.6666666667, ans=0.125 2024-09-18 13:16:23,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.12 vs. 
limit=15.0 2024-09-18 13:16:36,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=430626.0, ans=0.1 2024-09-18 13:16:37,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=430626.0, ans=0.0 2024-09-18 13:16:44,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=430626.0, ans=0.025 2024-09-18 13:16:44,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=430626.0, ans=0.125 2024-09-18 13:16:45,547 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.950e+02 2.446e+02 2.869e+02 3.585e+02 9.827e+02, threshold=5.738e+02, percent-clipped=5.0 2024-09-18 13:16:57,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.39 vs. limit=10.0 2024-09-18 13:17:25,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=430766.0, ans=0.1 2024-09-18 13:17:28,134 INFO [train.py:1198] (1/2) Epoch 24, batch 3150, loss[loss=0.2281, simple_loss=0.281, pruned_loss=0.0652, ctc_loss=0.1389, cr_loss=0.4251, over 33882.00 frames. ], tot_loss[loss=0.2217, simple_loss=0.2728, pruned_loss=0.06385, ctc_loss=0.1329, cr_loss=0.4091, over 6749473.68 frames. ], batch size: 122, lr: 4.97e-03, grad_scale: 16.0 2024-09-18 13:17:32,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=430766.0, ans=10.0 2024-09-18 13:18:23,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=430906.0, ans=0.2 2024-09-18 13:18:34,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=430952.6666666667, ans=0.05 2024-09-18 13:18:44,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=430952.6666666667, ans=0.1 2024-09-18 13:18:48,817 INFO [train.py:1198] (1/2) Epoch 24, batch 3200, loss[loss=0.2172, simple_loss=0.2654, pruned_loss=0.06273, ctc_loss=0.1329, cr_loss=0.4211, over 34548.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2721, pruned_loss=0.06341, ctc_loss=0.1322, cr_loss=0.4077, over 6761567.58 frames. ], batch size: 94, lr: 4.97e-03, grad_scale: 32.0 2024-09-18 13:19:10,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=431046.0, ans=0.0 2024-09-18 13:19:23,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=431092.6666666667, ans=0.025 2024-09-18 13:19:28,237 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:19:29,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.29 vs. 
limit=10.0 2024-09-18 13:19:31,012 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.594e+02 2.883e+02 3.475e+02 5.037e+02, threshold=5.767e+02, percent-clipped=0.0 2024-09-18 13:19:36,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=431139.3333333333, ans=0.125 2024-09-18 13:19:37,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.35 vs. limit=10.0 2024-09-18 13:19:43,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=431139.3333333333, ans=0.07 2024-09-18 13:20:00,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=431186.0, ans=0.95 2024-09-18 13:20:11,444 INFO [train.py:1198] (1/2) Epoch 24, batch 3250, loss[loss=0.2419, simple_loss=0.2877, pruned_loss=0.07398, ctc_loss=0.1483, cr_loss=0.4627, over 34656.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2726, pruned_loss=0.0636, ctc_loss=0.1325, cr_loss=0.4085, over 6770926.27 frames. ], batch size: 98, lr: 4.96e-03, grad_scale: 16.0 2024-09-18 13:20:18,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=431232.6666666667, ans=0.0 2024-09-18 13:20:18,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=22.5 2024-09-18 13:20:18,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2024-09-18 13:20:58,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=431372.6666666667, ans=0.0 2024-09-18 13:21:03,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431372.6666666667, ans=0.1 2024-09-18 13:21:17,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=431419.3333333333, ans=0.125 2024-09-18 13:21:19,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=431419.3333333333, ans=0.2 2024-09-18 13:21:25,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=431419.3333333333, ans=0.0 2024-09-18 13:21:32,019 INFO [train.py:1198] (1/2) Epoch 24, batch 3300, loss[loss=0.2279, simple_loss=0.2827, pruned_loss=0.06513, ctc_loss=0.1361, cr_loss=0.3897, over 33052.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.271, pruned_loss=0.06288, ctc_loss=0.1312, cr_loss=0.4056, over 6768976.10 frames. 
], batch size: 130, lr: 4.96e-03, grad_scale: 16.0 2024-09-18 13:21:32,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=431466.0, ans=0.1 2024-09-18 13:21:34,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=431466.0, ans=0.125 2024-09-18 13:21:43,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=431466.0, ans=0.0 2024-09-18 13:21:50,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=431512.6666666667, ans=0.0 2024-09-18 13:22:06,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=431559.3333333333, ans=0.0 2024-09-18 13:22:07,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.66 vs. limit=15.0 2024-09-18 13:22:13,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=431559.3333333333, ans=10.0 2024-09-18 13:22:14,363 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.020e+02 2.444e+02 2.779e+02 3.427e+02 9.278e+02, threshold=5.557e+02, percent-clipped=4.0 2024-09-18 13:22:17,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=431559.3333333333, ans=0.125 2024-09-18 13:22:21,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=431606.0, ans=0.2 2024-09-18 13:22:24,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=431606.0, ans=0.125 2024-09-18 13:22:53,276 INFO [train.py:1198] (1/2) Epoch 24, batch 3350, loss[loss=0.2323, simple_loss=0.2859, pruned_loss=0.06643, ctc_loss=0.1416, cr_loss=0.4389, over 33789.00 frames. ], tot_loss[loss=0.2209, simple_loss=0.2722, pruned_loss=0.06343, ctc_loss=0.1322, cr_loss=0.408, over 6743631.34 frames. ], batch size: 122, lr: 4.96e-03, grad_scale: 16.0 2024-09-18 13:23:19,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=431746.0, ans=0.2 2024-09-18 13:23:37,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.94 vs. limit=15.0 2024-09-18 13:23:41,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=431839.3333333333, ans=0.2 2024-09-18 13:23:48,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=431839.3333333333, ans=0.0 2024-09-18 13:23:55,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=22.5 2024-09-18 13:24:15,028 INFO [train.py:1198] (1/2) Epoch 24, batch 3400, loss[loss=0.2021, simple_loss=0.2508, pruned_loss=0.05724, ctc_loss=0.1184, cr_loss=0.3807, over 34168.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2724, pruned_loss=0.06378, ctc_loss=0.1328, cr_loss=0.4092, over 6733761.93 frames. 
], batch size: 78, lr: 4.96e-03, grad_scale: 16.0 2024-09-18 13:24:58,170 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.539e+02 3.191e+02 3.818e+02 5.603e+02, threshold=6.382e+02, percent-clipped=1.0 2024-09-18 13:25:11,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=432072.6666666667, ans=0.2 2024-09-18 13:25:32,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=432119.3333333333, ans=0.5 2024-09-18 13:25:37,150 INFO [train.py:1198] (1/2) Epoch 24, batch 3450, loss[loss=0.221, simple_loss=0.2752, pruned_loss=0.06252, ctc_loss=0.1283, cr_loss=0.3993, over 32980.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.273, pruned_loss=0.06382, ctc_loss=0.1329, cr_loss=0.4095, over 6746746.43 frames. ], batch size: 130, lr: 4.96e-03, grad_scale: 16.0 2024-09-18 13:25:40,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=432166.0, ans=0.125 2024-09-18 13:25:55,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=432212.6666666667, ans=0.0 2024-09-18 13:26:12,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=432259.3333333333, ans=0.125 2024-09-18 13:26:20,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=432259.3333333333, ans=0.07 2024-09-18 13:26:38,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=432306.0, ans=0.125 2024-09-18 13:26:53,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=432352.6666666667, ans=0.04949747468305833 2024-09-18 13:26:57,415 INFO [train.py:1198] (1/2) Epoch 24, batch 3500, loss[loss=0.2008, simple_loss=0.2552, pruned_loss=0.05433, ctc_loss=0.1136, cr_loss=0.3722, over 34472.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.2722, pruned_loss=0.06365, ctc_loss=0.1326, cr_loss=0.4081, over 6747759.12 frames. ], batch size: 85, lr: 4.96e-03, grad_scale: 16.0 2024-09-18 13:26:59,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=432399.3333333333, ans=0.125 2024-09-18 13:27:01,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=432399.3333333333, ans=0.125 2024-09-18 13:27:11,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.74 vs. 
limit=15.0 2024-09-18 13:27:22,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=432446.0, ans=0.0 2024-09-18 13:27:23,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=432446.0, ans=0.2 2024-09-18 13:27:37,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=432492.6666666667, ans=0.2 2024-09-18 13:27:39,342 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.419e+02 2.756e+02 3.531e+02 5.701e+02, threshold=5.512e+02, percent-clipped=0.0 2024-09-18 13:28:01,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=432586.0, ans=0.1 2024-09-18 13:28:06,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=432586.0, ans=0.0 2024-09-18 13:28:09,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.24 vs. limit=10.0 2024-09-18 13:28:19,069 INFO [train.py:1198] (1/2) Epoch 24, batch 3550, loss[loss=0.2298, simple_loss=0.2823, pruned_loss=0.06579, ctc_loss=0.1405, cr_loss=0.4383, over 34380.00 frames. ], tot_loss[loss=0.2215, simple_loss=0.2724, pruned_loss=0.06382, ctc_loss=0.1328, cr_loss=0.409, over 6756710.42 frames. ], batch size: 103, lr: 4.96e-03, grad_scale: 16.0 2024-09-18 13:28:38,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=432679.3333333333, ans=0.0 2024-09-18 13:28:55,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=432726.0, ans=0.1 2024-09-18 13:29:05,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=432726.0, ans=0.5 2024-09-18 13:29:07,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.51 vs. limit=22.5 2024-09-18 13:29:08,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=432772.6666666667, ans=0.025 2024-09-18 13:29:37,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432819.3333333333, ans=0.1 2024-09-18 13:29:40,242 INFO [train.py:1198] (1/2) Epoch 24, batch 3600, loss[loss=0.2215, simple_loss=0.2716, pruned_loss=0.06488, ctc_loss=0.1299, cr_loss=0.3907, over 34482.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2725, pruned_loss=0.06387, ctc_loss=0.133, cr_loss=0.4093, over 6766275.22 frames. 
], batch size: 90, lr: 4.96e-03, grad_scale: 32.0 2024-09-18 13:29:43,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432866.0, ans=0.1 2024-09-18 13:30:22,044 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.880e+02 2.471e+02 3.079e+02 3.747e+02 6.292e+02, threshold=6.158e+02, percent-clipped=2.0 2024-09-18 13:30:35,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=433006.0, ans=0.0 2024-09-18 13:30:36,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=433006.0, ans=0.0 2024-09-18 13:30:49,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=433052.6666666667, ans=0.125 2024-09-18 13:31:00,967 INFO [train.py:1198] (1/2) Epoch 24, batch 3650, loss[loss=0.2356, simple_loss=0.2894, pruned_loss=0.06809, ctc_loss=0.1399, cr_loss=0.4397, over 34492.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.2718, pruned_loss=0.06339, ctc_loss=0.132, cr_loss=0.4075, over 6769335.67 frames. ], batch size: 110, lr: 4.95e-03, grad_scale: 32.0 2024-09-18 13:31:04,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=433099.3333333333, ans=0.125 2024-09-18 13:31:09,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=433099.3333333333, ans=0.0 2024-09-18 13:31:15,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=433146.0, ans=0.2 2024-09-18 13:31:25,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=433146.0, ans=0.125 2024-09-18 13:32:07,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=433286.0, ans=0.125 2024-09-18 13:32:09,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.82 vs. limit=15.0 2024-09-18 13:32:21,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=433332.6666666667, ans=0.0 2024-09-18 13:32:21,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=433332.6666666667, ans=0.2 2024-09-18 13:32:22,260 INFO [train.py:1198] (1/2) Epoch 24, batch 3700, loss[loss=0.217, simple_loss=0.2744, pruned_loss=0.0592, ctc_loss=0.1251, cr_loss=0.4045, over 34642.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2715, pruned_loss=0.06295, ctc_loss=0.1313, cr_loss=0.4057, over 6784312.52 frames. ], batch size: 102, lr: 4.95e-03, grad_scale: 32.0 2024-09-18 13:32:30,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=433332.6666666667, ans=0.0 2024-09-18 13:32:51,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=433379.3333333333, ans=0.0 2024-09-18 13:32:56,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. 
limit=15.0 2024-09-18 13:33:04,093 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.389e+02 2.702e+02 3.814e+02 7.812e+02, threshold=5.405e+02, percent-clipped=2.0 2024-09-18 13:33:06,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=433426.0, ans=0.125 2024-09-18 13:33:23,807 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:33:33,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=433519.3333333333, ans=0.125 2024-09-18 13:33:38,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.36 vs. limit=22.5 2024-09-18 13:33:39,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=433519.3333333333, ans=0.1 2024-09-18 13:33:42,810 INFO [train.py:1198] (1/2) Epoch 24, batch 3750, loss[loss=0.2353, simple_loss=0.2876, pruned_loss=0.06834, ctc_loss=0.1435, cr_loss=0.4423, over 34291.00 frames. ], tot_loss[loss=0.2235, simple_loss=0.2748, pruned_loss=0.06443, ctc_loss=0.1339, cr_loss=0.4116, over 6786483.56 frames. ], batch size: 113, lr: 4.95e-03, grad_scale: 32.0 2024-09-18 13:33:47,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=433566.0, ans=0.125 2024-09-18 13:34:30,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=433706.0, ans=0.125 2024-09-18 13:34:43,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=433706.0, ans=0.125 2024-09-18 13:34:47,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=433752.6666666667, ans=0.125 2024-09-18 13:34:48,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=433752.6666666667, ans=0.125 2024-09-18 13:34:48,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=433752.6666666667, ans=0.0 2024-09-18 13:34:56,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=433752.6666666667, ans=0.025 2024-09-18 13:35:04,931 INFO [train.py:1198] (1/2) Epoch 24, batch 3800, loss[loss=0.2504, simple_loss=0.2896, pruned_loss=0.08053, ctc_loss=0.16, cr_loss=0.4518, over 29868.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.2776, pruned_loss=0.06595, ctc_loss=0.1369, cr_loss=0.4175, over 6676509.02 frames. 
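
[annotation] On the Whitening lines: each entry compares a scalar statistic of a layer's channel covariance against a limit, presumably applying a whitening gradient only when the metric exceeds that limit. The exact statistic in scaling.py is not visible in the log; the proxy below (second moment of the covariance eigenvalues over their squared mean, which is 1.0 for a perfectly white covariance and grows as channels correlate) only illustrates the kind of quantity being monitored:

```python
import torch

def whiteness_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels); num_groups=1 case for simplicity
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    # 1.0 iff all eigenvalues are equal (isotropic covariance)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 512) @ torch.randn(512, 512)  # strongly correlated channels
print(whiteness_metric(x))  # well above 1, would trip a limit like 15.0
```
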
], batch size: 175, lr: 4.95e-03, grad_scale: 32.0 2024-09-18 13:35:25,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=433846.0, ans=0.2 2024-09-18 13:35:32,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=433846.0, ans=0.125 2024-09-18 13:35:38,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=433892.6666666667, ans=0.125 2024-09-18 13:35:42,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=14.55 vs. limit=15.0 2024-09-18 13:35:49,873 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.356e+02 2.612e+02 2.921e+02 5.423e+02, threshold=5.223e+02, percent-clipped=1.0 2024-09-18 13:36:09,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=433939.3333333333, ans=0.025 2024-09-18 13:36:18,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=433986.0, ans=0.1 2024-09-18 13:36:29,099 INFO [train.py:1198] (1/2) Epoch 24, batch 3850, loss[loss=0.2402, simple_loss=0.2858, pruned_loss=0.07344, ctc_loss=0.1549, cr_loss=0.4183, over 23446.00 frames. ], tot_loss[loss=0.2312, simple_loss=0.2804, pruned_loss=0.06836, ctc_loss=0.1419, cr_loss=0.4223, over 6249391.10 frames. ], batch size: 244, lr: 4.95e-03, grad_scale: 16.0 2024-09-18 13:36:46,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=434079.3333333333, ans=0.125 2024-09-18 13:37:01,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0 2024-09-18 13:37:10,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=434126.0, ans=0.1 2024-09-18 13:37:58,808 INFO [train.py:1198] (1/2) Epoch 25, batch 0, loss[loss=0.2046, simple_loss=0.254, pruned_loss=0.05759, ctc_loss=0.1245, cr_loss=0.3779, over 34478.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.254, pruned_loss=0.05759, ctc_loss=0.1245, cr_loss=0.3779, over 34478.00 frames. ], batch size: 85, lr: 4.85e-03, grad_scale: 32.0 2024-09-18 13:37:58,808 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 13:38:15,858 INFO [train.py:1230] (1/2) Epoch 25, validation: loss=0.1492, simple_loss=0.2459, pruned_loss=0.02223, ctc_loss=0.04061, cr_loss=1.974e-14, over 944034.00 frames. 2024-09-18 13:38:15,859 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 13:38:20,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.32 vs. 
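
[annotation] The "Computing validation loss" / "validation: ... over 944034.00 frames" pair at the start of each epoch suggests a frame-weighted pass over the dev set: per-batch losses are weighted by frame counts, so the printed value averages over all dev frames rather than over batches. A minimal sketch, with `criterion` standing in for the model's loss computation (assumed helper):

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, criterion):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    for batch in valid_loader:
        loss, num_frames = criterion(model, batch)  # assumed per-batch helper
        tot_loss += loss.item() * num_frames        # frame-weighted sum
        tot_frames += num_frames
    model.train()
    return tot_loss / tot_frames                    # average over all dev frames
```
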
limit=15.0 2024-09-18 13:39:10,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=434298.6666666667, ans=0.125 2024-09-18 13:39:40,516 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.672e+02 2.976e+02 3.596e+02 6.640e+02, threshold=5.952e+02, percent-clipped=6.0 2024-09-18 13:39:40,537 INFO [train.py:1198] (1/2) Epoch 25, batch 50, loss[loss=0.1876, simple_loss=0.2405, pruned_loss=0.04936, ctc_loss=0.1089, cr_loss=0.3572, over 34483.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2753, pruned_loss=0.06464, ctc_loss=0.1341, cr_loss=0.4117, over 1481431.29 frames. ], batch size: 82, lr: 4.84e-03, grad_scale: 32.0 2024-09-18 13:40:08,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.15 vs. limit=10.0 2024-09-18 13:40:14,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.86 vs. limit=22.5 2024-09-18 13:40:31,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-09-18 13:41:05,547 INFO [train.py:1198] (1/2) Epoch 25, batch 100, loss[loss=0.2062, simple_loss=0.2572, pruned_loss=0.05757, ctc_loss=0.1221, cr_loss=0.3901, over 34587.00 frames. ], tot_loss[loss=0.2243, simple_loss=0.2756, pruned_loss=0.06481, ctc_loss=0.1345, cr_loss=0.4128, over 2629247.36 frames. ], batch size: 89, lr: 4.84e-03, grad_scale: 32.0 2024-09-18 13:41:12,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=434625.3333333333, ans=0.0 2024-09-18 13:41:20,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2024-09-18 13:41:30,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=434672.0, ans=0.0 2024-09-18 13:41:40,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=434718.6666666667, ans=0.025 2024-09-18 13:42:08,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=434765.3333333333, ans=0.125 2024-09-18 13:42:11,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.55 vs. limit=15.0 2024-09-18 13:42:26,930 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.507e+02 2.803e+02 3.443e+02 6.462e+02, threshold=5.605e+02, percent-clipped=3.0 2024-09-18 13:42:26,959 INFO [train.py:1198] (1/2) Epoch 25, batch 150, loss[loss=0.1987, simple_loss=0.2488, pruned_loss=0.05536, ctc_loss=0.116, cr_loss=0.3674, over 34500.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2728, pruned_loss=0.06314, ctc_loss=0.1316, cr_loss=0.4068, over 3555900.80 frames. 
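
[annotation] The tot_loss[..., over N frames] fields grow as more batches are folded in (1481431 frames at batch 50, 2629247 at batch 100), i.e. a frame-weighted running average kept per loss component. A toy tracker of that shape, fed with two of the per-batch entries above; the real tracker may also decay or window out old batches:

```python
class LossTracker:
    def __init__(self):
        self.sums = {}       # component name -> frame-weighted loss sum
        self.frames = 0.0

    def update(self, losses: dict, num_frames: float):
        for k, v in losses.items():
            self.sums[k] = self.sums.get(k, 0.0) + v * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        return {k: s / self.frames for k, s in self.sums.items()}

t = LossTracker()
t.update({"loss": 0.1876, "ctc_loss": 0.1089}, 34483.0)  # batch-50 entry
t.update({"loss": 0.2062, "ctc_loss": 0.1221}, 34587.0)  # batch-100 entry
print(t.averages())
```
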
], batch size: 82, lr: 4.84e-03, grad_scale: 32.0 2024-09-18 13:42:38,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=434858.6666666667, ans=0.2 2024-09-18 13:42:41,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.81 vs. limit=22.5 2024-09-18 13:42:52,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=434905.3333333333, ans=0.125 2024-09-18 13:42:57,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=434905.3333333333, ans=0.125 2024-09-18 13:43:10,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2024-09-18 13:43:12,259 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:43:20,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=434998.6666666667, ans=0.0 2024-09-18 13:43:22,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=434998.6666666667, ans=0.0 2024-09-18 13:43:44,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-09-18 13:43:48,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=435045.3333333333, ans=12.0 2024-09-18 13:43:53,799 INFO [train.py:1198] (1/2) Epoch 25, batch 200, loss[loss=0.2356, simple_loss=0.284, pruned_loss=0.06997, ctc_loss=0.145, cr_loss=0.4542, over 32041.00 frames. ], tot_loss[loss=0.2194, simple_loss=0.2712, pruned_loss=0.06266, ctc_loss=0.1306, cr_loss=0.4048, over 4271237.28 frames. ], batch size: 145, lr: 4.84e-03, grad_scale: 32.0 2024-09-18 13:44:07,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=435092.0, ans=0.0 2024-09-18 13:44:17,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=435138.6666666667, ans=0.125 2024-09-18 13:44:19,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.34 vs. limit=12.0 2024-09-18 13:44:19,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.23 vs. 
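
[annotation] The printed lr drifts down slowly across these entries (4.96e-03 earlier, 4.84e-03 here, 4.81e-03 further below), consistent with a schedule that decays smoothly in both batch count and epoch. The Eden-style form below is a guess at the shape only; the constants are assumptions and extra factors (e.g. warmup) are omitted, so it is not expected to reproduce the printed values exactly:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # both factors decay toward ~(t)^-0.5 in their respective time variables
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# same order of magnitude as the logged lr, not an exact reproduction
print(eden_lr(0.045, batch=435000, epoch=25.0))
```
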
limit=15.0 2024-09-18 13:44:23,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=435138.6666666667, ans=0.125 2024-09-18 13:44:32,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=435185.3333333333, ans=0.125 2024-09-18 13:44:45,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=435232.0, ans=0.0 2024-09-18 13:44:53,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=435232.0, ans=0.125 2024-09-18 13:45:10,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.86 vs. limit=22.5 2024-09-18 13:45:16,737 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.515e+02 2.933e+02 3.695e+02 9.037e+02, threshold=5.867e+02, percent-clipped=4.0 2024-09-18 13:45:16,757 INFO [train.py:1198] (1/2) Epoch 25, batch 250, loss[loss=0.2243, simple_loss=0.2848, pruned_loss=0.0605, ctc_loss=0.1315, cr_loss=0.4127, over 34285.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.2713, pruned_loss=0.06278, ctc_loss=0.1307, cr_loss=0.4054, over 4833803.79 frames. ], batch size: 117, lr: 4.84e-03, grad_scale: 32.0 2024-09-18 13:45:44,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=435372.0, ans=0.1 2024-09-18 13:45:46,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435372.0, ans=0.1 2024-09-18 13:45:51,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=435418.6666666667, ans=0.025 2024-09-18 13:45:59,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=435418.6666666667, ans=0.07 2024-09-18 13:46:16,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=435465.3333333333, ans=0.125 2024-09-18 13:46:40,861 INFO [train.py:1198] (1/2) Epoch 25, batch 300, loss[loss=0.2329, simple_loss=0.283, pruned_loss=0.06821, ctc_loss=0.1412, cr_loss=0.4528, over 34362.00 frames. ], tot_loss[loss=0.2196, simple_loss=0.2712, pruned_loss=0.06275, ctc_loss=0.1308, cr_loss=0.4052, over 5260687.57 frames. ], batch size: 107, lr: 4.84e-03, grad_scale: 32.0 2024-09-18 13:46:56,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=435605.3333333333, ans=0.0 2024-09-18 13:46:56,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.02 vs. 
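
[annotation] On the cr_loss component tracked in every loss line: consistency-regularized CTC trains the encoder on two differently masked views of each utterance and penalizes disagreement between their CTC posteriors. The symmetric-KL form below is one plausible instantiation, not necessarily the exact one used in this run:

```python
import torch
import torch.nn.functional as F

def cr_loss(log_probs_a: torch.Tensor, log_probs_b: torch.Tensor) -> torch.Tensor:
    # log_probs_*: (T, vocab) log-softmax CTC outputs of two augmented views
    kl_ab = F.kl_div(log_probs_b, log_probs_a, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(log_probs_a, log_probs_b, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

a = F.log_softmax(torch.randn(100, 500), dim=-1)
b = F.log_softmax(torch.randn(100, 500), dim=-1)
print(cr_loss(a, b).item())
```
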
limit=12.0 2024-09-18 13:47:07,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=435605.3333333333, ans=0.125 2024-09-18 13:47:07,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=435605.3333333333, ans=0.125 2024-09-18 13:47:36,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=435698.6666666667, ans=0.2 2024-09-18 13:47:43,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.76 vs. limit=15.0 2024-09-18 13:48:05,450 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.983e+02 2.394e+02 2.816e+02 3.640e+02 7.392e+02, threshold=5.632e+02, percent-clipped=2.0 2024-09-18 13:48:05,475 INFO [train.py:1198] (1/2) Epoch 25, batch 350, loss[loss=0.1885, simple_loss=0.2407, pruned_loss=0.05034, ctc_loss=0.1092, cr_loss=0.346, over 34290.00 frames. ], tot_loss[loss=0.22, simple_loss=0.2717, pruned_loss=0.0629, ctc_loss=0.1311, cr_loss=0.406, over 5595984.38 frames. ], batch size: 83, lr: 4.84e-03, grad_scale: 32.0 2024-09-18 13:48:12,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.71 vs. limit=6.0 2024-09-18 13:48:28,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=435838.6666666667, ans=0.125 2024-09-18 13:48:30,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.71 vs. limit=15.0 2024-09-18 13:48:33,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=435838.6666666667, ans=0.025 2024-09-18 13:48:39,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=435885.3333333333, ans=0.125 2024-09-18 13:49:01,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=435932.0, ans=0.0 2024-09-18 13:49:05,565 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.55 vs. limit=22.5 2024-09-18 13:49:10,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=435978.6666666667, ans=0.125 2024-09-18 13:49:18,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=435978.6666666667, ans=0.125 2024-09-18 13:49:26,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=436025.3333333333, ans=0.0 2024-09-18 13:49:28,019 INFO [train.py:1198] (1/2) Epoch 25, batch 400, loss[loss=0.2256, simple_loss=0.2764, pruned_loss=0.06548, ctc_loss=0.1332, cr_loss=0.4313, over 34417.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.271, pruned_loss=0.06246, ctc_loss=0.1304, cr_loss=0.4043, over 5863418.76 frames. 
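
[annotation] The balancer entries (prob=0.125, min_positive/max_positive, min_abs=0.5) describe modules that, with probability `prob` per step, police activation statistics against bounds and correct gradients when they drift outside. The sketch below only computes the statistics being policed; the gradient-side correction in scaling.py is more involved and is not visible in the log:

```python
import torch

def balancer_stats(x: torch.Tensor, dim: int = -1):
    # fraction of positive activations, checked against min_positive/max_positive
    frac_positive = (x > 0).float().mean(dim=dim)
    # mean absolute value, checked against bounds like min_abs=0.5
    mean_abs = x.abs().mean(dim=dim)
    return frac_positive, mean_abs

x = torch.randn(4, 512)
fp, ma = balancer_stats(x)
print(fp.mean().item(), ma.mean().item())
```
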
], batch size: 95, lr: 4.84e-03, grad_scale: 32.0 2024-09-18 13:49:38,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=436025.3333333333, ans=0.05 2024-09-18 13:49:46,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=436072.0, ans=0.125 2024-09-18 13:50:01,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=436118.6666666667, ans=0.125 2024-09-18 13:50:06,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=436118.6666666667, ans=0.0 2024-09-18 13:50:14,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=436118.6666666667, ans=0.0 2024-09-18 13:50:17,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=436165.3333333333, ans=0.125 2024-09-18 13:50:43,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=436212.0, ans=0.125 2024-09-18 13:50:48,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=436212.0, ans=0.125 2024-09-18 13:50:51,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=436258.6666666667, ans=0.125 2024-09-18 13:50:52,672 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.112e+02 2.503e+02 2.857e+02 3.660e+02 6.660e+02, threshold=5.714e+02, percent-clipped=5.0 2024-09-18 13:50:52,693 INFO [train.py:1198] (1/2) Epoch 25, batch 450, loss[loss=0.2409, simple_loss=0.2968, pruned_loss=0.06912, ctc_loss=0.1457, cr_loss=0.4431, over 34698.00 frames. ], tot_loss[loss=0.2194, simple_loss=0.2715, pruned_loss=0.06252, ctc_loss=0.1306, cr_loss=0.4055, over 6054661.48 frames. ], batch size: 97, lr: 4.83e-03, grad_scale: 32.0 2024-09-18 13:51:22,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.71 vs. limit=15.0 2024-09-18 13:51:26,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=436352.0, ans=0.125 2024-09-18 13:51:33,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=436352.0, ans=0.125 2024-09-18 13:51:44,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.03 vs. limit=22.5 2024-09-18 13:51:44,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=436398.6666666667, ans=0.125 2024-09-18 13:51:56,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=436398.6666666667, ans=0.125 2024-09-18 13:52:09,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=436445.3333333333, ans=0.125 2024-09-18 13:52:17,513 INFO [train.py:1198] (1/2) Epoch 25, batch 500, loss[loss=0.2413, simple_loss=0.2921, pruned_loss=0.07155, ctc_loss=0.1463, cr_loss=0.4567, over 34442.00 frames. 
], tot_loss[loss=0.2192, simple_loss=0.271, pruned_loss=0.06253, ctc_loss=0.1307, cr_loss=0.4061, over 6220415.36 frames. ], batch size: 110, lr: 4.83e-03, grad_scale: 32.0 2024-09-18 13:52:19,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=436492.0, ans=0.125 2024-09-18 13:52:24,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=436492.0, ans=0.125 2024-09-18 13:53:03,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=436585.3333333333, ans=0.2 2024-09-18 13:53:08,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=436632.0, ans=0.125 2024-09-18 13:53:35,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=436678.6666666667, ans=0.2 2024-09-18 13:53:35,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=436678.6666666667, ans=0.125 2024-09-18 13:53:39,730 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.479e+02 2.842e+02 3.567e+02 7.827e+02, threshold=5.683e+02, percent-clipped=3.0 2024-09-18 13:53:39,751 INFO [train.py:1198] (1/2) Epoch 25, batch 550, loss[loss=0.2441, simple_loss=0.2912, pruned_loss=0.07469, ctc_loss=0.1517, cr_loss=0.4326, over 33734.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.271, pruned_loss=0.06263, ctc_loss=0.1309, cr_loss=0.406, over 6329700.07 frames. ], batch size: 122, lr: 4.83e-03, grad_scale: 32.0 2024-09-18 13:53:41,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=436725.3333333333, ans=0.125 2024-09-18 13:53:53,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=436725.3333333333, ans=0.1 2024-09-18 13:54:01,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=436772.0, ans=0.2 2024-09-18 13:54:03,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=436772.0, ans=0.125 2024-09-18 13:54:20,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=436818.6666666667, ans=0.2 2024-09-18 13:54:24,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=436818.6666666667, ans=0.1 2024-09-18 13:54:27,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.45 vs. limit=15.0 2024-09-18 13:54:40,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=436865.3333333333, ans=0.2 2024-09-18 13:54:49,029 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.96 vs. 
limit=12.0 2024-09-18 13:54:50,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=436912.0, ans=0.125 2024-09-18 13:54:57,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=436912.0, ans=0.125 2024-09-18 13:55:07,330 INFO [train.py:1198] (1/2) Epoch 25, batch 600, loss[loss=0.2286, simple_loss=0.2834, pruned_loss=0.06502, ctc_loss=0.1364, cr_loss=0.4101, over 34202.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2712, pruned_loss=0.06257, ctc_loss=0.1308, cr_loss=0.4057, over 6432341.65 frames. ], batch size: 117, lr: 4.83e-03, grad_scale: 32.0 2024-09-18 13:55:07,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=436958.6666666667, ans=0.125 2024-09-18 13:55:09,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=436958.6666666667, ans=0.125 2024-09-18 13:55:14,537 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 13:55:14,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=436958.6666666667, ans=0.2 2024-09-18 13:55:32,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437005.3333333333, ans=0.1 2024-09-18 13:55:42,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=437052.0, ans=0.0 2024-09-18 13:55:48,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=437052.0, ans=0.0 2024-09-18 13:55:59,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=437098.6666666667, ans=0.0 2024-09-18 13:56:01,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=437098.6666666667, ans=0.0 2024-09-18 13:56:11,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=437145.3333333333, ans=0.125 2024-09-18 13:56:13,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=437145.3333333333, ans=0.125 2024-09-18 13:56:15,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=437145.3333333333, ans=0.0 2024-09-18 13:56:29,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.507e+02 3.142e+02 4.219e+02 8.358e+02, threshold=6.284e+02, percent-clipped=7.0 2024-09-18 13:56:29,245 INFO [train.py:1198] (1/2) Epoch 25, batch 650, loss[loss=0.2349, simple_loss=0.2833, pruned_loss=0.07024, ctc_loss=0.1442, cr_loss=0.4275, over 34521.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.2706, pruned_loss=0.0623, ctc_loss=0.1304, cr_loss=0.4049, over 6523211.60 frames. 
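
[annotation] The bypass.scale_min and *_skip_rate entries point at two related mechanisms: each sub-module's output is mixed back into its input through a learned per-channel scale clamped from below, and whole branches can be skipped stochastically during training (most skip rates here have annealed to 0.0). A rough sketch of that pattern, with names assumed:

```python
import torch

class BypassSketch(torch.nn.Module):
    def __init__(self, num_channels: int, scale_min: float = 0.2,
                 skip_rate: float = 0.07):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min = scale_min
        self.skip_rate = skip_rate

    def forward(self, src: torch.Tensor, module_out: torch.Tensor) -> torch.Tensor:
        if self.training and torch.rand(()) < self.skip_rate:
            return src  # stochastically skip the branch this step
        # learned per-channel mixing weight, clamped below by scale_min
        s = self.scale.clamp(min=self.scale_min, max=1.0)
        return src + s * (module_out - src)

layer = BypassSketch(256)
x, y = torch.randn(10, 256), torch.randn(10, 256)
print(layer(x, y).shape)
```
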
], batch size: 94, lr: 4.83e-03, grad_scale: 32.0 2024-09-18 13:56:31,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=437192.0, ans=0.125 2024-09-18 13:56:37,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=437192.0, ans=0.125 2024-09-18 13:56:57,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=437238.6666666667, ans=0.125 2024-09-18 13:57:02,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=437285.3333333333, ans=0.025 2024-09-18 13:57:17,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=437332.0, ans=0.0 2024-09-18 13:57:24,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437332.0, ans=0.1 2024-09-18 13:57:44,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=22.5 2024-09-18 13:57:51,749 INFO [train.py:1198] (1/2) Epoch 25, batch 700, loss[loss=0.2175, simple_loss=0.2641, pruned_loss=0.06411, ctc_loss=0.1318, cr_loss=0.4086, over 34607.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2713, pruned_loss=0.0626, ctc_loss=0.1309, cr_loss=0.4062, over 6581084.76 frames. ], batch size: 89, lr: 4.83e-03, grad_scale: 32.0 2024-09-18 13:57:53,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=437425.3333333333, ans=0.2 2024-09-18 13:58:18,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0 2024-09-18 13:58:36,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=437518.6666666667, ans=0.125 2024-09-18 13:58:38,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=437518.6666666667, ans=0.05 2024-09-18 13:58:38,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437518.6666666667, ans=0.1 2024-09-18 13:59:13,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437612.0, ans=0.1 2024-09-18 13:59:18,438 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.481e+02 2.834e+02 3.520e+02 5.611e+02, threshold=5.667e+02, percent-clipped=0.0 2024-09-18 13:59:18,459 INFO [train.py:1198] (1/2) Epoch 25, batch 750, loss[loss=0.2361, simple_loss=0.283, pruned_loss=0.07082, ctc_loss=0.1456, cr_loss=0.4616, over 34396.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.271, pruned_loss=0.06256, ctc_loss=0.1308, cr_loss=0.4055, over 6624962.23 frames. 
], batch size: 95, lr: 4.83e-03, grad_scale: 32.0 2024-09-18 13:59:25,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437658.6666666667, ans=0.1 2024-09-18 13:59:33,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=437705.3333333333, ans=0.0 2024-09-18 13:59:55,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=437752.0, ans=0.125 2024-09-18 13:59:58,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=437752.0, ans=0.125 2024-09-18 14:00:16,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=437798.6666666667, ans=0.0 2024-09-18 14:00:23,336 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:00:41,134 INFO [train.py:1198] (1/2) Epoch 25, batch 800, loss[loss=0.1987, simple_loss=0.2492, pruned_loss=0.05511, ctc_loss=0.1158, cr_loss=0.3702, over 34457.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.271, pruned_loss=0.06275, ctc_loss=0.131, cr_loss=0.4064, over 6660794.58 frames. ], batch size: 85, lr: 4.83e-03, grad_scale: 32.0 2024-09-18 14:00:51,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437892.0, ans=0.1 2024-09-18 14:00:54,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=437892.0, ans=0.025 2024-09-18 14:01:25,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=437985.3333333333, ans=0.025 2024-09-18 14:01:35,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=438032.0, ans=0.125 2024-09-18 14:01:44,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=438032.0, ans=0.125 2024-09-18 14:01:50,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=438078.6666666667, ans=0.95 2024-09-18 14:02:04,845 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.430e+02 2.737e+02 3.625e+02 6.189e+02, threshold=5.473e+02, percent-clipped=1.0 2024-09-18 14:02:04,867 INFO [train.py:1198] (1/2) Epoch 25, batch 850, loss[loss=0.217, simple_loss=0.2715, pruned_loss=0.06013, ctc_loss=0.1272, cr_loss=0.4193, over 34357.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2704, pruned_loss=0.06225, ctc_loss=0.13, cr_loss=0.4047, over 6693985.99 frames. ], batch size: 103, lr: 4.82e-03, grad_scale: 32.0 2024-09-18 14:02:11,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438125.3333333333, ans=0.1 2024-09-18 14:02:13,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=438125.3333333333, ans=0.125 2024-09-18 14:02:13,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.88 vs. 
limit=15.0 2024-09-18 14:02:21,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=438172.0, ans=0.0 2024-09-18 14:02:27,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.12 vs. limit=15.0 2024-09-18 14:02:28,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=438172.0, ans=0.07 2024-09-18 14:02:55,541 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=15.0 2024-09-18 14:03:11,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=438312.0, ans=0.0 2024-09-18 14:03:15,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=438312.0, ans=0.1 2024-09-18 14:03:29,829 INFO [train.py:1198] (1/2) Epoch 25, batch 900, loss[loss=0.2043, simple_loss=0.2576, pruned_loss=0.05591, ctc_loss=0.1197, cr_loss=0.3837, over 34440.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2707, pruned_loss=0.06246, ctc_loss=0.1304, cr_loss=0.4053, over 6699023.08 frames. ], batch size: 85, lr: 4.82e-03, grad_scale: 32.0 2024-09-18 14:03:39,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=438358.6666666667, ans=0.125 2024-09-18 14:03:50,399 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.63 vs. limit=15.0 2024-09-18 14:04:00,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=15.0 2024-09-18 14:04:01,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=438452.0, ans=0.125 2024-09-18 14:04:10,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.32 vs. limit=15.0 2024-09-18 14:04:24,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=438498.6666666667, ans=0.09899494936611666 2024-09-18 14:04:25,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2024-09-18 14:04:39,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=438545.3333333333, ans=0.125 2024-09-18 14:04:51,853 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.484e+02 3.092e+02 3.660e+02 9.292e+02, threshold=6.183e+02, percent-clipped=2.0 2024-09-18 14:04:51,873 INFO [train.py:1198] (1/2) Epoch 25, batch 950, loss[loss=0.2113, simple_loss=0.2641, pruned_loss=0.05896, ctc_loss=0.1238, cr_loss=0.3936, over 34702.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2705, pruned_loss=0.0622, ctc_loss=0.1299, cr_loss=0.4043, over 6703861.69 frames. 
], batch size: 87, lr: 4.82e-03, grad_scale: 32.0 2024-09-18 14:04:52,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.87 vs. limit=15.0 2024-09-18 14:05:30,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=438685.3333333333, ans=0.0 2024-09-18 14:05:32,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=22.5 2024-09-18 14:05:40,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=438685.3333333333, ans=0.125 2024-09-18 14:05:45,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=438732.0, ans=0.125 2024-09-18 14:06:14,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=438825.3333333333, ans=0.125 2024-09-18 14:06:15,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=438825.3333333333, ans=10.0 2024-09-18 14:06:15,896 INFO [train.py:1198] (1/2) Epoch 25, batch 1000, loss[loss=0.2011, simple_loss=0.251, pruned_loss=0.05609, ctc_loss=0.1182, cr_loss=0.3831, over 34485.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2714, pruned_loss=0.06284, ctc_loss=0.1311, cr_loss=0.4066, over 6697536.44 frames. ], batch size: 90, lr: 4.82e-03, grad_scale: 16.0 2024-09-18 14:06:16,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=438825.3333333333, ans=0.1 2024-09-18 14:06:21,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=438825.3333333333, ans=0.2 2024-09-18 14:06:53,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=438918.6666666667, ans=0.125 2024-09-18 14:06:58,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=438918.6666666667, ans=0.0 2024-09-18 14:07:10,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=438965.3333333333, ans=15.0 2024-09-18 14:07:19,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=438965.3333333333, ans=0.5 2024-09-18 14:07:40,335 INFO [train.py:1198] (1/2) Epoch 25, batch 1050, loss[loss=0.2275, simple_loss=0.2817, pruned_loss=0.06499, ctc_loss=0.1343, cr_loss=0.4095, over 34576.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2707, pruned_loss=0.06269, ctc_loss=0.1308, cr_loss=0.4061, over 6706639.41 frames. 
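
[annotation] grad_scale dropping from 32.0 to 16.0 around batch 1000 (and recovering to 32.0 by batch 1200 below) is the signature of dynamic loss scaling under fp16 AMP: the scaler halves its scale when a step produces inf/nan gradients and grows it back after a run of clean steps. Standard PyTorch usage, with model/optimizer/loss_fn as placeholders:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if grads overflowed
    scaler.update()          # halves scale on overflow, grows it otherwise
    return loss.detach(), scaler.get_scale()
```
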
], batch size: 99, lr: 4.82e-03, grad_scale: 16.0 2024-09-18 14:07:41,873 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.559e+02 3.028e+02 3.699e+02 6.744e+02, threshold=6.057e+02, percent-clipped=1.0 2024-09-18 14:07:53,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=439058.6666666667, ans=0.125 2024-09-18 14:07:55,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=439105.3333333333, ans=0.125 2024-09-18 14:08:28,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=439198.6666666667, ans=0.1 2024-09-18 14:08:28,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=439198.6666666667, ans=0.1 2024-09-18 14:08:30,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=439198.6666666667, ans=0.125 2024-09-18 14:08:30,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.58 vs. limit=15.0 2024-09-18 14:08:31,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=439198.6666666667, ans=0.05 2024-09-18 14:08:44,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=439245.3333333333, ans=0.125 2024-09-18 14:08:46,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=439245.3333333333, ans=0.0 2024-09-18 14:09:02,554 INFO [train.py:1198] (1/2) Epoch 25, batch 1100, loss[loss=0.2191, simple_loss=0.2682, pruned_loss=0.06333, ctc_loss=0.1338, cr_loss=0.415, over 34374.00 frames. ], tot_loss[loss=0.2194, simple_loss=0.2708, pruned_loss=0.06275, ctc_loss=0.131, cr_loss=0.4067, over 6718253.75 frames. ], batch size: 91, lr: 4.82e-03, grad_scale: 16.0 2024-09-18 14:09:11,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=439292.0, ans=0.125 2024-09-18 14:09:46,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.18 vs. limit=15.0 2024-09-18 14:09:47,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=439385.3333333333, ans=0.125 2024-09-18 14:09:50,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=439385.3333333333, ans=0.2 2024-09-18 14:09:59,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=439432.0, ans=0.2 2024-09-18 14:10:03,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=439432.0, ans=0.1 2024-09-18 14:10:19,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.13 vs. 
limit=10.0 2024-09-18 14:10:28,822 INFO [train.py:1198] (1/2) Epoch 25, batch 1150, loss[loss=0.2132, simple_loss=0.2612, pruned_loss=0.06154, ctc_loss=0.1295, cr_loss=0.4064, over 34336.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.271, pruned_loss=0.06303, ctc_loss=0.1314, cr_loss=0.4067, over 6715497.85 frames. ], batch size: 91, lr: 4.82e-03, grad_scale: 16.0 2024-09-18 14:10:30,422 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.456e+02 2.891e+02 3.520e+02 5.671e+02, threshold=5.782e+02, percent-clipped=0.0 2024-09-18 14:11:42,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=439712.0, ans=0.125 2024-09-18 14:11:52,256 INFO [train.py:1198] (1/2) Epoch 25, batch 1200, loss[loss=0.2281, simple_loss=0.2863, pruned_loss=0.06381, ctc_loss=0.1316, cr_loss=0.3989, over 34580.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2719, pruned_loss=0.06343, ctc_loss=0.1322, cr_loss=0.4085, over 6708773.89 frames. ], batch size: 99, lr: 4.81e-03, grad_scale: 32.0 2024-09-18 14:11:57,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=439758.6666666667, ans=0.025 2024-09-18 14:11:59,276 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:12:04,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=439758.6666666667, ans=0.0 2024-09-18 14:12:14,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=439805.3333333333, ans=0.2 2024-09-18 14:12:25,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=439852.0, ans=0.125 2024-09-18 14:13:16,718 INFO [train.py:1198] (1/2) Epoch 25, batch 1250, loss[loss=0.2411, simple_loss=0.2919, pruned_loss=0.07179, ctc_loss=0.1494, cr_loss=0.4194, over 34306.00 frames. ], tot_loss[loss=0.2211, simple_loss=0.2724, pruned_loss=0.06347, ctc_loss=0.1323, cr_loss=0.409, over 6742864.69 frames. 
], batch size: 107, lr: 4.81e-03, grad_scale: 32.0 2024-09-18 14:13:18,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.452e+02 2.687e+02 3.130e+02 5.665e+02, threshold=5.375e+02, percent-clipped=0.0 2024-09-18 14:13:22,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=439992.0, ans=0.2 2024-09-18 14:13:30,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=439992.0, ans=0.2 2024-09-18 14:13:34,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=440038.6666666667, ans=0.2 2024-09-18 14:13:45,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=440038.6666666667, ans=0.1 2024-09-18 14:14:13,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=440132.0, ans=0.95 2024-09-18 14:14:32,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=440178.6666666667, ans=0.04949747468305833 2024-09-18 14:14:41,718 INFO [train.py:1198] (1/2) Epoch 25, batch 1300, loss[loss=0.225, simple_loss=0.2821, pruned_loss=0.06255, ctc_loss=0.1336, cr_loss=0.4004, over 33088.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2717, pruned_loss=0.06317, ctc_loss=0.1318, cr_loss=0.4083, over 6744699.71 frames. ], batch size: 130, lr: 4.81e-03, grad_scale: 16.0 2024-09-18 14:14:55,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=440225.3333333333, ans=0.2 2024-09-18 14:15:21,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=440318.6666666667, ans=0.125 2024-09-18 14:15:58,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.20 vs. limit=15.0 2024-09-18 14:16:04,486 INFO [train.py:1198] (1/2) Epoch 25, batch 1350, loss[loss=0.2191, simple_loss=0.272, pruned_loss=0.06228, ctc_loss=0.1293, cr_loss=0.3965, over 34514.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2712, pruned_loss=0.06292, ctc_loss=0.1313, cr_loss=0.4076, over 6764616.20 frames. ], batch size: 94, lr: 4.81e-03, grad_scale: 16.0 2024-09-18 14:16:07,734 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.901e+02 2.461e+02 3.116e+02 3.875e+02 8.007e+02, threshold=6.231e+02, percent-clipped=4.0 2024-09-18 14:16:26,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=440505.3333333333, ans=0.1 2024-09-18 14:16:34,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=440505.3333333333, ans=0.09899494936611666 2024-09-18 14:16:48,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=440552.0, ans=22.5 2024-09-18 14:16:56,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.23 vs. 
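
[annotation] The batch size field swings widely across these entries (80 to 244 in this section) while per-batch frame counts stay comparable, which is what duration-based bucketing produces: batches are packed to a roughly constant total duration, so short utterances yield many more cuts per batch. A toy version of the packing (the 1400 s cap is an assumption):

```python
def pack_by_duration(durations, max_duration=1400.0):
    batches, cur, cur_dur = [], [], 0.0
    for d in sorted(durations):          # crude stand-in for length bucketing
        if cur and cur_dur + d > max_duration:
            batches.append(cur)
            cur, cur_dur = [], 0.0
        cur.append(d)
        cur_dur += d
    if cur:
        batches.append(cur)
    return batches

sizes = [len(b) for b in pack_by_duration([3.0] * 300 + [12.0] * 300)]
print(sizes[:3], sizes[-3:])   # batches of short cuts hold far more cuts
```
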
limit=15.0 2024-09-18 14:17:01,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.73 vs. limit=10.0 2024-09-18 14:17:14,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440645.3333333333, ans=0.1 2024-09-18 14:17:28,648 INFO [train.py:1198] (1/2) Epoch 25, batch 1400, loss[loss=0.1992, simple_loss=0.2452, pruned_loss=0.05759, ctc_loss=0.1162, cr_loss=0.3705, over 34326.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2712, pruned_loss=0.06301, ctc_loss=0.1312, cr_loss=0.4077, over 6776717.95 frames. ], batch size: 80, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 14:17:30,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=440692.0, ans=0.125 2024-09-18 14:17:32,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=440692.0, ans=0.125 2024-09-18 14:17:37,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=440692.0, ans=0.125 2024-09-18 14:17:37,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=440692.0, ans=0.0 2024-09-18 14:17:57,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=440738.6666666667, ans=0.025 2024-09-18 14:18:24,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=440832.0, ans=0.0 2024-09-18 14:18:32,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2024-09-18 14:18:48,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=440878.6666666667, ans=0.5 2024-09-18 14:18:53,522 INFO [train.py:1198] (1/2) Epoch 25, batch 1450, loss[loss=0.2312, simple_loss=0.284, pruned_loss=0.06734, ctc_loss=0.1352, cr_loss=0.4176, over 34486.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2716, pruned_loss=0.06285, ctc_loss=0.1311, cr_loss=0.4069, over 6773259.66 frames. 
], batch size: 110, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 14:18:58,352 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.537e+02 2.942e+02 3.409e+02 5.016e+02, threshold=5.885e+02, percent-clipped=0.0 2024-09-18 14:19:00,475 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.078e-02 2024-09-18 14:19:22,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=440972.0, ans=0.125 2024-09-18 14:19:38,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=441018.6666666667, ans=0.0 2024-09-18 14:19:59,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=441112.0, ans=0.125 2024-09-18 14:20:01,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=441112.0, ans=0.125 2024-09-18 14:20:02,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=441112.0, ans=0.2 2024-09-18 14:20:06,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=441112.0, ans=0.125 2024-09-18 14:20:15,778 INFO [train.py:1198] (1/2) Epoch 25, batch 1500, loss[loss=0.22, simple_loss=0.2757, pruned_loss=0.06107, ctc_loss=0.1288, cr_loss=0.4118, over 34435.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2719, pruned_loss=0.06294, ctc_loss=0.1313, cr_loss=0.4072, over 6774597.90 frames. ], batch size: 100, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 14:20:49,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=441252.0, ans=0.125 2024-09-18 14:20:51,028 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:20:51,383 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.08 vs. limit=15.0 2024-09-18 14:20:55,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=441252.0, ans=0.0 2024-09-18 14:21:11,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.15 vs. limit=15.0 2024-09-18 14:21:20,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=441298.6666666667, ans=0.125 2024-09-18 14:21:37,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=441345.3333333333, ans=0.125 2024-09-18 14:21:40,080 INFO [train.py:1198] (1/2) Epoch 25, batch 1550, loss[loss=0.2343, simple_loss=0.2821, pruned_loss=0.07082, ctc_loss=0.14, cr_loss=0.4223, over 34415.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2719, pruned_loss=0.06315, ctc_loss=0.1316, cr_loss=0.4074, over 6746558.35 frames. ], batch size: 105, lr: 4.81e-03, grad_scale: 8.0 2024-09-18 14:21:44,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.13 vs. 
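
[annotation] The WithLoss entries report the running sum of a small auxiliary penalty attached to a tensor (here attention weights); it is 0.000e+00 in most entries and occasionally nonzero, e.g. loss-sum=3.078e-02 above. A common trick for "identity in the forward pass, extra gradient in the backward pass" is the detach pattern below; what the actual penalty in scaling.py measures on the weights is not shown in the log:

```python
import torch

def attach_aux_loss(x: torch.Tensor, aux_loss: torch.Tensor) -> torch.Tensor:
    # numerically returns x unchanged (aux_loss - aux_loss.detach() == 0),
    # but aux_loss enters the autograd graph and receives gradient
    return x + (aux_loss - aux_loss.detach())

w = torch.rand(4, 8, requires_grad=True)
penalty = (w - w.clamp(max=0.9)).sum()   # toy penalty on over-large weights
out = attach_aux_loss(w, penalty)
print(torch.equal(out, w + 0))           # forward value is unchanged: True
```
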
limit=15.0 2024-09-18 14:21:45,008 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.128e+02 2.521e+02 3.136e+02 4.108e+02 5.917e+02, threshold=6.272e+02, percent-clipped=1.0 2024-09-18 14:21:59,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=441438.6666666667, ans=0.125 2024-09-18 14:22:09,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5 2024-09-18 14:22:11,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=441438.6666666667, ans=0.125 2024-09-18 14:22:25,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=441485.3333333333, ans=0.125 2024-09-18 14:22:47,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=441578.6666666667, ans=0.125 2024-09-18 14:22:50,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=441578.6666666667, ans=0.125 2024-09-18 14:23:05,095 INFO [train.py:1198] (1/2) Epoch 25, batch 1600, loss[loss=0.2124, simple_loss=0.2715, pruned_loss=0.0565, ctc_loss=0.1226, cr_loss=0.3934, over 34583.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2716, pruned_loss=0.06309, ctc_loss=0.1316, cr_loss=0.4072, over 6725070.42 frames. ], batch size: 99, lr: 4.80e-03, grad_scale: 16.0 2024-09-18 14:23:28,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=441672.0, ans=0.2 2024-09-18 14:23:31,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=441672.0, ans=0.0 2024-09-18 14:23:40,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=441718.6666666667, ans=0.125 2024-09-18 14:23:41,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0 2024-09-18 14:23:45,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.08 vs. limit=22.5 2024-09-18 14:23:46,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=441718.6666666667, ans=0.0 2024-09-18 14:23:58,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441765.3333333333, ans=0.1 2024-09-18 14:24:16,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=441812.0, ans=0.0 2024-09-18 14:24:28,897 INFO [train.py:1198] (1/2) Epoch 25, batch 1650, loss[loss=0.226, simple_loss=0.279, pruned_loss=0.06444, ctc_loss=0.1347, cr_loss=0.4284, over 34392.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2714, pruned_loss=0.06286, ctc_loss=0.1313, cr_loss=0.4061, over 6718185.86 frames. 
], batch size: 103, lr: 4.80e-03, grad_scale: 16.0 2024-09-18 14:24:33,882 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.994e+02 2.477e+02 2.834e+02 3.516e+02 5.864e+02, threshold=5.668e+02, percent-clipped=0.0 2024-09-18 14:24:55,648 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.67 vs. limit=10.0 2024-09-18 14:25:00,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=441952.0, ans=0.125 2024-09-18 14:25:25,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441998.6666666667, ans=0.1 2024-09-18 14:25:35,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=442045.3333333333, ans=0.0 2024-09-18 14:25:35,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=442045.3333333333, ans=0.07 2024-09-18 14:25:42,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.25 vs. limit=22.5 2024-09-18 14:25:53,171 INFO [train.py:1198] (1/2) Epoch 25, batch 1700, loss[loss=0.192, simple_loss=0.2425, pruned_loss=0.05258, ctc_loss=0.1112, cr_loss=0.352, over 34281.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.271, pruned_loss=0.06252, ctc_loss=0.1304, cr_loss=0.4042, over 6743655.04 frames. ], batch size: 80, lr: 4.80e-03, grad_scale: 16.0 2024-09-18 14:26:03,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=442092.0, ans=0.125 2024-09-18 14:26:23,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=442138.6666666667, ans=0.0 2024-09-18 14:26:34,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=442185.3333333333, ans=0.125 2024-09-18 14:26:51,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.15 vs. limit=15.0 2024-09-18 14:27:11,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.24 vs. limit=22.5 2024-09-18 14:27:15,484 INFO [train.py:1198] (1/2) Epoch 25, batch 1750, loss[loss=0.1848, simple_loss=0.234, pruned_loss=0.04968, ctc_loss=0.1085, cr_loss=0.3608, over 34138.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2707, pruned_loss=0.06239, ctc_loss=0.1302, cr_loss=0.4037, over 6753960.51 frames. ], batch size: 78, lr: 4.80e-03, grad_scale: 16.0 2024-09-18 14:27:20,350 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.479e+02 2.851e+02 3.622e+02 8.025e+02, threshold=5.701e+02, percent-clipped=2.0 2024-09-18 14:27:23,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=442325.3333333333, ans=0.125 2024-09-18 14:27:29,217 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.11 vs. 
limit=12.0 2024-09-18 14:27:33,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=442372.0, ans=0.0 2024-09-18 14:27:57,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=442418.6666666667, ans=0.125 2024-09-18 14:28:17,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442465.3333333333, ans=0.1 2024-09-18 14:28:23,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=442512.0, ans=0.0 2024-09-18 14:28:28,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=442512.0, ans=0.125 2024-09-18 14:28:39,495 INFO [train.py:1198] (1/2) Epoch 25, batch 1800, loss[loss=0.2218, simple_loss=0.2798, pruned_loss=0.06145, ctc_loss=0.1266, cr_loss=0.3906, over 34715.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2708, pruned_loss=0.06233, ctc_loss=0.1302, cr_loss=0.4043, over 6756571.50 frames. ], batch size: 97, lr: 4.80e-03, grad_scale: 16.0 2024-09-18 14:28:39,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=442558.6666666667, ans=0.125 2024-09-18 14:28:44,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=442558.6666666667, ans=0.125 2024-09-18 14:28:50,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0 2024-09-18 14:29:15,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.04 vs. limit=22.5 2024-09-18 14:29:15,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=442652.0, ans=0.125 2024-09-18 14:29:36,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=442698.6666666667, ans=0.125 2024-09-18 14:29:44,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=442698.6666666667, ans=0.0 2024-09-18 14:29:47,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=442745.3333333333, ans=0.025 2024-09-18 14:29:54,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.54 vs. limit=10.0 2024-09-18 14:30:04,006 INFO [train.py:1198] (1/2) Epoch 25, batch 1850, loss[loss=0.2165, simple_loss=0.2727, pruned_loss=0.05977, ctc_loss=0.1252, cr_loss=0.3914, over 34468.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.2707, pruned_loss=0.06228, ctc_loss=0.1302, cr_loss=0.4043, over 6765310.53 frames. 
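[Annotation] Each `scaling.py:214` line prints a `ScheduledFloat`: a scalar hyperparameter (a skip rate, dropout probability, or balancer limit) whose current value `ans` is a function of `batch_count`. A sketch of the idea as piecewise-linear interpolation over (batch_count, value) breakpoints follows; the class and the breakpoints are placeholders for illustration, not the repository's implementation.

```python
class ScheduledFloat:
    """A float hyperparameter interpolated on batch_count.

    Mimics the behaviour suggested by log lines like
    "ScheduledFloat: name=..., batch_count=..., ans=0.125":
    piecewise-linear in batch_count, constant outside the endpoints.
    """

    def __init__(self, *points):
        # points: (batch_count, value) pairs.
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        raise AssertionError("unreachable")


# Hypothetical schedule decaying from 0.5 to 0.125 over 20k batches:
balancer_prob = ScheduledFloat((0.0, 0.5), (20000.0, 0.125))
print(balancer_prob.value(441112.0))  # -> 0.125, the plateau value
```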
], batch size: 100, lr: 4.80e-03, grad_scale: 16.0 2024-09-18 14:30:07,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=442792.0, ans=0.0 2024-09-18 14:30:09,045 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.540e+02 3.152e+02 4.081e+02 7.560e+02, threshold=6.305e+02, percent-clipped=5.0 2024-09-18 14:30:32,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=442838.6666666667, ans=10.0 2024-09-18 14:30:42,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=442885.3333333333, ans=0.125 2024-09-18 14:30:47,540 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:31:00,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=442932.0, ans=0.2 2024-09-18 14:31:00,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=442932.0, ans=0.05 2024-09-18 14:31:25,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=443025.3333333333, ans=0.05 2024-09-18 14:31:25,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=443025.3333333333, ans=0.1 2024-09-18 14:31:26,514 INFO [train.py:1198] (1/2) Epoch 25, batch 1900, loss[loss=0.2295, simple_loss=0.2855, pruned_loss=0.0642, ctc_loss=0.1392, cr_loss=0.431, over 34357.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2714, pruned_loss=0.06261, ctc_loss=0.1308, cr_loss=0.4062, over 6773566.45 frames. ], batch size: 103, lr: 4.80e-03, grad_scale: 16.0 2024-09-18 14:31:35,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=443025.3333333333, ans=0.125 2024-09-18 14:31:44,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=443072.0, ans=0.1 2024-09-18 14:32:08,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=443118.6666666667, ans=0.125 2024-09-18 14:32:09,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=443118.6666666667, ans=0.125 2024-09-18 14:32:24,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=443165.3333333333, ans=0.0 2024-09-18 14:32:32,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=443212.0, ans=0.125 2024-09-18 14:32:37,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=443212.0, ans=0.2 2024-09-18 14:32:50,685 INFO [train.py:1198] (1/2) Epoch 25, batch 1950, loss[loss=0.1989, simple_loss=0.2557, pruned_loss=0.05284, ctc_loss=0.1121, cr_loss=0.3494, over 34332.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.2723, pruned_loss=0.0628, ctc_loss=0.1312, cr_loss=0.4072, over 6790561.95 frames. 
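[Annotation] The `scaling.py:1120` lines ("WithLoss: name=...self_attn_weights, loss-sum=...") report the summed value of an auxiliary loss attached to a module's attention weights; a `loss-sum=0.000e+00` means the penalty was inactive for that batch. The sketch below is only one plausible reading: it assumes a simple penalty on attention-weight mass above a cap, and every name in it is illustrative rather than the actual `scaling.py` mechanism.

```python
import torch


def attention_weight_penalty(attn: torch.Tensor, limit: float = 0.75) -> torch.Tensor:
    """Illustrative auxiliary loss: penalize attention weights above `limit`.

    Returns 0 when all weights stay below the cap, which would be logged
    as "WithLoss: ..., loss-sum=0.000e+00" in the style of the lines above.
    """
    excess = (attn - limit).clamp(min=0.0)
    return excess.sum()


aux_losses = {}


def log_with_loss(name: str, attn: torch.Tensor) -> torch.Tensor:
    loss = attention_weight_penalty(attn)
    aux_losses[name] = loss.detach()
    return loss


attn = torch.softmax(torch.randn(4, 8, 16, 16), dim=-1)
log_with_loss("encoder.layers.1.self_attn_weights", attn)
print(f"loss-sum={aux_losses['encoder.layers.1.self_attn_weights']:.3e}")
```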
], batch size: 91, lr: 4.80e-03, grad_scale: 16.0 2024-09-18 14:32:55,700 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.406e+02 2.811e+02 3.553e+02 5.781e+02, threshold=5.622e+02, percent-clipped=0.0 2024-09-18 14:33:21,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=443305.3333333333, ans=0.2 2024-09-18 14:34:15,527 INFO [train.py:1198] (1/2) Epoch 25, batch 2000, loss[loss=0.1991, simple_loss=0.25, pruned_loss=0.05444, ctc_loss=0.1196, cr_loss=0.3825, over 34164.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.2728, pruned_loss=0.06294, ctc_loss=0.1314, cr_loss=0.4076, over 6766831.98 frames. ], batch size: 78, lr: 4.79e-03, grad_scale: 32.0 2024-09-18 14:34:39,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.10 vs. limit=15.0 2024-09-18 14:34:48,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443585.3333333333, ans=0.1 2024-09-18 14:35:14,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=443632.0, ans=0.125 2024-09-18 14:35:22,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.96 vs. limit=15.0 2024-09-18 14:35:30,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=443678.6666666667, ans=0.125 2024-09-18 14:35:39,997 INFO [train.py:1198] (1/2) Epoch 25, batch 2050, loss[loss=0.1957, simple_loss=0.2446, pruned_loss=0.05454, ctc_loss=0.1134, cr_loss=0.3768, over 34463.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2718, pruned_loss=0.06271, ctc_loss=0.131, cr_loss=0.4063, over 6756564.99 frames. ], batch size: 82, lr: 4.79e-03, grad_scale: 16.0 2024-09-18 14:35:41,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=443725.3333333333, ans=0.125 2024-09-18 14:35:46,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.462e+02 3.034e+02 3.712e+02 6.634e+02, threshold=6.068e+02, percent-clipped=3.0 2024-09-18 14:36:11,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=443818.6666666667, ans=0.2 2024-09-18 14:36:21,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=443818.6666666667, ans=0.125 2024-09-18 14:36:24,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=443818.6666666667, ans=0.125 2024-09-18 14:36:37,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=443865.3333333333, ans=0.09899494936611666 2024-09-18 14:36:47,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=443912.0, ans=0.0 2024-09-18 14:36:56,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.75 vs. 
limit=10.0 2024-09-18 14:37:00,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=443958.6666666667, ans=0.125 2024-09-18 14:37:02,029 INFO [train.py:1198] (1/2) Epoch 25, batch 2100, loss[loss=0.2149, simple_loss=0.2694, pruned_loss=0.05957, ctc_loss=0.1251, cr_loss=0.4089, over 34526.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.2712, pruned_loss=0.06248, ctc_loss=0.1305, cr_loss=0.4052, over 6769762.95 frames. ], batch size: 94, lr: 4.79e-03, grad_scale: 16.0 2024-09-18 14:37:15,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=443958.6666666667, ans=0.125 2024-09-18 14:37:20,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=444005.3333333333, ans=0.2 2024-09-18 14:37:28,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=444005.3333333333, ans=0.025 2024-09-18 14:38:12,512 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2024-09-18 14:38:16,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=444145.3333333333, ans=0.125 2024-09-18 14:38:26,054 INFO [train.py:1198] (1/2) Epoch 25, batch 2150, loss[loss=0.2252, simple_loss=0.2742, pruned_loss=0.06572, ctc_loss=0.1371, cr_loss=0.4321, over 34374.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.2708, pruned_loss=0.06218, ctc_loss=0.1299, cr_loss=0.4044, over 6788829.25 frames. ], batch size: 91, lr: 4.79e-03, grad_scale: 16.0 2024-09-18 14:38:26,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444192.0, ans=0.1 2024-09-18 14:38:32,740 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.451e+02 2.717e+02 3.463e+02 7.166e+02, threshold=5.434e+02, percent-clipped=2.0 2024-09-18 14:38:53,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=444238.6666666667, ans=0.0 2024-09-18 14:39:18,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=444332.0, ans=0.0 2024-09-18 14:39:31,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=444332.0, ans=0.025 2024-09-18 14:39:50,880 INFO [train.py:1198] (1/2) Epoch 25, batch 2200, loss[loss=0.2355, simple_loss=0.29, pruned_loss=0.06787, ctc_loss=0.1393, cr_loss=0.4372, over 34426.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2709, pruned_loss=0.0623, ctc_loss=0.1301, cr_loss=0.4052, over 6784168.77 frames. ], batch size: 100, lr: 4.79e-03, grad_scale: 16.0 2024-09-18 14:39:53,025 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 14:40:15,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=444472.0, ans=0.5 2024-09-18 14:40:19,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.02 vs. 
limit=15.0 2024-09-18 14:40:32,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=444518.6666666667, ans=0.125 2024-09-18 14:40:34,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=444518.6666666667, ans=0.125 2024-09-18 14:41:08,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.79 vs. limit=22.5 2024-09-18 14:41:15,581 INFO [train.py:1198] (1/2) Epoch 25, batch 2250, loss[loss=0.2301, simple_loss=0.286, pruned_loss=0.0652, ctc_loss=0.1367, cr_loss=0.4104, over 34434.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.271, pruned_loss=0.06233, ctc_loss=0.1302, cr_loss=0.4049, over 6780308.86 frames. ], batch size: 95, lr: 4.79e-03, grad_scale: 16.0 2024-09-18 14:41:22,175 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.619e+02 3.071e+02 4.197e+02 7.345e+02, threshold=6.142e+02, percent-clipped=10.0 2024-09-18 14:41:44,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=12.0 2024-09-18 14:42:03,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.85 vs. limit=15.0 2024-09-18 14:42:11,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=444798.6666666667, ans=0.125 2024-09-18 14:42:24,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=444845.3333333333, ans=0.125 2024-09-18 14:42:31,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=444845.3333333333, ans=0.0 2024-09-18 14:42:37,715 INFO [train.py:1198] (1/2) Epoch 25, batch 2300, loss[loss=0.1923, simple_loss=0.2471, pruned_loss=0.05048, ctc_loss=0.1098, cr_loss=0.364, over 34236.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.2699, pruned_loss=0.06196, ctc_loss=0.1294, cr_loss=0.403, over 6766073.67 frames. ], batch size: 83, lr: 4.79e-03, grad_scale: 16.0 2024-09-18 14:43:07,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=444938.6666666667, ans=0.125 2024-09-18 14:43:17,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=444985.3333333333, ans=0.2 2024-09-18 14:43:32,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=445032.0, ans=0.125 2024-09-18 14:43:38,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=445032.0, ans=0.125 2024-09-18 14:44:01,720 INFO [train.py:1198] (1/2) Epoch 25, batch 2350, loss[loss=0.2331, simple_loss=0.2844, pruned_loss=0.06852, ctc_loss=0.1391, cr_loss=0.422, over 34683.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2702, pruned_loss=0.06217, ctc_loss=0.1299, cr_loss=0.4039, over 6772649.66 frames. 
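[Annotation] The `scaling.py:1024` "Whitening" lines compare a per-module `metric` against a `limit`, and a regularizing penalty kicks in only when the metric exceeds that limit. One plausible form of the metric, consistent with the logged values always being at least 1, is the eigenvalue-spread ratio of the channel covariance: 1.0 when the covariance is a multiple of the identity (fully "white" features), larger when variance concentrates in a few directions. The sketch below assumes that form; it is an interpretation, not the repository's exact computation.

```python
import torch


def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Assumed eigenvalue-spread metric for feature whiteness.

    x: (frames, channels). Returns ~1.0 when the channel covariance is a
    multiple of the identity, and grows as energy concentrates in fewer
    directions -- matching the logged "metric=... vs. limit=..." pattern.
    """
    frames, channels = x.shape
    assert channels % num_groups == 0
    x = x.reshape(frames, num_groups, channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    # Per-group channel covariance: (groups, c, c).
    cov = torch.einsum("fgi,fgj->gij", x, x) / frames
    eigs = torch.linalg.eigvalsh(cov)  # (groups, c)
    return (eigs ** 2).mean() / (eigs.mean() ** 2)


x = torch.randn(10000, 64)
print(whitening_metric(x))  # approximately 1 for white Gaussian features
```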
], batch size: 97, lr: 4.79e-03, grad_scale: 16.0 2024-09-18 14:44:08,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.073e+02 2.318e+02 2.783e+02 3.414e+02 5.541e+02, threshold=5.566e+02, percent-clipped=0.0 2024-09-18 14:44:20,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=445172.0, ans=0.0 2024-09-18 14:44:26,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=445172.0, ans=0.125 2024-09-18 14:44:56,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.64 vs. limit=12.0 2024-09-18 14:45:02,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445265.3333333333, ans=0.1 2024-09-18 14:45:17,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445312.0, ans=0.1 2024-09-18 14:45:22,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=445312.0, ans=0.125 2024-09-18 14:45:26,794 INFO [train.py:1198] (1/2) Epoch 25, batch 2400, loss[loss=0.2086, simple_loss=0.2611, pruned_loss=0.05818, ctc_loss=0.1215, cr_loss=0.3876, over 34584.00 frames. ], tot_loss[loss=0.2192, simple_loss=0.271, pruned_loss=0.06253, ctc_loss=0.1305, cr_loss=0.405, over 6777846.74 frames. ], batch size: 89, lr: 4.78e-03, grad_scale: 32.0 2024-09-18 14:45:27,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=445358.6666666667, ans=0.0 2024-09-18 14:46:19,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.09 vs. limit=15.0 2024-09-18 14:46:27,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=445498.6666666667, ans=0.125 2024-09-18 14:46:28,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=445498.6666666667, ans=0.1 2024-09-18 14:46:28,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=445498.6666666667, ans=0.025 2024-09-18 14:46:48,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=445545.3333333333, ans=0.04949747468305833 2024-09-18 14:46:51,611 INFO [train.py:1198] (1/2) Epoch 25, batch 2450, loss[loss=0.2186, simple_loss=0.2689, pruned_loss=0.06283, ctc_loss=0.1307, cr_loss=0.4108, over 34394.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2719, pruned_loss=0.06293, ctc_loss=0.1313, cr_loss=0.4064, over 6752119.33 frames. 
], batch size: 95, lr: 4.78e-03, grad_scale: 32.0 2024-09-18 14:46:58,165 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.597e+02 3.203e+02 4.018e+02 6.372e+02, threshold=6.406e+02, percent-clipped=4.0 2024-09-18 14:47:00,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=445592.0, ans=0.125 2024-09-18 14:47:05,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=445592.0, ans=0.0 2024-09-18 14:47:16,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=445638.6666666667, ans=0.0 2024-09-18 14:47:51,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.66 vs. limit=22.5 2024-09-18 14:48:13,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=445825.3333333333, ans=0.5 2024-09-18 14:48:14,378 INFO [train.py:1198] (1/2) Epoch 25, batch 2500, loss[loss=0.2328, simple_loss=0.2854, pruned_loss=0.0673, ctc_loss=0.1406, cr_loss=0.4366, over 34431.00 frames. ], tot_loss[loss=0.2197, simple_loss=0.2714, pruned_loss=0.06279, ctc_loss=0.1311, cr_loss=0.4066, over 6763009.91 frames. ], batch size: 100, lr: 4.78e-03, grad_scale: 32.0 2024-09-18 14:49:00,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2024-09-18 14:49:06,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=445965.3333333333, ans=0.2 2024-09-18 14:49:08,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=445965.3333333333, ans=0.2 2024-09-18 14:49:09,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.60 vs. limit=22.5 2024-09-18 14:49:23,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=446012.0, ans=0.125 2024-09-18 14:49:39,475 INFO [train.py:1198] (1/2) Epoch 25, batch 2550, loss[loss=0.1887, simple_loss=0.2389, pruned_loss=0.05128, ctc_loss=0.1112, cr_loss=0.3432, over 34167.00 frames. ], tot_loss[loss=0.2194, simple_loss=0.2711, pruned_loss=0.06262, ctc_loss=0.1308, cr_loss=0.4065, over 6766894.34 frames. ], batch size: 78, lr: 4.78e-03, grad_scale: 32.0 2024-09-18 14:49:47,624 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.998e+02 2.417e+02 2.792e+02 3.370e+02 7.380e+02, threshold=5.585e+02, percent-clipped=4.0 2024-09-18 14:49:48,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.46 vs. 
limit=10.0 2024-09-18 14:50:00,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=446105.3333333333, ans=0.2 2024-09-18 14:50:02,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=446105.3333333333, ans=0.125 2024-09-18 14:50:08,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=22.5 2024-09-18 14:50:44,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=446198.6666666667, ans=0.0 2024-09-18 14:51:03,533 INFO [train.py:1198] (1/2) Epoch 25, batch 2600, loss[loss=0.2076, simple_loss=0.2577, pruned_loss=0.05858, ctc_loss=0.1233, cr_loss=0.3929, over 34723.00 frames. ], tot_loss[loss=0.2197, simple_loss=0.2714, pruned_loss=0.06276, ctc_loss=0.1312, cr_loss=0.407, over 6762676.73 frames. ], batch size: 92, lr: 4.78e-03, grad_scale: 16.0 2024-09-18 14:51:03,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=446292.0, ans=0.2 2024-09-18 14:51:18,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=446338.6666666667, ans=0.0 2024-09-18 14:51:46,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=446385.3333333333, ans=0.1 2024-09-18 14:51:53,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=446432.0, ans=0.1 2024-09-18 14:52:19,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.53 vs. limit=22.5 2024-09-18 14:52:23,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=22.5 2024-09-18 14:52:27,678 INFO [train.py:1198] (1/2) Epoch 25, batch 2650, loss[loss=0.2405, simple_loss=0.2925, pruned_loss=0.07029, ctc_loss=0.1475, cr_loss=0.4605, over 34222.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.2719, pruned_loss=0.06285, ctc_loss=0.1313, cr_loss=0.4079, over 6769073.29 frames. ], batch size: 117, lr: 4.78e-03, grad_scale: 16.0 2024-09-18 14:52:36,009 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.461e+02 3.062e+02 3.785e+02 8.282e+02, threshold=6.124e+02, percent-clipped=5.0 2024-09-18 14:53:05,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=446618.6666666667, ans=0.125 2024-09-18 14:53:05,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=446618.6666666667, ans=0.0 2024-09-18 14:53:39,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=446712.0, ans=0.0 2024-09-18 14:53:52,028 INFO [train.py:1198] (1/2) Epoch 25, batch 2700, loss[loss=0.2366, simple_loss=0.2899, pruned_loss=0.06919, ctc_loss=0.1418, cr_loss=0.4147, over 34613.00 frames. ], tot_loss[loss=0.2201, simple_loss=0.272, pruned_loss=0.06283, ctc_loss=0.1313, cr_loss=0.4076, over 6764421.72 frames. 
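[Annotation] The `grad_scale:` field in the batch lines is the current AMP loss scale. It doubles after a run of overflow-free steps and is halved when a non-finite gradient is detected, which produces the 16 -> 32 -> 16 swings visible across batches 2000-2600 above. A standard `torch.cuda.amp.GradScaler` loop reproduces this behaviour; the model, optimizer, and interval below are placeholders, not this run's configuration.

```python
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=4.8e-3)

# Doubles the scale every `growth_interval` successful steps and halves
# it on overflow -- the mechanism behind the logged `grad_scale:` values.
scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000)

for step in range(100):
    x = torch.randn(16, 80, device="cuda")
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).pow(2).mean()
    optimizer.zero_grad()
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if grads are non-finite
    scaler.update()         # grows or backs off the scale
    if step % 50 == 0:
        print(f"grad_scale: {scaler.get_scale()}")
```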
], batch size: 102, lr: 4.78e-03, grad_scale: 16.0 2024-09-18 14:53:54,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=446758.6666666667, ans=0.125 2024-09-18 14:54:31,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=446852.0, ans=0.125 2024-09-18 14:54:36,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=446852.0, ans=0.1 2024-09-18 14:54:53,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=446898.6666666667, ans=0.0 2024-09-18 14:55:14,817 INFO [train.py:1198] (1/2) Epoch 25, batch 2750, loss[loss=0.2173, simple_loss=0.2654, pruned_loss=0.06341, ctc_loss=0.1289, cr_loss=0.4152, over 34633.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2713, pruned_loss=0.06268, ctc_loss=0.131, cr_loss=0.4065, over 6761334.98 frames. ], batch size: 88, lr: 4.78e-03, grad_scale: 16.0 2024-09-18 14:55:15,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=446992.0, ans=0.0 2024-09-18 14:55:23,028 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.060e+02 2.528e+02 2.961e+02 3.745e+02 6.518e+02, threshold=5.923e+02, percent-clipped=2.0 2024-09-18 14:55:26,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=446992.0, ans=0.125 2024-09-18 14:55:33,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=447038.6666666667, ans=0.1 2024-09-18 14:55:43,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=447038.6666666667, ans=0.025 2024-09-18 14:55:53,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=447085.3333333333, ans=0.0 2024-09-18 14:55:57,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=447085.3333333333, ans=0.125 2024-09-18 14:56:39,667 INFO [train.py:1198] (1/2) Epoch 25, batch 2800, loss[loss=0.2548, simple_loss=0.2927, pruned_loss=0.08227, ctc_loss=0.1719, cr_loss=0.4528, over 23264.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2713, pruned_loss=0.06289, ctc_loss=0.1313, cr_loss=0.407, over 6739309.66 frames. 
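[Annotation] Each `train.py:1198` line reports two views of the objective: the current batch's loss ("loss[... over N frames.]", with N in the tens of thousands) and "tot_loss[... over M frames.]", a frame-weighted running aggregate over many recent batches, which is why M sits in the millions and why tot_loss moves slowly. A minimal tracker in that spirit is sketched below; the exponential-decay scheme is an assumption for illustration (icefall's actual MetricsTracker differs in detail).

```python
class FrameWeightedTracker:
    """Frame-weighted running average of loss components.

    Produces summaries shaped like
    "tot_loss[loss=0.2195, ..., over 6761334.98 frames.]" by decaying
    the previous totals and adding the new batch, so old batches fade
    out gradually.  With ~34k frames per batch and decay=0.995, the
    frame total settles near 34e3 / (1 - 0.995) ~ 6.8e6, matching the
    magnitude seen in the log.
    """

    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.frames = 0.0
        self.sums = {}

    def update(self, frames: float, **losses: float):
        self.frames = self.decay * self.frames + frames
        for k, v in losses.items():
            prev = self.decay * self.sums.get(k, 0.0)
            self.sums[k] = prev + v * frames  # losses are per-frame values

    def summary(self) -> str:
        avgs = ", ".join(
            f"{k}={v / self.frames:.4g}" for k, v in self.sums.items()
        )
        return f"tot_loss[{avgs}, over {self.frames:.2f} frames.]"


tracker = FrameWeightedTracker()
tracker.update(34633.0, loss=0.2173, ctc_loss=0.1289, cr_loss=0.4152)
print(tracker.summary())
```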
], batch size: 244, lr: 4.77e-03, grad_scale: 32.0 2024-09-18 14:56:51,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=447225.3333333333, ans=0.125 2024-09-18 14:56:56,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=447272.0, ans=0.125 2024-09-18 14:57:06,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=447272.0, ans=0.07 2024-09-18 14:57:34,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=447365.3333333333, ans=0.2 2024-09-18 14:58:03,962 INFO [train.py:1198] (1/2) Epoch 25, batch 2850, loss[loss=0.2166, simple_loss=0.2632, pruned_loss=0.06393, ctc_loss=0.1296, cr_loss=0.4054, over 34476.00 frames. ], tot_loss[loss=0.2208, simple_loss=0.2722, pruned_loss=0.06327, ctc_loss=0.1321, cr_loss=0.4086, over 6724166.57 frames. ], batch size: 90, lr: 4.77e-03, grad_scale: 16.0 2024-09-18 14:58:13,917 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.503e+02 2.881e+02 3.479e+02 5.682e+02, threshold=5.763e+02, percent-clipped=0.0 2024-09-18 14:58:14,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=447458.6666666667, ans=0.1 2024-09-18 14:58:23,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2024-09-18 14:58:29,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2024-09-18 14:58:32,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=447505.3333333333, ans=0.025 2024-09-18 14:58:35,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=447552.0, ans=0.0 2024-09-18 14:59:00,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=447598.6666666667, ans=0.0 2024-09-18 14:59:08,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=447645.3333333333, ans=0.125 2024-09-18 14:59:16,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=447645.3333333333, ans=0.2 2024-09-18 14:59:26,175 INFO [train.py:1198] (1/2) Epoch 25, batch 2900, loss[loss=0.2219, simple_loss=0.2761, pruned_loss=0.06264, ctc_loss=0.1313, cr_loss=0.4052, over 34554.00 frames. ], tot_loss[loss=0.2216, simple_loss=0.2732, pruned_loss=0.06358, ctc_loss=0.1325, cr_loss=0.4101, over 6754874.10 frames. 
], batch size: 94, lr: 4.77e-03, grad_scale: 16.0 2024-09-18 14:59:43,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=447738.6666666667, ans=0.95 2024-09-18 15:00:03,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=447785.3333333333, ans=0.125 2024-09-18 15:00:06,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=447785.3333333333, ans=0.125 2024-09-18 15:00:07,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.08 vs. limit=15.0 2024-09-18 15:00:08,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=447785.3333333333, ans=0.125 2024-09-18 15:00:23,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=447832.0, ans=10.0 2024-09-18 15:00:25,430 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.41 vs. limit=15.0 2024-09-18 15:00:36,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=447878.6666666667, ans=0.1 2024-09-18 15:00:41,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=447878.6666666667, ans=0.1 2024-09-18 15:00:51,355 INFO [train.py:1198] (1/2) Epoch 25, batch 2950, loss[loss=0.2027, simple_loss=0.2515, pruned_loss=0.05749, ctc_loss=0.1188, cr_loss=0.3804, over 34643.00 frames. ], tot_loss[loss=0.22, simple_loss=0.2717, pruned_loss=0.06293, ctc_loss=0.1314, cr_loss=0.4074, over 6748711.57 frames. ], batch size: 88, lr: 4.77e-03, grad_scale: 16.0 2024-09-18 15:00:55,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447925.3333333333, ans=0.1 2024-09-18 15:01:01,267 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.612e+02 3.168e+02 4.250e+02 7.400e+02, threshold=6.335e+02, percent-clipped=8.0 2024-09-18 15:01:31,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=448018.6666666667, ans=0.125 2024-09-18 15:01:51,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=15.0 2024-09-18 15:02:09,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=448112.0, ans=0.2 2024-09-18 15:02:11,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448112.0, ans=0.1 2024-09-18 15:02:22,231 INFO [train.py:1198] (1/2) Epoch 25, batch 3000, loss[loss=0.2237, simple_loss=0.2736, pruned_loss=0.0655, ctc_loss=0.1327, cr_loss=0.4081, over 34531.00 frames. ], tot_loss[loss=0.2194, simple_loss=0.2712, pruned_loss=0.06261, ctc_loss=0.1309, cr_loss=0.4057, over 6748955.74 frames. 
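[Annotation] The per-batch loss lines break the objective into transducer terms (`simple_loss`, `pruned_loss`), a CTC term, and a consistency-regularization term, with the scalar `loss` a weighted combination of them. One common combination scheme is sketched below; all of the weight values are arbitrary placeholders, not the scales configured for this run.

```python
import torch


def combine_losses(
    simple_loss: torch.Tensor,
    pruned_loss: torch.Tensor,
    ctc_loss: torch.Tensor,
    cr_loss: torch.Tensor,
    simple_scale: float = 0.5,  # placeholder weights; the run's actual
    ctc_scale: float = 0.2,     # scales are set in its configuration
    cr_scale: float = 0.1,
) -> torch.Tensor:
    """Weighted sum of the components printed in each batch log line."""
    transducer = simple_scale * simple_loss + (1.0 - simple_scale) * pruned_loss
    return transducer + ctc_scale * ctc_loss + cr_scale * cr_loss


# Component values in the ballpark of the log lines above:
loss = combine_losses(
    torch.tensor(0.272), torch.tensor(0.063),
    torch.tensor(0.131), torch.tensor(0.406),
)
print(loss)
```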
], batch size: 94, lr: 4.77e-03, grad_scale: 16.0 2024-09-18 15:02:22,231 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 15:02:39,061 INFO [train.py:1230] (1/2) Epoch 25, validation: loss=0.1484, simple_loss=0.2443, pruned_loss=0.02227, ctc_loss=0.03986, cr_loss=1.899e-14, over 944034.00 frames. 2024-09-18 15:02:39,062 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 15:02:41,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=448158.6666666667, ans=0.0 2024-09-18 15:02:42,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=448158.6666666667, ans=0.1 2024-09-18 15:03:24,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.12 vs. limit=15.0 2024-09-18 15:03:33,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=448298.6666666667, ans=0.125 2024-09-18 15:03:33,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2024-09-18 15:03:38,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=448298.6666666667, ans=0.125 2024-09-18 15:03:38,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=448298.6666666667, ans=0.0 2024-09-18 15:03:49,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=448345.3333333333, ans=0.0 2024-09-18 15:04:00,846 INFO [train.py:1198] (1/2) Epoch 25, batch 3050, loss[loss=0.2048, simple_loss=0.2592, pruned_loss=0.05528, ctc_loss=0.1216, cr_loss=0.3857, over 34567.00 frames. ], tot_loss[loss=0.2202, simple_loss=0.272, pruned_loss=0.06293, ctc_loss=0.1314, cr_loss=0.407, over 6740514.63 frames. ], batch size: 89, lr: 4.77e-03, grad_scale: 16.0 2024-09-18 15:04:01,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=448392.0, ans=0.1 2024-09-18 15:04:01,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=448392.0, ans=0.0 2024-09-18 15:04:12,712 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.028e+02 2.435e+02 2.736e+02 3.240e+02 7.283e+02, threshold=5.472e+02, percent-clipped=1.0 2024-09-18 15:04:34,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=448485.3333333333, ans=0.0 2024-09-18 15:04:51,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=448532.0, ans=0.2 2024-09-18 15:04:59,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=448532.0, ans=0.1 2024-09-18 15:05:03,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.54 vs. 
limit=12.0 2024-09-18 15:05:20,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=448578.6666666667, ans=0.0 2024-09-18 15:05:23,683 INFO [train.py:1198] (1/2) Epoch 25, batch 3100, loss[loss=0.2456, simple_loss=0.2934, pruned_loss=0.07497, ctc_loss=0.151, cr_loss=0.4425, over 34286.00 frames. ], tot_loss[loss=0.2203, simple_loss=0.2719, pruned_loss=0.06301, ctc_loss=0.1316, cr_loss=0.4075, over 6740947.87 frames. ], batch size: 117, lr: 4.77e-03, grad_scale: 16.0 2024-09-18 15:05:25,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=448625.3333333333, ans=0.125 2024-09-18 15:05:27,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=448625.3333333333, ans=0.125 2024-09-18 15:05:44,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2024-09-18 15:06:15,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=448765.3333333333, ans=0.025 2024-09-18 15:06:33,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=448812.0, ans=0.95 2024-09-18 15:06:46,305 INFO [train.py:1198] (1/2) Epoch 25, batch 3150, loss[loss=0.2491, simple_loss=0.3014, pruned_loss=0.07403, ctc_loss=0.1535, cr_loss=0.4487, over 33915.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.272, pruned_loss=0.06307, ctc_loss=0.1318, cr_loss=0.4076, over 6747993.48 frames. ], batch size: 122, lr: 4.77e-03, grad_scale: 16.0 2024-09-18 15:06:52,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=448858.6666666667, ans=0.035 2024-09-18 15:06:55,892 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.470e+02 2.866e+02 3.945e+02 7.457e+02, threshold=5.733e+02, percent-clipped=5.0 2024-09-18 15:06:57,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=448858.6666666667, ans=0.125 2024-09-18 15:07:01,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448905.3333333333, ans=0.1 2024-09-18 15:07:28,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=448952.0, ans=0.125 2024-09-18 15:07:28,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448952.0, ans=0.1 2024-09-18 15:07:30,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=448952.0, ans=0.1 2024-09-18 15:07:32,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=448952.0, ans=0.125 2024-09-18 15:07:41,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=448998.6666666667, ans=0.125 2024-09-18 15:07:42,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.16 vs. 
limit=10.0 2024-09-18 15:07:46,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=448998.6666666667, ans=0.0 2024-09-18 15:07:56,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=449045.3333333333, ans=0.0 2024-09-18 15:07:58,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=449045.3333333333, ans=0.125 2024-09-18 15:08:04,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=449045.3333333333, ans=0.0 2024-09-18 15:08:07,325 INFO [train.py:1198] (1/2) Epoch 25, batch 3200, loss[loss=0.2147, simple_loss=0.2702, pruned_loss=0.05906, ctc_loss=0.1266, cr_loss=0.3942, over 34523.00 frames. ], tot_loss[loss=0.2198, simple_loss=0.2716, pruned_loss=0.06279, ctc_loss=0.1313, cr_loss=0.4065, over 6761004.62 frames. ], batch size: 94, lr: 4.76e-03, grad_scale: 32.0 2024-09-18 15:08:15,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=449092.0, ans=0.025 2024-09-18 15:08:25,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=449138.6666666667, ans=0.2 2024-09-18 15:08:26,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=449138.6666666667, ans=0.2 2024-09-18 15:08:45,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=449185.3333333333, ans=0.0 2024-09-18 15:08:47,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=449185.3333333333, ans=15.0 2024-09-18 15:08:50,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=449185.3333333333, ans=0.09899494936611666 2024-09-18 15:08:58,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=449232.0, ans=0.125 2024-09-18 15:09:24,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=449278.6666666667, ans=0.025 2024-09-18 15:09:28,759 INFO [train.py:1198] (1/2) Epoch 25, batch 3250, loss[loss=0.2316, simple_loss=0.2804, pruned_loss=0.06868, ctc_loss=0.1399, cr_loss=0.4381, over 34664.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2718, pruned_loss=0.06273, ctc_loss=0.1312, cr_loss=0.4069, over 6770715.95 frames. ], batch size: 98, lr: 4.76e-03, grad_scale: 32.0 2024-09-18 15:09:38,489 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.055e+02 2.597e+02 2.913e+02 3.419e+02 5.220e+02, threshold=5.825e+02, percent-clipped=0.0 2024-09-18 15:09:58,242 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:10:07,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=449418.6666666667, ans=0.125 2024-09-18 15:10:23,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.08 vs. 
limit=15.0 2024-09-18 15:10:51,275 INFO [train.py:1198] (1/2) Epoch 25, batch 3300, loss[loss=0.2023, simple_loss=0.2575, pruned_loss=0.05413, ctc_loss=0.1192, cr_loss=0.3744, over 33028.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2703, pruned_loss=0.06226, ctc_loss=0.1302, cr_loss=0.405, over 6768343.91 frames. ], batch size: 130, lr: 4.76e-03, grad_scale: 32.0 2024-09-18 15:10:55,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=449558.6666666667, ans=0.125 2024-09-18 15:11:06,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=449605.3333333333, ans=0.125 2024-09-18 15:11:16,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449605.3333333333, ans=0.1 2024-09-18 15:11:40,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449698.6666666667, ans=0.1 2024-09-18 15:11:40,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2024-09-18 15:11:53,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=449698.6666666667, ans=0.035 2024-09-18 15:11:53,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=449698.6666666667, ans=0.0 2024-09-18 15:11:56,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=449745.3333333333, ans=0.025 2024-09-18 15:11:56,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2024-09-18 15:12:03,127 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:12:12,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=449792.0, ans=0.0 2024-09-18 15:12:12,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=449792.0, ans=0.125 2024-09-18 15:12:13,872 INFO [train.py:1198] (1/2) Epoch 25, batch 3350, loss[loss=0.2315, simple_loss=0.2853, pruned_loss=0.06651, ctc_loss=0.1383, cr_loss=0.4254, over 33798.00 frames. ], tot_loss[loss=0.2197, simple_loss=0.2713, pruned_loss=0.06275, ctc_loss=0.1313, cr_loss=0.4071, over 6742627.23 frames. ], batch size: 122, lr: 4.76e-03, grad_scale: 16.0 2024-09-18 15:12:24,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.452e+02 2.784e+02 3.137e+02 5.786e+02, threshold=5.569e+02, percent-clipped=0.0 2024-09-18 15:13:17,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=449978.6666666667, ans=0.025 2024-09-18 15:13:24,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=449978.6666666667, ans=0.125 2024-09-18 15:13:35,312 INFO [train.py:1198] (1/2) Epoch 25, batch 3400, loss[loss=0.1819, simple_loss=0.2342, pruned_loss=0.04785, ctc_loss=0.1025, cr_loss=0.3342, over 34123.00 frames. 
], tot_loss[loss=0.2195, simple_loss=0.2711, pruned_loss=0.06269, ctc_loss=0.1311, cr_loss=0.4064, over 6733223.09 frames. ], batch size: 78, lr: 4.76e-03, grad_scale: 16.0 2024-09-18 15:14:07,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=450118.6666666667, ans=0.125 2024-09-18 15:14:12,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.96 vs. limit=15.0 2024-09-18 15:14:31,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450165.3333333333, ans=0.1 2024-09-18 15:14:33,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=450165.3333333333, ans=0.0 2024-09-18 15:14:39,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=450212.0, ans=0.1 2024-09-18 15:14:41,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=450212.0, ans=0.125 2024-09-18 15:14:55,566 INFO [train.py:1198] (1/2) Epoch 25, batch 3450, loss[loss=0.2278, simple_loss=0.2816, pruned_loss=0.06457, ctc_loss=0.1377, cr_loss=0.4311, over 33239.00 frames. ], tot_loss[loss=0.2197, simple_loss=0.2715, pruned_loss=0.06273, ctc_loss=0.1311, cr_loss=0.4073, over 6745682.05 frames. ], batch size: 130, lr: 4.76e-03, grad_scale: 16.0 2024-09-18 15:15:06,818 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.471e+02 3.069e+02 3.783e+02 6.803e+02, threshold=6.138e+02, percent-clipped=1.0 2024-09-18 15:15:18,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=450305.3333333333, ans=0.0 2024-09-18 15:15:19,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=450305.3333333333, ans=0.125 2024-09-18 15:15:34,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.45 vs. limit=22.5 2024-09-18 15:15:53,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=450398.6666666667, ans=0.0 2024-09-18 15:16:10,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.67 vs. limit=15.0 2024-09-18 15:16:17,984 INFO [train.py:1198] (1/2) Epoch 25, batch 3500, loss[loss=0.2002, simple_loss=0.2492, pruned_loss=0.05671, ctc_loss=0.1167, cr_loss=0.3615, over 34454.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.2706, pruned_loss=0.06228, ctc_loss=0.1304, cr_loss=0.4059, over 6747237.67 frames. 
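[Annotation] The validation pass logged at batch 3000 above reports `cr_loss=1.899e-14`, i.e. effectively zero: consistency regularization compares the outputs of two differently time-masked forward passes, and with masking disabled at validation the two passes coincide, leaving only numerical noise. A frame-weighted validation loop in that style is sketched below; the batch keys and the model's return format are assumptions for illustration.

```python
import torch


@torch.no_grad()
def compute_validation_loss(model, valid_loader, device="cuda"):
    """Frame-weighted average losses over the dev set (illustrative)."""
    model.eval()
    tot = {"loss": 0.0, "ctc_loss": 0.0, "cr_loss": 0.0}
    tot_frames = 0.0
    for batch in valid_loader:
        feats = batch["features"].to(device)      # assumed key
        frames = float(batch["num_frames"])        # assumed key
        losses = model(feats)  # assumed: dict of per-frame loss values
        for k in tot:
            tot[k] += losses[k].item() * frames
        tot_frames += frames
    model.train()
    return {k: v / tot_frames for k, v in tot.items()}, tot_frames
```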
], batch size: 85, lr: 4.76e-03, grad_scale: 16.0 2024-09-18 15:16:53,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=450585.3333333333, ans=0.125 2024-09-18 15:17:00,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=450585.3333333333, ans=0.2 2024-09-18 15:17:14,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=450632.0, ans=0.0 2024-09-18 15:17:24,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=450678.6666666667, ans=0.125 2024-09-18 15:17:26,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=450678.6666666667, ans=0.0 2024-09-18 15:17:29,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=450678.6666666667, ans=0.125 2024-09-18 15:17:35,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=450678.6666666667, ans=0.125 2024-09-18 15:17:38,100 INFO [train.py:1198] (1/2) Epoch 25, batch 3550, loss[loss=0.2325, simple_loss=0.2888, pruned_loss=0.06646, ctc_loss=0.135, cr_loss=0.4058, over 34369.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.2707, pruned_loss=0.06219, ctc_loss=0.1301, cr_loss=0.4056, over 6757365.07 frames. ], batch size: 103, lr: 4.76e-03, grad_scale: 16.0 2024-09-18 15:17:46,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=450725.3333333333, ans=0.1 2024-09-18 15:17:49,549 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.951e+02 2.442e+02 2.722e+02 3.469e+02 5.538e+02, threshold=5.445e+02, percent-clipped=0.0 2024-09-18 15:17:49,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=450725.3333333333, ans=0.0 2024-09-18 15:17:51,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450725.3333333333, ans=0.1 2024-09-18 15:18:10,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=450818.6666666667, ans=0.0 2024-09-18 15:18:12,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.84 vs. limit=15.0 2024-09-18 15:18:43,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=450912.0, ans=0.0 2024-09-18 15:18:58,913 INFO [train.py:1198] (1/2) Epoch 25, batch 3600, loss[loss=0.2032, simple_loss=0.2592, pruned_loss=0.05443, ctc_loss=0.1164, cr_loss=0.3772, over 34473.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2709, pruned_loss=0.06236, ctc_loss=0.1304, cr_loss=0.4067, over 6766294.82 frames. 
], batch size: 90, lr: 4.76e-03, grad_scale: 32.0 2024-09-18 15:18:59,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=450958.6666666667, ans=0.125 2024-09-18 15:18:59,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=450958.6666666667, ans=0.125 2024-09-18 15:19:04,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=450958.6666666667, ans=0.0 2024-09-18 15:19:18,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=451005.3333333333, ans=0.2 2024-09-18 15:19:24,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=451005.3333333333, ans=0.125 2024-09-18 15:19:31,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=451052.0, ans=0.04949747468305833 2024-09-18 15:19:54,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=451098.6666666667, ans=0.125 2024-09-18 15:20:01,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=451098.6666666667, ans=0.125 2024-09-18 15:20:20,926 INFO [train.py:1198] (1/2) Epoch 25, batch 3650, loss[loss=0.229, simple_loss=0.2837, pruned_loss=0.06484, ctc_loss=0.1363, cr_loss=0.4348, over 34489.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2704, pruned_loss=0.06224, ctc_loss=0.13, cr_loss=0.4061, over 6769293.13 frames. ], batch size: 110, lr: 4.75e-03, grad_scale: 32.0 2024-09-18 15:20:29,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=451192.0, ans=0.1 2024-09-18 15:20:32,351 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.191e+02 2.630e+02 3.240e+02 4.496e+02 8.133e+02, threshold=6.479e+02, percent-clipped=14.0 2024-09-18 15:20:34,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451192.0, ans=0.1 2024-09-18 15:20:38,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.49 vs. limit=22.5 2024-09-18 15:20:48,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=451238.6666666667, ans=0.125 2024-09-18 15:20:59,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=451285.3333333333, ans=0.125 2024-09-18 15:21:03,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=451285.3333333333, ans=0.125 2024-09-18 15:21:10,060 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.44 vs. 
limit=15.0 2024-09-18 15:21:14,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=451332.0, ans=0.0 2024-09-18 15:21:22,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.23 vs. limit=12.0 2024-09-18 15:21:28,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=451378.6666666667, ans=0.0 2024-09-18 15:21:30,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=451378.6666666667, ans=0.09899494936611666 2024-09-18 15:21:31,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=451378.6666666667, ans=0.025 2024-09-18 15:21:35,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=451378.6666666667, ans=0.125 2024-09-18 15:21:36,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=451378.6666666667, ans=0.0 2024-09-18 15:21:41,244 INFO [train.py:1198] (1/2) Epoch 25, batch 3700, loss[loss=0.2196, simple_loss=0.278, pruned_loss=0.05982, ctc_loss=0.1279, cr_loss=0.3995, over 34599.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2705, pruned_loss=0.06206, ctc_loss=0.1299, cr_loss=0.4057, over 6784310.54 frames. ], batch size: 102, lr: 4.75e-03, grad_scale: 32.0 2024-09-18 15:21:43,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=451425.3333333333, ans=0.125 2024-09-18 15:21:44,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451425.3333333333, ans=0.1 2024-09-18 15:21:51,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=451425.3333333333, ans=0.1 2024-09-18 15:21:57,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=451472.0, ans=0.0 2024-09-18 15:22:25,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=451518.6666666667, ans=0.2 2024-09-18 15:22:32,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.62 vs. limit=6.0 2024-09-18 15:22:57,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=451612.0, ans=0.125 2024-09-18 15:23:03,429 INFO [train.py:1198] (1/2) Epoch 25, batch 3750, loss[loss=0.2424, simple_loss=0.2955, pruned_loss=0.07152, ctc_loss=0.1442, cr_loss=0.4373, over 34327.00 frames. ], tot_loss[loss=0.2213, simple_loss=0.2735, pruned_loss=0.06317, ctc_loss=0.132, cr_loss=0.4105, over 6785986.03 frames. ], batch size: 113, lr: 4.75e-03, grad_scale: 32.0 2024-09-18 15:23:10,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.87 vs. 
limit=15.0 2024-09-18 15:23:14,582 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.091e+02 2.351e+02 2.521e+02 3.017e+02 7.203e+02, threshold=5.043e+02, percent-clipped=1.0 2024-09-18 15:23:31,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=451705.3333333333, ans=0.125 2024-09-18 15:23:35,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-09-18 15:23:42,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=451752.0, ans=0.0 2024-09-18 15:23:59,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.69 vs. limit=12.0 2024-09-18 15:24:24,606 INFO [train.py:1198] (1/2) Epoch 25, batch 3800, loss[loss=0.2442, simple_loss=0.2876, pruned_loss=0.07537, ctc_loss=0.157, cr_loss=0.4654, over 29854.00 frames. ], tot_loss[loss=0.2248, simple_loss=0.2764, pruned_loss=0.06474, ctc_loss=0.135, cr_loss=0.4157, over 6676221.27 frames. ], batch size: 175, lr: 4.75e-03, grad_scale: 32.0 2024-09-18 15:24:29,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451892.0, ans=0.1 2024-09-18 15:24:30,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=451892.0, ans=0.2 2024-09-18 15:25:01,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=451985.3333333333, ans=0.125 2024-09-18 15:25:07,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=451985.3333333333, ans=0.05 2024-09-18 15:25:26,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=452032.0, ans=0.125 2024-09-18 15:25:42,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452078.6666666667, ans=0.1 2024-09-18 15:25:49,050 INFO [train.py:1198] (1/2) Epoch 25, batch 3850, loss[loss=0.2562, simple_loss=0.2972, pruned_loss=0.0814, ctc_loss=0.1718, cr_loss=0.449, over 23077.00 frames. ], tot_loss[loss=0.2291, simple_loss=0.2792, pruned_loss=0.06712, ctc_loss=0.1398, cr_loss=0.4201, over 6253562.55 frames. ], batch size: 244, lr: 4.75e-03, grad_scale: 32.0 2024-09-18 15:26:00,588 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.154e+02 2.538e+02 2.819e+02 3.087e+02 8.625e+02, threshold=5.639e+02, percent-clipped=1.0 2024-09-18 15:26:18,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.29 vs. limit=15.0 2024-09-18 15:27:16,125 INFO [train.py:1198] (1/2) Epoch 26, batch 0, loss[loss=0.2012, simple_loss=0.2535, pruned_loss=0.05561, ctc_loss=0.1146, cr_loss=0.3691, over 34477.00 frames. ], tot_loss[loss=0.2012, simple_loss=0.2535, pruned_loss=0.05561, ctc_loss=0.1146, cr_loss=0.3691, over 34477.00 frames. 
], batch size: 85, lr: 4.65e-03, grad_scale: 32.0 2024-09-18 15:27:16,125 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 15:27:33,186 INFO [train.py:1230] (1/2) Epoch 26, validation: loss=0.1481, simple_loss=0.2453, pruned_loss=0.02141, ctc_loss=0.03986, cr_loss=1.939e-14, over 944034.00 frames. 2024-09-18 15:27:33,186 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 15:27:41,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=452246.6666666667, ans=0.125 2024-09-18 15:27:43,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=452246.6666666667, ans=0.0 2024-09-18 15:27:45,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=452246.6666666667, ans=0.0 2024-09-18 15:27:52,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=452293.3333333333, ans=0.04949747468305833 2024-09-18 15:28:15,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=452340.0, ans=0.2 2024-09-18 15:28:16,008 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.520e-03 2024-09-18 15:28:25,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=452386.6666666667, ans=0.125 2024-09-18 15:28:25,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=452386.6666666667, ans=0.125 2024-09-18 15:28:56,768 INFO [train.py:1198] (1/2) Epoch 26, batch 50, loss[loss=0.2033, simple_loss=0.2508, pruned_loss=0.05817, ctc_loss=0.1224, cr_loss=0.3739, over 34490.00 frames. ], tot_loss[loss=0.221, simple_loss=0.2723, pruned_loss=0.06349, ctc_loss=0.1322, cr_loss=0.4072, over 1482133.06 frames. ], batch size: 82, lr: 4.65e-03, grad_scale: 32.0 2024-09-18 15:28:58,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=452480.0, ans=0.125 2024-09-18 15:29:04,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2024-09-18 15:29:42,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=452573.3333333333, ans=0.125 2024-09-18 15:29:48,142 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.433e+02 2.728e+02 3.237e+02 6.739e+02, threshold=5.456e+02, percent-clipped=2.0 2024-09-18 15:29:51,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=452620.0, ans=0.04949747468305833 2024-09-18 15:30:20,010 INFO [train.py:1198] (1/2) Epoch 26, batch 100, loss[loss=0.1973, simple_loss=0.2511, pruned_loss=0.05315, ctc_loss=0.1126, cr_loss=0.3664, over 34608.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.274, pruned_loss=0.06361, ctc_loss=0.1326, cr_loss=0.411, over 2631863.73 frames. 
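The tot_loss[...] field is reported "over N frames" with N growing through the epoch (1482133 frames at batch 50 above, 2631863 by batch 100), i.e. an aggregate over recent batches rather than the last batch alone. The recipe may apply exponential decay to this tally, but the simplest consistent reading is a frame-weighted running average, sketched minimally below. Note also that the epoch-26 validation record's cr_loss is ~2e-14: with no paired augmented views at validation time, the consistency term presumably collapses to numerical noise.

    class FrameWeightedAverage:
        # Accumulates loss * frames so the average is weighted by frame count,
        # mirroring how a "tot_loss ... over N frames" field could be maintained.
        def __init__(self) -> None:
            self.weighted_sum = 0.0
            self.num_frames = 0.0

        def update(self, loss_value: float, frames: float) -> None:
            self.weighted_sum += loss_value * frames
            self.num_frames += frames

        @property
        def average(self) -> float:
            return self.weighted_sum / max(self.num_frames, 1.0)

    tracker = FrameWeightedAverage()
    tracker.update(0.2033, 34490.0)  # the batch-50 per-batch record above
    print(f"{tracker.average:.4f} over {tracker.num_frames:.0f} frames")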
], batch size: 89, lr: 4.65e-03, grad_scale: 32.0 2024-09-18 15:30:56,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=452806.6666666667, ans=0.125 2024-09-18 15:31:35,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=452900.0, ans=0.0 2024-09-18 15:31:38,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=452900.0, ans=0.125 2024-09-18 15:31:44,467 INFO [train.py:1198] (1/2) Epoch 26, batch 150, loss[loss=0.1921, simple_loss=0.2405, pruned_loss=0.05304, ctc_loss=0.1141, cr_loss=0.3689, over 34492.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2713, pruned_loss=0.06214, ctc_loss=0.1303, cr_loss=0.4066, over 3558355.87 frames. ], batch size: 82, lr: 4.65e-03, grad_scale: 32.0 2024-09-18 15:31:51,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=452946.6666666667, ans=0.1 2024-09-18 15:32:20,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=453040.0, ans=0.125 2024-09-18 15:32:26,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.85 vs. limit=10.0 2024-09-18 15:32:27,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=453040.0, ans=0.125 2024-09-18 15:32:35,496 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 2.379e+02 2.724e+02 3.754e+02 6.975e+02, threshold=5.448e+02, percent-clipped=2.0 2024-09-18 15:33:06,700 INFO [train.py:1198] (1/2) Epoch 26, batch 200, loss[loss=0.2394, simple_loss=0.2863, pruned_loss=0.07261, ctc_loss=0.1497, cr_loss=0.4329, over 31914.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.27, pruned_loss=0.06182, ctc_loss=0.1294, cr_loss=0.4042, over 4272491.16 frames. ], batch size: 145, lr: 4.65e-03, grad_scale: 32.0 2024-09-18 15:33:36,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=453226.6666666667, ans=0.025 2024-09-18 15:34:07,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.51 vs. limit=15.0 2024-09-18 15:34:31,526 INFO [train.py:1198] (1/2) Epoch 26, batch 250, loss[loss=0.2212, simple_loss=0.2795, pruned_loss=0.06102, ctc_loss=0.1255, cr_loss=0.396, over 34216.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.27, pruned_loss=0.06156, ctc_loss=0.129, cr_loss=0.4031, over 4834856.74 frames. ], batch size: 117, lr: 4.65e-03, grad_scale: 32.0 2024-09-18 15:34:33,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=453413.3333333333, ans=0.125 2024-09-18 15:34:36,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=453413.3333333333, ans=0.2 2024-09-18 15:34:55,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.60 vs. limit=10.0 2024-09-18 15:35:05,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.05 vs. 
limit=22.5 2024-09-18 15:35:06,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=453506.6666666667, ans=0.0 2024-09-18 15:35:09,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0 2024-09-18 15:35:12,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=453506.6666666667, ans=0.125 2024-09-18 15:35:13,648 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.55 vs. limit=10.0 2024-09-18 15:35:26,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.541e+02 3.429e+02 4.427e+02 9.484e+02, threshold=6.857e+02, percent-clipped=10.0 2024-09-18 15:35:42,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=453600.0, ans=0.2 2024-09-18 15:35:56,500 INFO [train.py:1198] (1/2) Epoch 26, batch 300, loss[loss=0.2432, simple_loss=0.292, pruned_loss=0.07375, ctc_loss=0.1487, cr_loss=0.4304, over 34363.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2699, pruned_loss=0.06183, ctc_loss=0.1294, cr_loss=0.4041, over 5262630.13 frames. ], batch size: 107, lr: 4.65e-03, grad_scale: 16.0 2024-09-18 15:36:28,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=453740.0, ans=0.125 2024-09-18 15:36:29,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0 2024-09-18 15:36:34,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=453740.0, ans=0.1 2024-09-18 15:36:43,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=453740.0, ans=0.1 2024-09-18 15:36:45,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.09 vs. limit=15.0 2024-09-18 15:36:51,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=453786.6666666667, ans=0.0 2024-09-18 15:37:19,206 INFO [train.py:1198] (1/2) Epoch 26, batch 350, loss[loss=0.2009, simple_loss=0.2514, pruned_loss=0.05599, ctc_loss=0.1149, cr_loss=0.387, over 34267.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2701, pruned_loss=0.06173, ctc_loss=0.1292, cr_loss=0.4034, over 5596939.98 frames. ], batch size: 83, lr: 4.65e-03, grad_scale: 16.0 2024-09-18 15:37:58,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.43 vs. 
limit=10.0 2024-09-18 15:38:07,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=454020.0, ans=0.07 2024-09-18 15:38:13,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.028e+02 2.460e+02 2.954e+02 3.669e+02 6.989e+02, threshold=5.909e+02, percent-clipped=1.0 2024-09-18 15:38:35,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=454066.6666666667, ans=0.0 2024-09-18 15:38:43,189 INFO [train.py:1198] (1/2) Epoch 26, batch 400, loss[loss=0.2113, simple_loss=0.2695, pruned_loss=0.05685, ctc_loss=0.1202, cr_loss=0.3816, over 34408.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2699, pruned_loss=0.06179, ctc_loss=0.1292, cr_loss=0.4034, over 5863624.10 frames. ], batch size: 95, lr: 4.64e-03, grad_scale: 32.0 2024-09-18 15:39:04,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=454160.0, ans=0.125 2024-09-18 15:39:17,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=454206.6666666667, ans=0.125 2024-09-18 15:39:18,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=454206.6666666667, ans=0.125 2024-09-18 15:39:28,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=454206.6666666667, ans=0.1 2024-09-18 15:39:40,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=454253.3333333333, ans=0.95 2024-09-18 15:39:51,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.92 vs. limit=15.0 2024-09-18 15:39:58,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=454300.0, ans=0.0 2024-09-18 15:40:01,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=454300.0, ans=0.0 2024-09-18 15:40:08,031 INFO [train.py:1198] (1/2) Epoch 26, batch 450, loss[loss=0.2236, simple_loss=0.2768, pruned_loss=0.06374, ctc_loss=0.1328, cr_loss=0.4074, over 34704.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.27, pruned_loss=0.06178, ctc_loss=0.1292, cr_loss=0.4041, over 6053288.05 frames. 
], batch size: 97, lr: 4.64e-03, grad_scale: 32.0 2024-09-18 15:40:09,978 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=2.821e-03 2024-09-18 15:40:39,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=454440.0, ans=0.2 2024-09-18 15:40:39,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=454440.0, ans=0.2 2024-09-18 15:40:47,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=454440.0, ans=0.1 2024-09-18 15:41:01,048 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.386e+02 2.859e+02 3.579e+02 6.744e+02, threshold=5.717e+02, percent-clipped=2.0 2024-09-18 15:41:01,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=454486.6666666667, ans=0.125 2024-09-18 15:41:01,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=454486.6666666667, ans=0.125 2024-09-18 15:41:03,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.54 vs. limit=6.0 2024-09-18 15:41:16,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=454533.3333333333, ans=0.0 2024-09-18 15:41:16,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=454533.3333333333, ans=0.125 2024-09-18 15:41:31,135 INFO [train.py:1198] (1/2) Epoch 26, batch 500, loss[loss=0.233, simple_loss=0.2847, pruned_loss=0.06724, ctc_loss=0.143, cr_loss=0.4563, over 34475.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2689, pruned_loss=0.06142, ctc_loss=0.1284, cr_loss=0.4027, over 6219043.82 frames. ], batch size: 110, lr: 4.64e-03, grad_scale: 32.0 2024-09-18 15:41:48,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=454626.6666666667, ans=0.125 2024-09-18 15:41:54,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=454626.6666666667, ans=0.0 2024-09-18 15:41:58,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=454626.6666666667, ans=0.07 2024-09-18 15:42:21,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454720.0, ans=0.1 2024-09-18 15:42:24,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=454720.0, ans=0.1 2024-09-18 15:42:47,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=454766.6666666667, ans=0.0 2024-09-18 15:42:55,700 INFO [train.py:1198] (1/2) Epoch 26, batch 550, loss[loss=0.2274, simple_loss=0.2817, pruned_loss=0.06492, ctc_loss=0.1325, cr_loss=0.4219, over 33895.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2692, pruned_loss=0.06153, ctc_loss=0.1288, cr_loss=0.4037, over 6327712.70 frames. 
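The WARNING [optim.py] lines above report five grad-norm statistics (min, 25%, median, 75%, max over a recent window), the clipping threshold in force, and the percentage of recent batches that were clipped. The exact rule lives in the recipe's optimizer; a hedged stand-in using stock PyTorch, with the window size and the median-based threshold as assumptions, is:

    import torch
    from torch.nn.utils import clip_grad_norm_

    recent_norms: list[float] = []

    def clip_with_diagnostics(model: torch.nn.Module,
                              clipping_scale: float = 2.0,
                              window: int = 200) -> None:
        # max_norm=inf makes this call a pure measurement of the total grad norm.
        total = float(clip_grad_norm_(model.parameters(), max_norm=float("inf")))
        recent_norms.append(total)
        norms = torch.tensor(recent_norms[-window:])
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * float(q[2])       # assumed: scale the median
        pct = 100.0 * float((norms > threshold).float().mean())
        print(f"grad-norm quartiles {q.tolist()}, threshold={threshold:.3e}, "
              f"percent-clipped={pct:.1f}")
        clip_grad_norm_(model.parameters(), max_norm=threshold)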
], batch size: 122, lr: 4.64e-03, grad_scale: 32.0 2024-09-18 15:43:00,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=454813.3333333333, ans=0.125 2024-09-18 15:43:20,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.80 vs. limit=15.0 2024-09-18 15:43:21,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=454860.0, ans=0.025 2024-09-18 15:43:46,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=454953.3333333333, ans=0.125 2024-09-18 15:43:50,999 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.368e+02 2.722e+02 3.421e+02 5.066e+02, threshold=5.444e+02, percent-clipped=0.0 2024-09-18 15:43:53,369 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2024-09-18 15:44:01,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=454953.3333333333, ans=0.1 2024-09-18 15:44:20,747 INFO [train.py:1198] (1/2) Epoch 26, batch 600, loss[loss=0.2414, simple_loss=0.2946, pruned_loss=0.07119, ctc_loss=0.1425, cr_loss=0.4356, over 34220.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2698, pruned_loss=0.06179, ctc_loss=0.1292, cr_loss=0.4046, over 6428684.03 frames. ], batch size: 117, lr: 4.64e-03, grad_scale: 32.0 2024-09-18 15:44:21,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=455046.6666666667, ans=0.125 2024-09-18 15:44:44,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=15.0 2024-09-18 15:44:47,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.56 vs. limit=15.0 2024-09-18 15:45:07,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455140.0, ans=0.1 2024-09-18 15:45:13,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=455186.6666666667, ans=0.125 2024-09-18 15:45:32,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.37 vs. limit=15.0 2024-09-18 15:45:41,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=455280.0, ans=0.0 2024-09-18 15:45:42,862 INFO [train.py:1198] (1/2) Epoch 26, batch 650, loss[loss=0.2244, simple_loss=0.2767, pruned_loss=0.06379, ctc_loss=0.1375, cr_loss=0.4275, over 34541.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2694, pruned_loss=0.06169, ctc_loss=0.1292, cr_loss=0.4051, over 6520899.44 frames. 
], batch size: 94, lr: 4.64e-03, grad_scale: 32.0 2024-09-18 15:46:03,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455326.6666666667, ans=0.1 2024-09-18 15:46:28,055 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 15:46:37,639 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.370e+02 2.949e+02 3.712e+02 8.434e+02, threshold=5.899e+02, percent-clipped=7.0 2024-09-18 15:47:03,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=455466.6666666667, ans=0.125 2024-09-18 15:47:05,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=455466.6666666667, ans=0.0 2024-09-18 15:47:09,776 INFO [train.py:1198] (1/2) Epoch 26, batch 700, loss[loss=0.221, simple_loss=0.2641, pruned_loss=0.06702, ctc_loss=0.1359, cr_loss=0.4143, over 34607.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.2698, pruned_loss=0.06174, ctc_loss=0.1292, cr_loss=0.4045, over 6578651.68 frames. ], batch size: 89, lr: 4.64e-03, grad_scale: 16.0 2024-09-18 15:47:23,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455513.3333333333, ans=0.1 2024-09-18 15:47:36,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=455560.0, ans=0.125 2024-09-18 15:47:38,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=455560.0, ans=0.2 2024-09-18 15:47:46,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=455606.6666666667, ans=0.125 2024-09-18 15:47:49,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=455606.6666666667, ans=0.0 2024-09-18 15:47:52,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=455606.6666666667, ans=0.125 2024-09-18 15:48:13,528 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.04 vs. limit=15.0 2024-09-18 15:48:30,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=455746.6666666667, ans=0.07 2024-09-18 15:48:31,993 INFO [train.py:1198] (1/2) Epoch 26, batch 750, loss[loss=0.2192, simple_loss=0.2723, pruned_loss=0.06256, ctc_loss=0.1279, cr_loss=0.3851, over 34434.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2695, pruned_loss=0.0616, ctc_loss=0.129, cr_loss=0.4037, over 6622623.80 frames. ], batch size: 95, lr: 4.64e-03, grad_scale: 16.0 2024-09-18 15:48:40,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=455746.6666666667, ans=0.125 2024-09-18 15:48:52,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.41 vs. 
limit=15.0 2024-09-18 15:49:26,477 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.539e+02 3.143e+02 3.875e+02 6.815e+02, threshold=6.285e+02, percent-clipped=5.0 2024-09-18 15:49:27,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.94 vs. limit=10.0 2024-09-18 15:49:56,334 INFO [train.py:1198] (1/2) Epoch 26, batch 800, loss[loss=0.1968, simple_loss=0.2503, pruned_loss=0.05255, ctc_loss=0.1155, cr_loss=0.3749, over 34447.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2699, pruned_loss=0.0618, ctc_loss=0.1293, cr_loss=0.4046, over 6659052.11 frames. ], batch size: 85, lr: 4.64e-03, grad_scale: 32.0 2024-09-18 15:50:13,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=456026.6666666667, ans=0.1 2024-09-18 15:50:43,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0 2024-09-18 15:51:04,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=456166.6666666667, ans=0.0 2024-09-18 15:51:13,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.24 vs. limit=10.0 2024-09-18 15:51:20,713 INFO [train.py:1198] (1/2) Epoch 26, batch 850, loss[loss=0.2159, simple_loss=0.2766, pruned_loss=0.05751, ctc_loss=0.1241, cr_loss=0.3825, over 34381.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2696, pruned_loss=0.06167, ctc_loss=0.1292, cr_loss=0.4048, over 6691952.13 frames. ], batch size: 103, lr: 4.63e-03, grad_scale: 32.0 2024-09-18 15:51:40,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=456260.0, ans=0.1 2024-09-18 15:51:42,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=456260.0, ans=0.0 2024-09-18 15:51:42,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=456260.0, ans=0.125 2024-09-18 15:51:42,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=456260.0, ans=0.125 2024-09-18 15:51:57,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.61 vs. 
limit=22.5 2024-09-18 15:51:58,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=456306.6666666667, ans=0.04949747468305833 2024-09-18 15:52:06,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=456306.6666666667, ans=0.025 2024-09-18 15:52:16,407 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.395e+02 2.716e+02 3.321e+02 6.590e+02, threshold=5.432e+02, percent-clipped=1.0 2024-09-18 15:52:37,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=456400.0, ans=0.04949747468305833 2024-09-18 15:52:43,453 INFO [train.py:1198] (1/2) Epoch 26, batch 900, loss[loss=0.1967, simple_loss=0.2509, pruned_loss=0.05222, ctc_loss=0.1147, cr_loss=0.3788, over 34471.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.2699, pruned_loss=0.06189, ctc_loss=0.1295, cr_loss=0.4056, over 6697890.25 frames. ], batch size: 85, lr: 4.63e-03, grad_scale: 16.0 2024-09-18 15:52:48,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=456446.6666666667, ans=0.125 2024-09-18 15:52:57,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=456446.6666666667, ans=0.2 2024-09-18 15:52:58,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=456493.3333333333, ans=0.125 2024-09-18 15:53:05,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=456493.3333333333, ans=0.05 2024-09-18 15:53:06,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=456493.3333333333, ans=0.2 2024-09-18 15:53:42,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.82 vs. limit=15.0 2024-09-18 15:53:49,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=456633.3333333333, ans=0.1 2024-09-18 15:54:07,390 INFO [train.py:1198] (1/2) Epoch 26, batch 950, loss[loss=0.1925, simple_loss=0.244, pruned_loss=0.05144, ctc_loss=0.115, cr_loss=0.3792, over 34707.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.27, pruned_loss=0.06179, ctc_loss=0.1293, cr_loss=0.4051, over 6700047.58 frames. 
], batch size: 87, lr: 4.63e-03, grad_scale: 16.0 2024-09-18 15:54:09,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=456680.0, ans=0.2 2024-09-18 15:54:11,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=456680.0, ans=0.125 2024-09-18 15:54:56,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=456773.3333333333, ans=0.125 2024-09-18 15:55:05,777 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.078e+02 2.746e+02 3.618e+02 4.535e+02 7.862e+02, threshold=7.237e+02, percent-clipped=13.0 2024-09-18 15:55:07,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=456820.0, ans=0.125 2024-09-18 15:55:29,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.34 vs. limit=15.0 2024-09-18 15:55:31,877 INFO [train.py:1198] (1/2) Epoch 26, batch 1000, loss[loss=0.2157, simple_loss=0.2663, pruned_loss=0.06214, ctc_loss=0.1264, cr_loss=0.391, over 34476.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2705, pruned_loss=0.06208, ctc_loss=0.1299, cr_loss=0.4061, over 6693947.28 frames. ], batch size: 90, lr: 4.63e-03, grad_scale: 16.0 2024-09-18 15:55:54,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=456960.0, ans=0.0 2024-09-18 15:55:58,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=456960.0, ans=0.125 2024-09-18 15:56:05,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=457006.6666666667, ans=0.125 2024-09-18 15:56:10,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=457006.6666666667, ans=0.125 2024-09-18 15:56:10,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=457006.6666666667, ans=0.125 2024-09-18 15:56:23,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=457053.3333333333, ans=0.0 2024-09-18 15:56:30,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=457053.3333333333, ans=0.1 2024-09-18 15:56:42,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.38 vs. limit=15.0 2024-09-18 15:56:43,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=457100.0, ans=0.125 2024-09-18 15:56:56,124 INFO [train.py:1198] (1/2) Epoch 26, batch 1050, loss[loss=0.2305, simple_loss=0.2887, pruned_loss=0.06466, ctc_loss=0.1343, cr_loss=0.4051, over 34565.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2701, pruned_loss=0.06198, ctc_loss=0.1296, cr_loss=0.405, over 6704224.84 frames. 
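The ScheduledFloat entries dump module hyperparameters (dropout probabilities, skip rates, balancer limits) whose current value ("ans") is looked up from the global batch_count, letting regularization strength anneal as training progresses. A minimal stand-in, assuming a piecewise-linear schedule over sorted (batch_count, value) breakpoints:

    import bisect

    def scheduled_float(batch_count: float,
                        schedule: list[tuple[float, float]]) -> float:
        # Piecewise-linear interpolation between (batch_count, value) points;
        # clamps to the first/last value outside the schedule's range.
        xs = [x for x, _ in schedule]
        if batch_count <= xs[0]:
            return schedule[0][1]
        if batch_count >= xs[-1]:
            return schedule[-1][1]
        i = bisect.bisect_right(xs, batch_count)
        (x0, y0), (x1, y1) = schedule[i - 1], schedule[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical schedule: a skip rate that decays to 0 over the first 20k batches,
    # which would print ans=0.0 at the batch counts seen above.
    print(scheduled_float(457_846.0, [(0.0, 0.1), (20_000.0, 0.0)]))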
], batch size: 99, lr: 4.63e-03, grad_scale: 16.0 2024-09-18 15:57:16,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2024-09-18 15:57:24,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=457193.3333333333, ans=0.125 2024-09-18 15:57:40,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.23 vs. limit=22.5 2024-09-18 15:57:47,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=457286.6666666667, ans=0.2 2024-09-18 15:57:52,138 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.066e+02 2.401e+02 2.612e+02 3.197e+02 5.115e+02, threshold=5.224e+02, percent-clipped=0.0 2024-09-18 15:58:03,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=12.0 2024-09-18 15:58:12,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=457333.3333333333, ans=0.0 2024-09-18 15:58:19,025 INFO [train.py:1198] (1/2) Epoch 26, batch 1100, loss[loss=0.2177, simple_loss=0.2679, pruned_loss=0.06219, ctc_loss=0.13, cr_loss=0.4253, over 34383.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.27, pruned_loss=0.0618, ctc_loss=0.1293, cr_loss=0.4043, over 6717279.38 frames. ], batch size: 91, lr: 4.63e-03, grad_scale: 16.0 2024-09-18 15:58:49,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=457426.6666666667, ans=0.1 2024-09-18 15:58:58,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=457473.3333333333, ans=0.05 2024-09-18 15:59:13,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.41 vs. limit=15.0 2024-09-18 15:59:19,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=457520.0, ans=0.125 2024-09-18 15:59:19,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=457520.0, ans=0.125 2024-09-18 15:59:44,305 INFO [train.py:1198] (1/2) Epoch 26, batch 1150, loss[loss=0.2269, simple_loss=0.2756, pruned_loss=0.06693, ctc_loss=0.1368, cr_loss=0.4238, over 34359.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2701, pruned_loss=0.06193, ctc_loss=0.1297, cr_loss=0.4047, over 6715790.77 frames. 
], batch size: 91, lr: 4.63e-03, grad_scale: 16.0 2024-09-18 15:59:51,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=457613.3333333333, ans=0.0 2024-09-18 15:59:56,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=457613.3333333333, ans=0.0 2024-09-18 16:00:19,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=457706.6666666667, ans=0.1 2024-09-18 16:00:21,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=457706.6666666667, ans=0.125 2024-09-18 16:00:28,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=457706.6666666667, ans=0.125 2024-09-18 16:00:41,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=457753.3333333333, ans=0.0 2024-09-18 16:00:42,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.415e+02 2.834e+02 3.509e+02 5.941e+02, threshold=5.667e+02, percent-clipped=2.0 2024-09-18 16:00:43,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=457753.3333333333, ans=0.125 2024-09-18 16:00:52,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=457800.0, ans=0.125 2024-09-18 16:01:07,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=457846.6666666667, ans=0.025 2024-09-18 16:01:08,973 INFO [train.py:1198] (1/2) Epoch 26, batch 1200, loss[loss=0.2127, simple_loss=0.2693, pruned_loss=0.0581, ctc_loss=0.1221, cr_loss=0.3859, over 34581.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2706, pruned_loss=0.06195, ctc_loss=0.1299, cr_loss=0.4052, over 6706594.05 frames. ], batch size: 99, lr: 4.63e-03, grad_scale: 32.0 2024-09-18 16:01:12,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=457846.6666666667, ans=0.125 2024-09-18 16:01:14,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-09-18 16:01:15,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.31 vs. limit=15.0 2024-09-18 16:01:27,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=457893.3333333333, ans=0.2 2024-09-18 16:01:32,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=457893.3333333333, ans=0.125 2024-09-18 16:02:07,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.62 vs. 
limit=6.0 2024-09-18 16:02:10,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=457986.6666666667, ans=0.025 2024-09-18 16:02:18,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=15.0 2024-09-18 16:02:30,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=458033.3333333333, ans=0.2 2024-09-18 16:02:32,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=458080.0, ans=0.0 2024-09-18 16:02:33,768 INFO [train.py:1198] (1/2) Epoch 26, batch 1250, loss[loss=0.2368, simple_loss=0.2881, pruned_loss=0.06977, ctc_loss=0.1439, cr_loss=0.4303, over 34334.00 frames. ], tot_loss[loss=0.2191, simple_loss=0.2714, pruned_loss=0.0622, ctc_loss=0.1304, cr_loss=0.4071, over 6740784.27 frames. ], batch size: 107, lr: 4.62e-03, grad_scale: 32.0 2024-09-18 16:02:54,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.55 vs. limit=22.5 2024-09-18 16:02:55,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=458126.6666666667, ans=0.125 2024-09-18 16:03:00,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=458126.6666666667, ans=0.125 2024-09-18 16:03:05,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=458173.3333333333, ans=0.0 2024-09-18 16:03:14,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.25 vs. limit=22.5 2024-09-18 16:03:25,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=458220.0, ans=0.1 2024-09-18 16:03:30,127 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.422e+02 2.902e+02 4.108e+02 1.160e+03, threshold=5.803e+02, percent-clipped=6.0 2024-09-18 16:03:57,022 INFO [train.py:1198] (1/2) Epoch 26, batch 1300, loss[loss=0.2301, simple_loss=0.2906, pruned_loss=0.06308, ctc_loss=0.1337, cr_loss=0.416, over 33117.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2706, pruned_loss=0.0619, ctc_loss=0.1298, cr_loss=0.405, over 6745973.88 frames. ], batch size: 130, lr: 4.62e-03, grad_scale: 32.0 2024-09-18 16:04:15,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2024-09-18 16:04:27,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=458360.0, ans=0.0 2024-09-18 16:04:37,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2024-09-18 16:04:43,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=458406.6666666667, ans=0.0 2024-09-18 16:04:53,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.14 vs. 
limit=22.5 2024-09-18 16:05:21,597 INFO [train.py:1198] (1/2) Epoch 26, batch 1350, loss[loss=0.2033, simple_loss=0.2571, pruned_loss=0.05525, ctc_loss=0.1179, cr_loss=0.3861, over 34532.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2702, pruned_loss=0.06163, ctc_loss=0.1292, cr_loss=0.4043, over 6765768.28 frames. ], batch size: 94, lr: 4.62e-03, grad_scale: 32.0 2024-09-18 16:05:46,290 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:05:57,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=458640.0, ans=0.0 2024-09-18 16:06:19,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.982e+02 2.391e+02 2.903e+02 3.594e+02 7.508e+02, threshold=5.806e+02, percent-clipped=1.0 2024-09-18 16:06:23,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.61 vs. limit=15.0 2024-09-18 16:06:27,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=458733.3333333333, ans=0.0 2024-09-18 16:06:45,771 INFO [train.py:1198] (1/2) Epoch 26, batch 1400, loss[loss=0.1842, simple_loss=0.2348, pruned_loss=0.04973, ctc_loss=0.1045, cr_loss=0.3304, over 34276.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2701, pruned_loss=0.06174, ctc_loss=0.1293, cr_loss=0.4049, over 6777800.79 frames. ], batch size: 80, lr: 4.62e-03, grad_scale: 32.0 2024-09-18 16:06:50,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=458780.0, ans=0.125 2024-09-18 16:07:05,757 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:07:13,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=458826.6666666667, ans=0.035 2024-09-18 16:07:33,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=458920.0, ans=0.125 2024-09-18 16:07:40,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=458920.0, ans=0.2 2024-09-18 16:07:45,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=458920.0, ans=0.125 2024-09-18 16:07:51,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=458966.6666666667, ans=0.1 2024-09-18 16:08:05,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=458966.6666666667, ans=0.125 2024-09-18 16:08:09,752 INFO [train.py:1198] (1/2) Epoch 26, batch 1450, loss[loss=0.246, simple_loss=0.2921, pruned_loss=0.07492, ctc_loss=0.1551, cr_loss=0.4753, over 34418.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2708, pruned_loss=0.06184, ctc_loss=0.1295, cr_loss=0.4063, over 6774360.53 frames. 
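The Whitening lines compare a per-module metric against a limit (metric=2.61 vs. limit=15.0 and so on); the metric grows as a layer's output covariance drifts away from white (a uniform eigenvalue spectrum), and crossing the limit is what makes these lines worth logging. One plausible proxy for such a metric, assuming it measures eigenvalue dispersion of the feature covariance, is mean(eig^2) / mean(eig)^2, which equals 1.0 exactly when the features are perfectly white:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels) activations from one module.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.t() @ x) / x.shape[0]        # channel covariance
        eigs = torch.linalg.eigvalsh(cov)     # real, non-negative spectrum
        return (eigs ** 2).mean() / eigs.mean().clamp(min=1e-20) ** 2

    feats = torch.randn(1000, 256)            # near-white input
    print(float(whitening_metric(feats)))     # near 1.0; sampling noise pushes it up slightly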
], batch size: 110, lr: 4.62e-03, grad_scale: 32.0 2024-09-18 16:08:37,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=459060.0, ans=0.0 2024-09-18 16:08:46,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.51 vs. limit=22.5 2024-09-18 16:08:48,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=459106.6666666667, ans=0.0 2024-09-18 16:08:49,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=459106.6666666667, ans=0.1 2024-09-18 16:08:56,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.38 vs. limit=10.0 2024-09-18 16:09:05,508 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.571e+02 2.982e+02 3.470e+02 7.833e+02, threshold=5.965e+02, percent-clipped=1.0 2024-09-18 16:09:18,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.22 vs. limit=22.5 2024-09-18 16:09:32,163 INFO [train.py:1198] (1/2) Epoch 26, batch 1500, loss[loss=0.2234, simple_loss=0.2813, pruned_loss=0.06165, ctc_loss=0.1301, cr_loss=0.4045, over 34470.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.271, pruned_loss=0.06193, ctc_loss=0.1297, cr_loss=0.4063, over 6774773.40 frames. ], batch size: 100, lr: 4.62e-03, grad_scale: 16.0 2024-09-18 16:09:52,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=459293.3333333333, ans=0.125 2024-09-18 16:10:47,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=459433.3333333333, ans=0.2 2024-09-18 16:10:50,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=459433.3333333333, ans=0.025 2024-09-18 16:10:57,273 INFO [train.py:1198] (1/2) Epoch 26, batch 1550, loss[loss=0.2524, simple_loss=0.3, pruned_loss=0.077, ctc_loss=0.1579, cr_loss=0.4827, over 34421.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2711, pruned_loss=0.06215, ctc_loss=0.13, cr_loss=0.406, over 6746295.68 frames. ], batch size: 105, lr: 4.62e-03, grad_scale: 16.0 2024-09-18 16:11:02,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=459480.0, ans=0.125 2024-09-18 16:11:14,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=459526.6666666667, ans=0.1 2024-09-18 16:11:33,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=459573.3333333333, ans=0.95 2024-09-18 16:11:33,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=459573.3333333333, ans=0.0 2024-09-18 16:11:37,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.47 vs. 
limit=15.0 2024-09-18 16:11:56,710 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.503e+02 2.931e+02 4.396e+02 8.101e+02, threshold=5.862e+02, percent-clipped=5.0 2024-09-18 16:11:57,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=15.0 2024-09-18 16:12:05,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.89 vs. limit=15.0 2024-09-18 16:12:11,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=459666.6666666667, ans=0.125 2024-09-18 16:12:16,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=459666.6666666667, ans=0.0 2024-09-18 16:12:21,392 INFO [train.py:1198] (1/2) Epoch 26, batch 1600, loss[loss=0.227, simple_loss=0.2832, pruned_loss=0.06417, ctc_loss=0.1298, cr_loss=0.4126, over 34554.00 frames. ], tot_loss[loss=0.2186, simple_loss=0.2709, pruned_loss=0.0621, ctc_loss=0.1299, cr_loss=0.4052, over 6723638.13 frames. ], batch size: 99, lr: 4.62e-03, grad_scale: 32.0 2024-09-18 16:12:29,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=459713.3333333333, ans=0.125 2024-09-18 16:12:53,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=459806.6666666667, ans=0.05 2024-09-18 16:12:58,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=459806.6666666667, ans=0.125 2024-09-18 16:12:58,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.67 vs. limit=15.0 2024-09-18 16:13:00,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.39 vs. limit=22.5 2024-09-18 16:13:01,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=459806.6666666667, ans=0.125 2024-09-18 16:13:45,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=459946.6666666667, ans=0.125 2024-09-18 16:13:46,573 INFO [train.py:1198] (1/2) Epoch 26, batch 1650, loss[loss=0.2151, simple_loss=0.2737, pruned_loss=0.05797, ctc_loss=0.1215, cr_loss=0.4058, over 34373.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2708, pruned_loss=0.06203, ctc_loss=0.1297, cr_loss=0.4048, over 6717446.10 frames. 
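The optim.py warnings in this stretch report five grad-norm quantiles (apparently min/25%/50%/75%/max over some recent window) plus a threshold and percent-clipped. The threshold tracks Clipping_scale times the median: in the warning just above, 2.0 x 2.931e+02 = 5.862e+02, and the same relation holds for the other warnings in this section. A sketch of clipping driven by the recent median norm, under that reading (the window size and bookkeeping are guesses, not the actual icefall optimizer code):

    import torch

    def clip_grads_(params, norm_history, clipping_scale=2.0, window=128):
        # Gather current total grad norm across all parameters.
        grads = [p.grad for p in params if p.grad is not None]
        total_norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads]))
        norm_history.append(float(total_norm))
        del norm_history[:-window]                 # keep only a recent window
        t = torch.tensor(norm_history)
        q = torch.quantile(t, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]          # clipping_scale x median
        if total_norm > threshold:
            for g in grads:
                g.mul_(threshold / total_norm)     # scale grads down in place
        # would log: "grad-norm quartiles ... threshold=... percent-clipped=..."
        return q, threshold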
], batch size: 103, lr: 4.62e-03, grad_scale: 32.0 2024-09-18 16:13:50,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=459946.6666666667, ans=0.125 2024-09-18 16:13:51,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459946.6666666667, ans=0.125 2024-09-18 16:14:06,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=459993.3333333333, ans=0.125 2024-09-18 16:14:11,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=459993.3333333333, ans=0.0 2024-09-18 16:14:31,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=460040.0, ans=0.125 2024-09-18 16:14:32,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=460040.0, ans=0.0 2024-09-18 16:14:43,987 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.505e+02 2.877e+02 3.661e+02 8.552e+02, threshold=5.754e+02, percent-clipped=4.0 2024-09-18 16:14:49,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2024-09-18 16:14:58,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.40 vs. limit=15.0 2024-09-18 16:15:02,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=460133.3333333333, ans=0.125 2024-09-18 16:15:08,839 INFO [train.py:1198] (1/2) Epoch 26, batch 1700, loss[loss=0.1847, simple_loss=0.2366, pruned_loss=0.04911, ctc_loss=0.1048, cr_loss=0.3401, over 34306.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2705, pruned_loss=0.06191, ctc_loss=0.1295, cr_loss=0.4049, over 6743433.99 frames. ], batch size: 80, lr: 4.61e-03, grad_scale: 32.0 2024-09-18 16:15:16,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.57 vs. limit=15.0 2024-09-18 16:15:18,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=460180.0, ans=0.125 2024-09-18 16:15:37,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=460226.6666666667, ans=0.125 2024-09-18 16:15:40,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=460226.6666666667, ans=0.1 2024-09-18 16:16:03,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.58 vs. limit=15.0 2024-09-18 16:16:32,816 INFO [train.py:1198] (1/2) Epoch 26, batch 1750, loss[loss=0.1946, simple_loss=0.2437, pruned_loss=0.05432, ctc_loss=0.1121, cr_loss=0.3586, over 34229.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.27, pruned_loss=0.06179, ctc_loss=0.1292, cr_loss=0.4039, over 6752581.73 frames. 
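The scaling.py:214 entries print ScheduledFloat values: dropout p, skip rates, balancer probabilities and similar knobs are functions of batch_count rather than constants, which is why the same parameter name recurs with drifting ans= values. A plausible minimal reconstruction, assuming piecewise-linear interpolation between (batch_count, value) breakpoints (the breakpoints below are invented for the example):

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count (illustrative)."""
        def __init__(self, *points):
            # points: (batch_count, value) pairs
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # hypothetical schedule: a skip rate decaying from 0.5 to 0.0
    skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    print(skip_rate.value(459993.0))  # -> 0.0, matching ans=0.0 late in training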
], batch size: 78, lr: 4.61e-03, grad_scale: 32.0 2024-09-18 16:16:33,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=460413.3333333333, ans=0.125 2024-09-18 16:16:34,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=460413.3333333333, ans=0.125 2024-09-18 16:16:34,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=460413.3333333333, ans=0.1 2024-09-18 16:16:39,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=460413.3333333333, ans=0.125 2024-09-18 16:16:52,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.48 vs. limit=22.5 2024-09-18 16:17:32,464 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.387e+02 2.789e+02 3.355e+02 5.772e+02, threshold=5.579e+02, percent-clipped=1.0 2024-09-18 16:17:32,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=460553.3333333333, ans=0.0 2024-09-18 16:17:36,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=460553.3333333333, ans=0.0 2024-09-18 16:17:56,844 INFO [train.py:1198] (1/2) Epoch 26, batch 1800, loss[loss=0.2267, simple_loss=0.2855, pruned_loss=0.06288, ctc_loss=0.1295, cr_loss=0.4058, over 34678.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2703, pruned_loss=0.06199, ctc_loss=0.1295, cr_loss=0.4047, over 6754575.41 frames. ], batch size: 97, lr: 4.61e-03, grad_scale: 32.0 2024-09-18 16:18:11,055 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0 2024-09-18 16:18:13,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=460693.3333333333, ans=0.125 2024-09-18 16:18:13,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=460693.3333333333, ans=0.0 2024-09-18 16:18:18,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=460693.3333333333, ans=0.025 2024-09-18 16:18:23,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=460693.3333333333, ans=0.0 2024-09-18 16:18:36,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=460740.0, ans=0.125 2024-09-18 16:19:06,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=460833.3333333333, ans=0.0 2024-09-18 16:19:21,508 INFO [train.py:1198] (1/2) Epoch 26, batch 1850, loss[loss=0.2206, simple_loss=0.2762, pruned_loss=0.0618, ctc_loss=0.1283, cr_loss=0.3915, over 34467.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2702, pruned_loss=0.06196, ctc_loss=0.1294, cr_loss=0.4045, over 6760122.35 frames. 
], batch size: 100, lr: 4.61e-03, grad_scale: 32.0 2024-09-18 16:19:23,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=460880.0, ans=0.125 2024-09-18 16:19:43,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=460926.6666666667, ans=0.0 2024-09-18 16:20:04,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=460973.3333333333, ans=0.0 2024-09-18 16:20:10,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.08 vs. limit=12.0 2024-09-18 16:20:19,239 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.070e+02 2.566e+02 3.202e+02 4.077e+02 6.820e+02, threshold=6.403e+02, percent-clipped=5.0 2024-09-18 16:20:36,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=461066.6666666667, ans=0.1 2024-09-18 16:20:44,176 INFO [train.py:1198] (1/2) Epoch 26, batch 1900, loss[loss=0.2387, simple_loss=0.2963, pruned_loss=0.06755, ctc_loss=0.1407, cr_loss=0.4465, over 34411.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.271, pruned_loss=0.0622, ctc_loss=0.1299, cr_loss=0.4057, over 6769738.17 frames. ], batch size: 103, lr: 4.61e-03, grad_scale: 32.0 2024-09-18 16:21:05,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.25 vs. limit=22.5 2024-09-18 16:21:06,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.41 vs. limit=15.0 2024-09-18 16:21:23,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.38 vs. limit=22.5 2024-09-18 16:21:29,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=461206.6666666667, ans=0.07 2024-09-18 16:22:03,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=461300.0, ans=0.0 2024-09-18 16:22:07,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=461346.6666666667, ans=0.1 2024-09-18 16:22:08,524 INFO [train.py:1198] (1/2) Epoch 26, batch 1950, loss[loss=0.2104, simple_loss=0.2626, pruned_loss=0.05922, ctc_loss=0.1223, cr_loss=0.3815, over 34348.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2718, pruned_loss=0.06226, ctc_loss=0.1301, cr_loss=0.4065, over 6787549.03 frames. 
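The scaling.py:1024 Whitening entries fire when a module's feature covariance drifts too far from isotropic, printing metric=... vs. limit=...; a metric of 1.0 would correspond to a perfectly flat eigenvalue spectrum. The exact formula is not shown in this log, so the eigenvalue-spread proxy below is only an illustrative stand-in, not necessarily the definition in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """mean(eig^2) / mean(eig)^2 of the feature covariance.
        Equals 1.0 iff all eigenvalues are equal ("white") and grows
        as the spectrum concentrates. Illustrative definition only."""
        x = x.reshape(-1, x.shape[-1])             # (frames, channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

    feats = torch.randn(1000, 512) @ torch.randn(512, 512)  # correlated features
    m = whitening_metric(feats)
    if m > 15.0:   # cf. "metric=11.25 vs. limit=22.5" style checks above
        print(f"whitening metric {m:.2f} exceeds limit")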
], batch size: 91, lr: 4.61e-03, grad_scale: 32.0 2024-09-18 16:22:08,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=461346.6666666667, ans=0.0 2024-09-18 16:22:28,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=461393.3333333333, ans=0.0 2024-09-18 16:22:30,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=461393.3333333333, ans=0.0 2024-09-18 16:22:48,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=461440.0, ans=0.125 2024-09-18 16:22:54,049 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=3.049e-02 2024-09-18 16:22:56,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.09 vs. limit=10.0 2024-09-18 16:22:59,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.73 vs. limit=22.5 2024-09-18 16:23:08,163 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.499e+02 2.700e+02 3.622e+02 6.140e+02, threshold=5.400e+02, percent-clipped=0.0 2024-09-18 16:23:28,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=461533.3333333333, ans=0.0 2024-09-18 16:23:31,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=461580.0, ans=0.0 2024-09-18 16:23:32,994 INFO [train.py:1198] (1/2) Epoch 26, batch 2000, loss[loss=0.1878, simple_loss=0.2369, pruned_loss=0.05168, ctc_loss=0.1076, cr_loss=0.3468, over 34167.00 frames. ], tot_loss[loss=0.2199, simple_loss=0.2724, pruned_loss=0.06251, ctc_loss=0.1306, cr_loss=0.407, over 6762778.54 frames. ], batch size: 78, lr: 4.61e-03, grad_scale: 32.0 2024-09-18 16:23:37,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.29 vs. limit=22.5 2024-09-18 16:23:39,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.97 vs. limit=15.0 2024-09-18 16:23:55,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=461626.6666666667, ans=15.0 2024-09-18 16:24:27,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=461720.0, ans=0.125 2024-09-18 16:24:29,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=461720.0, ans=0.125 2024-09-18 16:24:36,179 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2024-09-18 16:24:55,338 INFO [train.py:1198] (1/2) Epoch 26, batch 2050, loss[loss=0.2009, simple_loss=0.2481, pruned_loss=0.05719, ctc_loss=0.1199, cr_loss=0.386, over 34460.00 frames. ], tot_loss[loss=0.219, simple_loss=0.2713, pruned_loss=0.06219, ctc_loss=0.1301, cr_loss=0.4055, over 6753799.17 frames. 
], batch size: 82, lr: 4.61e-03, grad_scale: 16.0 2024-09-18 16:24:59,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=461813.3333333333, ans=0.125 2024-09-18 16:25:09,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=461813.3333333333, ans=0.2 2024-09-18 16:25:15,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.68 vs. limit=10.0 2024-09-18 16:25:20,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.37 vs. limit=22.5 2024-09-18 16:25:29,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=461906.6666666667, ans=0.0 2024-09-18 16:25:44,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=461906.6666666667, ans=0.125 2024-09-18 16:25:49,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=461953.3333333333, ans=0.2 2024-09-18 16:25:57,083 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.911e+02 2.410e+02 2.718e+02 3.685e+02 5.420e+02, threshold=5.435e+02, percent-clipped=1.0 2024-09-18 16:26:09,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=462000.0, ans=0.125 2024-09-18 16:26:17,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=462000.0, ans=0.1 2024-09-18 16:26:17,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=462000.0, ans=0.025 2024-09-18 16:26:19,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=462046.6666666667, ans=0.125 2024-09-18 16:26:20,452 INFO [train.py:1198] (1/2) Epoch 26, batch 2100, loss[loss=0.2239, simple_loss=0.2793, pruned_loss=0.06298, ctc_loss=0.1303, cr_loss=0.4099, over 34542.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2706, pruned_loss=0.06189, ctc_loss=0.1294, cr_loss=0.4042, over 6768191.74 frames. ], batch size: 94, lr: 4.61e-03, grad_scale: 16.0 2024-09-18 16:26:28,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=462046.6666666667, ans=0.125 2024-09-18 16:26:58,788 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:27:06,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=462140.0, ans=0.025 2024-09-18 16:27:06,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=462140.0, ans=0.025 2024-09-18 16:27:18,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=462186.6666666667, ans=0.125 2024-09-18 16:27:22,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=11.57 vs. 
limit=15.0 2024-09-18 16:27:30,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=462233.3333333333, ans=0.0 2024-09-18 16:27:33,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=462233.3333333333, ans=0.125 2024-09-18 16:27:44,585 INFO [train.py:1198] (1/2) Epoch 26, batch 2150, loss[loss=0.2149, simple_loss=0.2653, pruned_loss=0.06127, ctc_loss=0.1273, cr_loss=0.4153, over 34354.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2699, pruned_loss=0.06154, ctc_loss=0.1287, cr_loss=0.4029, over 6786300.02 frames. ], batch size: 91, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 16:27:44,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=462280.0, ans=0.125 2024-09-18 16:27:44,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=462280.0, ans=0.0 2024-09-18 16:27:48,441 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:27:51,697 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:27:56,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=462280.0, ans=0.2 2024-09-18 16:28:03,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=462326.6666666667, ans=0.125 2024-09-18 16:28:03,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=462326.6666666667, ans=0.0 2024-09-18 16:28:42,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=462420.0, ans=0.125 2024-09-18 16:28:45,590 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.473e+02 2.993e+02 3.801e+02 9.048e+02, threshold=5.986e+02, percent-clipped=4.0 2024-09-18 16:28:53,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=22.5 2024-09-18 16:28:55,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=462466.6666666667, ans=0.125 2024-09-18 16:28:57,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=462466.6666666667, ans=0.0 2024-09-18 16:29:02,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=462466.6666666667, ans=0.0 2024-09-18 16:29:02,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=462466.6666666667, ans=0.125 2024-09-18 16:29:09,057 INFO [train.py:1198] (1/2) Epoch 26, batch 2200, loss[loss=0.23, simple_loss=0.2865, pruned_loss=0.06491, ctc_loss=0.1349, cr_loss=0.4184, over 34473.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2698, pruned_loss=0.0615, ctc_loss=0.1286, cr_loss=0.403, over 6782502.42 frames. 
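grad_scale in the batch summaries moves between 32.0, 16.0 and 8.0 through this stretch, which is ordinary dynamic loss scaling under fp16 autocast: the scale is halved after a step that produces inf/nan gradients and grows back after a run of clean steps. The stock PyTorch mechanism looks like this (model, optimizer and loss_fn are placeholders, not names from this recipe):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=2.0**15,     # defaults shown explicitly for clarity
        growth_factor=2.0,      # double after `growth_interval` clean steps
        backoff_factor=0.5,     # halve on inf/nan gradients
        growth_interval=2000,
    )

    def train_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model(batch))
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # silently skips the step if grads overflowed
        scaler.update()          # adjusts the scale, as seen in the log
        return loss.detach(), scaler.get_scale()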
], batch size: 100, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 16:29:19,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=462513.3333333333, ans=0.125 2024-09-18 16:29:22,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=462513.3333333333, ans=0.2 2024-09-18 16:29:22,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=462513.3333333333, ans=0.025 2024-09-18 16:29:28,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=462560.0, ans=0.0 2024-09-18 16:29:51,845 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:29:54,287 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2024-09-18 16:29:55,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=462606.6666666667, ans=0.0 2024-09-18 16:29:58,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2024-09-18 16:30:05,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=462653.3333333333, ans=0.125 2024-09-18 16:30:18,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=462700.0, ans=0.0 2024-09-18 16:30:27,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=12.0 2024-09-18 16:30:28,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=462700.0, ans=0.1 2024-09-18 16:30:33,467 INFO [train.py:1198] (1/2) Epoch 26, batch 2250, loss[loss=0.232, simple_loss=0.283, pruned_loss=0.0678, ctc_loss=0.1441, cr_loss=0.4159, over 34416.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2698, pruned_loss=0.06141, ctc_loss=0.1286, cr_loss=0.4027, over 6779350.35 frames. ], batch size: 95, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 16:30:42,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=462746.6666666667, ans=0.125 2024-09-18 16:31:00,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.57 vs. limit=15.0 2024-09-18 16:31:14,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=462840.0, ans=0.125 2024-09-18 16:31:18,909 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0 2024-09-18 16:31:21,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.23 vs. 
limit=22.5 2024-09-18 16:31:29,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=462886.6666666667, ans=0.0 2024-09-18 16:31:34,096 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.637e+02 3.192e+02 4.201e+02 8.552e+02, threshold=6.384e+02, percent-clipped=5.0 2024-09-18 16:31:37,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=462933.3333333333, ans=0.125 2024-09-18 16:31:49,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=462933.3333333333, ans=0.125 2024-09-18 16:31:53,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=462933.3333333333, ans=0.125 2024-09-18 16:31:55,888 INFO [train.py:1198] (1/2) Epoch 26, batch 2300, loss[loss=0.191, simple_loss=0.2411, pruned_loss=0.0518, ctc_loss=0.1125, cr_loss=0.3678, over 34292.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2688, pruned_loss=0.06119, ctc_loss=0.1282, cr_loss=0.4019, over 6763661.53 frames. ], batch size: 83, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 16:31:57,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=462980.0, ans=0.125 2024-09-18 16:32:07,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=462980.0, ans=0.025 2024-09-18 16:32:44,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=463120.0, ans=0.0 2024-09-18 16:32:49,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=463120.0, ans=0.2 2024-09-18 16:33:20,671 INFO [train.py:1198] (1/2) Epoch 26, batch 2350, loss[loss=0.2288, simple_loss=0.2796, pruned_loss=0.06684, ctc_loss=0.1369, cr_loss=0.4241, over 34688.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2691, pruned_loss=0.06148, ctc_loss=0.1288, cr_loss=0.4036, over 6769813.24 frames. ], batch size: 97, lr: 4.60e-03, grad_scale: 8.0 2024-09-18 16:33:30,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=463213.3333333333, ans=0.125 2024-09-18 16:33:32,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=463213.3333333333, ans=0.0 2024-09-18 16:33:55,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=463306.6666666667, ans=0.125 2024-09-18 16:34:03,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=463306.6666666667, ans=0.125 2024-09-18 16:34:03,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=463306.6666666667, ans=0.125 2024-09-18 16:34:07,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.61 vs. limit=22.5 2024-09-18 16:34:22,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.03 vs. 
limit=15.0 2024-09-18 16:34:23,349 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.435e+02 2.736e+02 3.539e+02 5.320e+02, threshold=5.472e+02, percent-clipped=0.0 2024-09-18 16:34:30,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=463400.0, ans=0.125 2024-09-18 16:34:35,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.28 vs. limit=10.0 2024-09-18 16:34:38,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=463400.0, ans=0.125 2024-09-18 16:34:39,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=9.45 vs. limit=15.0 2024-09-18 16:34:45,062 INFO [train.py:1198] (1/2) Epoch 26, batch 2400, loss[loss=0.2074, simple_loss=0.2607, pruned_loss=0.05784, ctc_loss=0.1177, cr_loss=0.3692, over 34573.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2698, pruned_loss=0.06164, ctc_loss=0.1292, cr_loss=0.4043, over 6775559.82 frames. ], batch size: 89, lr: 4.60e-03, grad_scale: 16.0 2024-09-18 16:34:52,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.66 vs. limit=12.0 2024-09-18 16:35:15,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=463493.3333333333, ans=0.0 2024-09-18 16:35:35,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=463586.6666666667, ans=0.2 2024-09-18 16:35:36,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.29 vs. limit=15.0 2024-09-18 16:35:52,313 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.16 vs. limit=6.0 2024-09-18 16:35:55,490 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:35:57,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=463633.3333333333, ans=0.125 2024-09-18 16:35:57,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2024-09-18 16:36:08,345 INFO [train.py:1198] (1/2) Epoch 26, batch 2450, loss[loss=0.2138, simple_loss=0.2729, pruned_loss=0.05721, ctc_loss=0.123, cr_loss=0.392, over 34420.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2708, pruned_loss=0.06198, ctc_loss=0.1298, cr_loss=0.4054, over 6750295.53 frames. 
], batch size: 95, lr: 4.60e-03, grad_scale: 16.0 2024-09-18 16:36:15,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463680.0, ans=0.1 2024-09-18 16:36:20,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=463680.0, ans=0.125 2024-09-18 16:36:23,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=463726.6666666667, ans=0.125 2024-09-18 16:36:28,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.08 vs. limit=10.0 2024-09-18 16:36:34,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=463726.6666666667, ans=0.1 2024-09-18 16:36:41,833 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:36:56,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=463773.3333333333, ans=0.125 2024-09-18 16:37:11,094 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.437e+02 2.798e+02 3.729e+02 9.227e+02, threshold=5.595e+02, percent-clipped=4.0 2024-09-18 16:37:32,727 INFO [train.py:1198] (1/2) Epoch 26, batch 2500, loss[loss=0.2264, simple_loss=0.2861, pruned_loss=0.06218, ctc_loss=0.1316, cr_loss=0.404, over 34460.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2707, pruned_loss=0.0619, ctc_loss=0.1295, cr_loss=0.4049, over 6760109.55 frames. ], batch size: 100, lr: 4.60e-03, grad_scale: 16.0 2024-09-18 16:37:36,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=463913.3333333333, ans=0.0 2024-09-18 16:37:41,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2024-09-18 16:37:48,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=463960.0, ans=0.125 2024-09-18 16:38:01,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=463960.0, ans=0.0 2024-09-18 16:38:09,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=464006.6666666667, ans=0.125 2024-09-18 16:38:26,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2024-09-18 16:38:52,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=464100.0, ans=0.125 2024-09-18 16:38:53,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.80 vs. limit=22.5 2024-09-18 16:38:57,355 INFO [train.py:1198] (1/2) Epoch 26, batch 2550, loss[loss=0.1926, simple_loss=0.241, pruned_loss=0.05356, ctc_loss=0.1138, cr_loss=0.3595, over 34137.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2702, pruned_loss=0.0616, ctc_loss=0.129, cr_loss=0.4041, over 6764676.65 frames. 
], batch size: 78, lr: 4.59e-03, grad_scale: 16.0 2024-09-18 16:38:57,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=464146.6666666667, ans=0.2 2024-09-18 16:39:02,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=464146.6666666667, ans=0.125 2024-09-18 16:39:12,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=464193.3333333333, ans=0.025 2024-09-18 16:39:14,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=464193.3333333333, ans=0.1 2024-09-18 16:39:17,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=464193.3333333333, ans=0.2 2024-09-18 16:39:24,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.44 vs. limit=12.0 2024-09-18 16:39:25,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=464193.3333333333, ans=0.025 2024-09-18 16:39:25,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=464193.3333333333, ans=0.0 2024-09-18 16:39:35,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=464240.0, ans=0.0 2024-09-18 16:39:56,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=464286.6666666667, ans=0.0 2024-09-18 16:39:58,112 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.447e+02 3.042e+02 3.793e+02 9.578e+02, threshold=6.085e+02, percent-clipped=5.0 2024-09-18 16:40:00,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=464286.6666666667, ans=0.0 2024-09-18 16:40:00,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=12.0 2024-09-18 16:40:05,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.69 vs. limit=15.0 2024-09-18 16:40:19,558 INFO [train.py:1198] (1/2) Epoch 26, batch 2600, loss[loss=0.2099, simple_loss=0.2589, pruned_loss=0.05973, ctc_loss=0.1254, cr_loss=0.4107, over 34347.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2706, pruned_loss=0.06179, ctc_loss=0.1294, cr_loss=0.4055, over 6759991.10 frames. 
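A large share of the ScheduledFloat entries are *_skip_rate and bypass.* parameters attached to individual sublayers, i.e. scheduled stochastic-depth-style regularisation in which a sublayer's contribution can be skipped during training. A generic sketch of the pattern (the real Zipformer bypass module is more involved; this only shows the idea):

    import torch
    import torch.nn as nn

    class StochasticBypass(nn.Module):
        """Wrap a sublayer; with probability `skip_rate` (training only)
        pass the input straight through. Generic stand-in for the
        *_skip_rate / bypass.* knobs in the log."""
        def __init__(self, module: nn.Module, skip_rate: float = 0.1):
            super().__init__()
            self.module = module
            self.skip_rate = skip_rate  # a ScheduledFloat in practice

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training and torch.rand(()) < self.skip_rate:
                return x                 # sublayer skipped this step
            return x + self.module(x)    # residual form

    layer = StochasticBypass(nn.Linear(256, 256), skip_rate=0.0)  # ans=0.0 late on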
], batch size: 91, lr: 4.59e-03, grad_scale: 16.0 2024-09-18 16:40:21,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=464380.0, ans=0.125 2024-09-18 16:41:05,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=464473.3333333333, ans=15.0 2024-09-18 16:41:19,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=464520.0, ans=0.125 2024-09-18 16:41:23,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.24 vs. limit=6.0 2024-09-18 16:41:44,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=464613.3333333333, ans=0.125 2024-09-18 16:41:45,525 INFO [train.py:1198] (1/2) Epoch 26, batch 2650, loss[loss=0.2263, simple_loss=0.2773, pruned_loss=0.06522, ctc_loss=0.1385, cr_loss=0.427, over 34267.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2707, pruned_loss=0.06172, ctc_loss=0.1293, cr_loss=0.4053, over 6767800.70 frames. ], batch size: 117, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 16:42:47,907 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.477e+02 3.049e+02 3.689e+02 6.094e+02, threshold=6.099e+02, percent-clipped=1.0 2024-09-18 16:42:48,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=464753.3333333333, ans=0.1 2024-09-18 16:43:08,100 INFO [train.py:1198] (1/2) Epoch 26, batch 2700, loss[loss=0.2252, simple_loss=0.2789, pruned_loss=0.06419, ctc_loss=0.1327, cr_loss=0.4141, over 34599.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.2712, pruned_loss=0.06199, ctc_loss=0.1297, cr_loss=0.4053, over 6763244.26 frames. 
], batch size: 102, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 16:43:10,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=464846.6666666667, ans=0.0 2024-09-18 16:43:15,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=464846.6666666667, ans=0.2 2024-09-18 16:43:41,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=464940.0, ans=0.125 2024-09-18 16:43:48,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=464940.0, ans=0.05 2024-09-18 16:43:56,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=464986.6666666667, ans=0.015 2024-09-18 16:43:59,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=464986.6666666667, ans=0.125 2024-09-18 16:44:16,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=465033.3333333333, ans=0.05 2024-09-18 16:44:16,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=465033.3333333333, ans=0.125 2024-09-18 16:44:18,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=465033.3333333333, ans=0.0 2024-09-18 16:44:24,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=465033.3333333333, ans=0.2 2024-09-18 16:44:28,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=465033.3333333333, ans=0.04949747468305833 2024-09-18 16:44:32,800 INFO [train.py:1198] (1/2) Epoch 26, batch 2750, loss[loss=0.2078, simple_loss=0.2551, pruned_loss=0.06027, ctc_loss=0.124, cr_loss=0.3796, over 34631.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2698, pruned_loss=0.06133, ctc_loss=0.1285, cr_loss=0.4028, over 6761303.36 frames. 
], batch size: 88, lr: 4.59e-03, grad_scale: 8.0 2024-09-18 16:44:59,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=465126.6666666667, ans=0.125 2024-09-18 16:45:20,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=465173.3333333333, ans=0.125 2024-09-18 16:45:34,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=465220.0, ans=0.025 2024-09-18 16:45:37,196 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.372e+02 2.677e+02 3.181e+02 4.694e+02, threshold=5.354e+02, percent-clipped=0.0 2024-09-18 16:45:49,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=465266.6666666667, ans=0.05 2024-09-18 16:45:49,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=465266.6666666667, ans=0.125 2024-09-18 16:45:56,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=465313.3333333333, ans=0.125 2024-09-18 16:45:57,339 INFO [train.py:1198] (1/2) Epoch 26, batch 2800, loss[loss=0.2345, simple_loss=0.2843, pruned_loss=0.06959, ctc_loss=0.1448, cr_loss=0.4149, over 24064.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2704, pruned_loss=0.06164, ctc_loss=0.1291, cr_loss=0.4043, over 6740560.06 frames. ], batch size: 244, lr: 4.59e-03, grad_scale: 16.0 2024-09-18 16:46:00,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=465313.3333333333, ans=0.0 2024-09-18 16:46:13,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=465360.0, ans=0.125 2024-09-18 16:46:19,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2024-09-18 16:46:50,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465453.3333333333, ans=0.1 2024-09-18 16:47:07,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=465500.0, ans=0.1 2024-09-18 16:47:19,749 INFO [train.py:1198] (1/2) Epoch 26, batch 2850, loss[loss=0.2188, simple_loss=0.2686, pruned_loss=0.0636, ctc_loss=0.1299, cr_loss=0.3939, over 34502.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2709, pruned_loss=0.06194, ctc_loss=0.1297, cr_loss=0.4056, over 6725503.24 frames. ], batch size: 90, lr: 4.59e-03, grad_scale: 16.0 2024-09-18 16:47:39,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=465593.3333333333, ans=0.1 2024-09-18 16:47:43,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=465593.3333333333, ans=0.125 2024-09-18 16:48:02,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.36 vs. 
limit=22.5 2024-09-18 16:48:14,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.24 vs. limit=10.0 2024-09-18 16:48:24,751 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.440e+02 2.778e+02 3.484e+02 6.314e+02, threshold=5.556e+02, percent-clipped=4.0 2024-09-18 16:48:36,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=22.5 2024-09-18 16:48:39,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0 2024-09-18 16:48:45,155 INFO [train.py:1198] (1/2) Epoch 26, batch 2900, loss[loss=0.2194, simple_loss=0.2676, pruned_loss=0.06458, ctc_loss=0.1282, cr_loss=0.4082, over 34545.00 frames. ], tot_loss[loss=0.2194, simple_loss=0.2719, pruned_loss=0.06227, ctc_loss=0.1303, cr_loss=0.4072, over 6755860.97 frames. ], batch size: 94, lr: 4.59e-03, grad_scale: 16.0 2024-09-18 16:48:47,882 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=22.5 2024-09-18 16:49:32,664 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-09-18 16:49:38,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=465920.0, ans=0.0 2024-09-18 16:49:43,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=465920.0, ans=0.125 2024-09-18 16:49:51,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.38 vs. limit=22.5 2024-09-18 16:49:55,374 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 16:49:58,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=465966.6666666667, ans=0.125 2024-09-18 16:50:04,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=465966.6666666667, ans=0.2 2024-09-18 16:50:09,645 INFO [train.py:1198] (1/2) Epoch 26, batch 2950, loss[loss=0.2149, simple_loss=0.2651, pruned_loss=0.06115, ctc_loss=0.1297, cr_loss=0.4127, over 34644.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2708, pruned_loss=0.06185, ctc_loss=0.1295, cr_loss=0.4051, over 6751202.19 frames. ], batch size: 88, lr: 4.59e-03, grad_scale: 16.0 2024-09-18 16:50:16,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=466013.3333333333, ans=0.1 2024-09-18 16:50:23,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2024-09-18 16:50:34,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=466060.0, ans=0.025 2024-09-18 16:50:54,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.69 vs. 
limit=12.0 2024-09-18 16:50:57,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=466153.3333333333, ans=0.025 2024-09-18 16:51:04,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=466153.3333333333, ans=0.125 2024-09-18 16:51:11,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=466153.3333333333, ans=0.125 2024-09-18 16:51:12,383 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.004e+02 2.360e+02 2.859e+02 3.752e+02 7.080e+02, threshold=5.718e+02, percent-clipped=6.0 2024-09-18 16:51:30,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.47 vs. limit=15.0 2024-09-18 16:51:32,497 INFO [train.py:1198] (1/2) Epoch 26, batch 3000, loss[loss=0.2134, simple_loss=0.2697, pruned_loss=0.05832, ctc_loss=0.1241, cr_loss=0.3926, over 34542.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2704, pruned_loss=0.06171, ctc_loss=0.1293, cr_loss=0.4049, over 6750691.19 frames. ], batch size: 94, lr: 4.58e-03, grad_scale: 16.0 2024-09-18 16:51:32,497 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 16:51:50,355 INFO [train.py:1230] (1/2) Epoch 26, validation: loss=0.1493, simple_loss=0.2446, pruned_loss=0.023, ctc_loss=0.04055, cr_loss=1.998e-14, over 944034.00 frames. 2024-09-18 16:51:50,355 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 16:52:10,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=466293.3333333333, ans=0.125 2024-09-18 16:53:06,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=466433.3333333333, ans=0.95 2024-09-18 16:53:09,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=466433.3333333333, ans=0.125 2024-09-18 16:53:10,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=466433.3333333333, ans=0.125 2024-09-18 16:53:13,952 INFO [train.py:1198] (1/2) Epoch 26, batch 3050, loss[loss=0.2042, simple_loss=0.2586, pruned_loss=0.05563, ctc_loss=0.1169, cr_loss=0.3807, over 34596.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.271, pruned_loss=0.06193, ctc_loss=0.1297, cr_loss=0.4053, over 6742535.19 frames. ], batch size: 89, lr: 4.58e-03, grad_scale: 16.0 2024-09-18 16:53:15,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=466480.0, ans=0.0 2024-09-18 16:53:41,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=466526.6666666667, ans=0.1 2024-09-18 16:53:57,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=466573.3333333333, ans=0.0 2024-09-18 16:54:08,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.82 vs. 
limit=15.0 2024-09-18 16:54:09,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.87 vs. limit=15.0 2024-09-18 16:54:10,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=466620.0, ans=0.125 2024-09-18 16:54:15,123 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.475e+02 2.831e+02 3.412e+02 6.265e+02, threshold=5.661e+02, percent-clipped=1.0 2024-09-18 16:54:29,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.01 vs. limit=15.0 2024-09-18 16:54:41,260 INFO [train.py:1198] (1/2) Epoch 26, batch 3100, loss[loss=0.2262, simple_loss=0.278, pruned_loss=0.06461, ctc_loss=0.1383, cr_loss=0.4382, over 34196.00 frames. ], tot_loss[loss=0.2179, simple_loss=0.2704, pruned_loss=0.0617, ctc_loss=0.1293, cr_loss=0.4046, over 6742008.04 frames. ], batch size: 117, lr: 4.58e-03, grad_scale: 16.0 2024-09-18 16:54:48,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=15.0 2024-09-18 16:55:50,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=22.5 2024-09-18 16:56:02,640 INFO [train.py:1198] (1/2) Epoch 26, batch 3150, loss[loss=0.2344, simple_loss=0.2883, pruned_loss=0.06759, ctc_loss=0.1408, cr_loss=0.4291, over 33854.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2704, pruned_loss=0.06158, ctc_loss=0.129, cr_loss=0.4039, over 6748298.71 frames. ], batch size: 122, lr: 4.58e-03, grad_scale: 16.0 2024-09-18 16:56:15,081 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2024-09-18 16:56:16,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=466946.6666666667, ans=0.125 2024-09-18 16:56:19,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=466993.3333333333, ans=0.125 2024-09-18 16:56:19,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=466993.3333333333, ans=0.125 2024-09-18 16:56:24,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.62 vs. 
limit=15.0 2024-09-18 16:56:32,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=466993.3333333333, ans=0.125 2024-09-18 16:56:40,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=467040.0, ans=0.0 2024-09-18 16:56:51,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=467086.6666666667, ans=0.125 2024-09-18 16:56:58,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=467086.6666666667, ans=0.125 2024-09-18 16:57:04,188 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.560e+02 3.162e+02 4.057e+02 7.664e+02, threshold=6.323e+02, percent-clipped=4.0 2024-09-18 16:57:23,730 INFO [train.py:1198] (1/2) Epoch 26, batch 3200, loss[loss=0.2116, simple_loss=0.2689, pruned_loss=0.05717, ctc_loss=0.1217, cr_loss=0.39, over 34559.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2699, pruned_loss=0.0613, ctc_loss=0.1286, cr_loss=0.4037, over 6761539.36 frames. ], batch size: 94, lr: 4.58e-03, grad_scale: 32.0 2024-09-18 16:57:51,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=467226.6666666667, ans=0.1 2024-09-18 16:58:04,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=467273.3333333333, ans=0.0 2024-09-18 16:58:07,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=467273.3333333333, ans=0.2 2024-09-18 16:58:40,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0 2024-09-18 16:58:41,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=467366.6666666667, ans=0.0 2024-09-18 16:58:41,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=467366.6666666667, ans=0.025 2024-09-18 16:58:46,316 INFO [train.py:1198] (1/2) Epoch 26, batch 3250, loss[loss=0.2126, simple_loss=0.2692, pruned_loss=0.05831, ctc_loss=0.1199, cr_loss=0.3837, over 34670.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2702, pruned_loss=0.06137, ctc_loss=0.1287, cr_loss=0.4042, over 6770924.54 frames. 
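The batch-3000 validation pass a little above reports loss=0.1493 with cr_loss=1.998e-14, i.e. numerically zero: with no second augmented view taken at eval time, the consistency-regularisation term has nothing to compare. A typical shape for such a validation pass (loss_fn returning a frame count is an assumed interface, not the recipe's actual signature):

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, valid_loader, loss_fn):
        """Frame-weighted validation loss; mirrors the 'Computing
        validation loss' / 'validation: ...' record pair (sketch)."""
        model.eval()
        total, frames = 0.0, 0.0
        for batch in valid_loader:
            loss, num_frames = loss_fn(model, batch)
            total += loss.item() * num_frames
            frames += num_frames
        model.train()
        return total / frames   # e.g. 0.1493 over 944034 frames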
], batch size: 98, lr: 4.58e-03, grad_scale: 32.0 2024-09-18 16:58:48,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=467413.3333333333, ans=0.2 2024-09-18 16:58:49,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=467413.3333333333, ans=0.2 2024-09-18 16:58:52,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=467413.3333333333, ans=0.125 2024-09-18 16:59:21,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=467506.6666666667, ans=0.05 2024-09-18 16:59:48,992 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.019e+02 2.457e+02 2.855e+02 3.521e+02 7.516e+02, threshold=5.710e+02, percent-clipped=1.0 2024-09-18 17:00:07,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=467646.6666666667, ans=0.125 2024-09-18 17:00:09,042 INFO [train.py:1198] (1/2) Epoch 26, batch 3300, loss[loss=0.2168, simple_loss=0.2704, pruned_loss=0.06083, ctc_loss=0.1294, cr_loss=0.3918, over 33198.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.269, pruned_loss=0.06114, ctc_loss=0.1283, cr_loss=0.4033, over 6769005.51 frames. ], batch size: 130, lr: 4.58e-03, grad_scale: 32.0 2024-09-18 17:00:20,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=467646.6666666667, ans=0.2 2024-09-18 17:00:35,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=467693.3333333333, ans=0.2 2024-09-18 17:00:38,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=467693.3333333333, ans=0.05 2024-09-18 17:00:43,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=467740.0, ans=0.125 2024-09-18 17:01:13,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=12.0 2024-09-18 17:01:30,006 INFO [train.py:1198] (1/2) Epoch 26, batch 3350, loss[loss=0.237, simple_loss=0.2856, pruned_loss=0.07133, ctc_loss=0.1427, cr_loss=0.431, over 33857.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2702, pruned_loss=0.06166, ctc_loss=0.1291, cr_loss=0.4054, over 6743078.39 frames. 
], batch size: 122, lr: 4.58e-03, grad_scale: 32.0 2024-09-18 17:01:30,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=467880.0, ans=0.125 2024-09-18 17:01:54,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=467926.6666666667, ans=0.1 2024-09-18 17:01:56,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=467926.6666666667, ans=0.125 2024-09-18 17:02:22,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=468020.0, ans=0.2 2024-09-18 17:02:27,026 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:02:31,410 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.448e+02 2.684e+02 3.186e+02 4.760e+02, threshold=5.369e+02, percent-clipped=0.0 2024-09-18 17:02:38,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=468066.6666666667, ans=15.0 2024-09-18 17:02:41,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=468066.6666666667, ans=0.125 2024-09-18 17:02:47,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=468066.6666666667, ans=0.125 2024-09-18 17:02:50,663 INFO [train.py:1198] (1/2) Epoch 26, batch 3400, loss[loss=0.1874, simple_loss=0.2356, pruned_loss=0.0519, ctc_loss=0.1078, cr_loss=0.3462, over 34122.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2704, pruned_loss=0.06193, ctc_loss=0.1295, cr_loss=0.4059, over 6733223.09 frames. ], batch size: 78, lr: 4.58e-03, grad_scale: 32.0 2024-09-18 17:03:08,114 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=22.5
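Each ScheduledFloat line above reports a named scalar hyperparameter (balancer probabilities, skip rates, bypass scale minimums, and so on) whose current value ("ans") is a function of batch_count. The essential idea behind such schedules is piecewise-linear interpolation over (batch_count, value) breakpoints; the sketch below is a simplification in that spirit, not the full ScheduledFloat class from icefall's scaling.py (which adds defaults, arithmetic operators, and the logging seen here), and the example breakpoints are made up:

    class PiecewiseLinearSchedule:
        def __init__(self, *points, name: str = "unnamed"):
            # points: (batch_count, value) pairs defining the schedule.
            self.points = sorted(points)
            self.name = name

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:  # interpolate inside this segment
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)

    # e.g. a skip rate decaying from 0.3 to 0.1 over the first 20k batches:
    skip_rate = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1),
                                        name="conv_skip_rate")
    assert abs(skip_rate.value(10000.0) - 0.2) < 1e-9

At batch_count values around 468000, as in these lines, most schedules would long since have reached their final constant value, which would explain why the same "ans" repeats.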
2024-09-18 17:03:16,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.70 vs. limit=12.0 2024-09-18 17:03:19,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=468160.0, ans=0.125 2024-09-18 17:03:35,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=468206.6666666667, ans=0.125 2024-09-18 17:03:45,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=468253.3333333333, ans=0.1 2024-09-18 17:03:59,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=468300.0, ans=0.1 2024-09-18 17:04:05,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=468300.0, ans=0.125 2024-09-18 17:04:10,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=468300.0, ans=0.0 2024-09-18 17:04:13,533 INFO [train.py:1198] (1/2) Epoch 26, batch 3450, loss[loss=0.2162, simple_loss=0.2761, pruned_loss=0.0577, ctc_loss=0.124, cr_loss=0.402, over 33151.00 frames. ], tot_loss[loss=0.2187, simple_loss=0.271, pruned_loss=0.06206, ctc_loss=0.1298, cr_loss=0.4065, over 6745031.03 frames. ], batch size: 130, lr: 4.57e-03, grad_scale: 32.0 2024-09-18 17:04:26,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=468346.6666666667, ans=0.1 2024-09-18 17:04:46,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=468440.0, ans=0.025 2024-09-18 17:04:57,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=468440.0, ans=0.125 2024-09-18 17:05:05,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=468486.6666666667, ans=0.2 2024-09-18 17:05:14,635 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.457e+02 2.849e+02 3.624e+02 6.796e+02, threshold=5.699e+02, percent-clipped=2.0 2024-09-18 17:05:14,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=468486.6666666667, ans=0.2 2024-09-18 17:05:23,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=468533.3333333333, ans=0.125 2024-09-18 17:05:24,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=468533.3333333333, ans=0.125 2024-09-18 17:05:26,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=468533.3333333333, ans=0.025 2024-09-18 17:05:34,069 INFO [train.py:1198] (1/2) Epoch 26, batch 3500, loss[loss=0.1947, simple_loss=0.2494, pruned_loss=0.05161, ctc_loss=0.1113, cr_loss=0.3603, over 34500.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2707, pruned_loss=0.06206, ctc_loss=0.1297, cr_loss=0.4063, over 6746897.66 frames.
], batch size: 85, lr: 4.57e-03, grad_scale: 32.0 2024-09-18 17:06:24,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=468720.0, ans=0.125 2024-09-18 17:06:30,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=468720.0, ans=0.2 2024-09-18 17:06:34,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=468720.0, ans=0.125 2024-09-18 17:06:40,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.21 vs. limit=15.0 2024-09-18 17:06:54,342 INFO [train.py:1198] (1/2) Epoch 26, batch 3550, loss[loss=0.2233, simple_loss=0.2804, pruned_loss=0.06171, ctc_loss=0.1317, cr_loss=0.4124, over 34371.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2708, pruned_loss=0.06197, ctc_loss=0.1296, cr_loss=0.4058, over 6757198.44 frames. ], batch size: 103, lr: 4.57e-03, grad_scale: 32.0 2024-09-18 17:06:57,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=468813.3333333333, ans=0.2 2024-09-18 17:07:04,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=468813.3333333333, ans=0.2 2024-09-18 17:07:12,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=468860.0, ans=0.125 2024-09-18 17:07:15,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=468860.0, ans=0.125 2024-09-18 17:07:40,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=468906.6666666667, ans=0.125 2024-09-18 17:07:49,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=468953.3333333333, ans=0.0 2024-09-18 17:07:51,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=468953.3333333333, ans=0.125 2024-09-18 17:07:57,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.105e+02 2.578e+02 2.896e+02 3.589e+02 5.255e+02, threshold=5.792e+02, percent-clipped=0.0 2024-09-18 17:08:10,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=469000.0, ans=0.1 2024-09-18 17:08:16,606 INFO [train.py:1198] (1/2) Epoch 26, batch 3600, loss[loss=0.2122, simple_loss=0.2651, pruned_loss=0.05927, ctc_loss=0.1234, cr_loss=0.4049, over 34467.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2706, pruned_loss=0.06191, ctc_loss=0.1295, cr_loss=0.4058, over 6766025.30 frames. 
], batch size: 90, lr: 4.57e-03, grad_scale: 32.0 2024-09-18 17:08:44,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=469093.3333333333, ans=0.125 2024-09-18 17:09:06,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=469186.6666666667, ans=0.0 2024-09-18 17:09:09,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=469186.6666666667, ans=15.0 2024-09-18 17:09:36,194 INFO [train.py:1198] (1/2) Epoch 26, batch 3650, loss[loss=0.245, simple_loss=0.2964, pruned_loss=0.07305, ctc_loss=0.149, cr_loss=0.4434, over 34466.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.2699, pruned_loss=0.06165, ctc_loss=0.129, cr_loss=0.4051, over 6769283.04 frames. ], batch size: 110, lr: 4.57e-03, grad_scale: 32.0 2024-09-18 17:09:47,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=469280.0, ans=0.2 2024-09-18 17:10:00,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=469326.6666666667, ans=0.04949747468305833 2024-09-18 17:10:36,982 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.433e+02 3.006e+02 4.218e+02 9.240e+02, threshold=6.012e+02, percent-clipped=7.0 2024-09-18 17:10:56,748 INFO [train.py:1198] (1/2) Epoch 26, batch 3700, loss[loss=0.2166, simple_loss=0.2765, pruned_loss=0.05763, ctc_loss=0.1268, cr_loss=0.4028, over 34611.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2699, pruned_loss=0.06146, ctc_loss=0.1288, cr_loss=0.4044, over 6783389.49 frames. ], batch size: 102, lr: 4.57e-03, grad_scale: 32.0 2024-09-18 17:11:23,125 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.520e-03 2024-09-18 17:11:37,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=469606.6666666667, ans=0.2 2024-09-18 17:11:51,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=469653.3333333333, ans=0.125 2024-09-18 17:11:53,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2024-09-18 17:12:04,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469700.0, ans=0.1 2024-09-18 17:12:15,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=469700.0, ans=15.0 2024-09-18 17:12:19,023 INFO [train.py:1198] (1/2) Epoch 26, batch 3750, loss[loss=0.2303, simple_loss=0.2835, pruned_loss=0.06651, ctc_loss=0.1378, cr_loss=0.4116, over 34371.00 frames. ], tot_loss[loss=0.2206, simple_loss=0.2731, pruned_loss=0.06278, ctc_loss=0.1313, cr_loss=0.4098, over 6784528.20 frames. ], batch size: 113, lr: 4.57e-03, grad_scale: 32.0
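The bracketed loss reports combine several objectives: the pruned-transducer simple and pruned losses, a CTC loss, and a consistency-regularization (CR) loss. The printed total is consistent with a fixed weighted sum of the components. Taking the weights configured for this run (0.5 for the simple loss, 0.1 for CTC, 0.02 for CR) and assuming weight 1.0 for the pruned loss (inferred from the numbers, not quoted from the code), the batch 3700 line above checks out:

    def combined_loss(simple, pruned, ctc, cr,
                      simple_scale=0.5, pruned_scale=1.0,
                      ctc_scale=0.1, cr_scale=0.02):
        # Weighted sum of the per-batch loss components.
        return (simple_scale * simple + pruned_scale * pruned
                + ctc_scale * ctc + cr_scale * cr)

    # Epoch 26, batch 3700 tot_loss above:
    print(round(combined_loss(0.2699, 0.06146, 0.1288, 0.4044), 4))  # 0.2174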
2024-09-18 17:12:27,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2024-09-18 17:12:41,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=469793.3333333333, ans=0.1 2024-09-18 17:12:56,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=469840.0, ans=0.125 2024-09-18 17:13:19,972 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.319e+02 2.471e+02 2.831e+02 4.781e+02, threshold=4.943e+02, percent-clipped=0.0 2024-09-18 17:13:23,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2024-09-18 17:13:33,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=469933.3333333333, ans=0.125 2024-09-18 17:13:33,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=469933.3333333333, ans=0.125 2024-09-18 17:13:40,520 INFO [train.py:1198] (1/2) Epoch 26, batch 3800, loss[loss=0.2435, simple_loss=0.2845, pruned_loss=0.07641, ctc_loss=0.1573, cr_loss=0.454, over 30143.00 frames. ], tot_loss[loss=0.2239, simple_loss=0.2757, pruned_loss=0.06427, ctc_loss=0.1342, cr_loss=0.4153, over 6675822.80 frames. ], batch size: 175, lr: 4.57e-03, grad_scale: 32.0 2024-09-18 17:13:50,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=469980.0, ans=0.125 2024-09-18 17:14:03,212 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.75 vs. limit=22.5 2024-09-18 17:14:12,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-09-18 17:14:25,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=470073.3333333333, ans=0.2 2024-09-18 17:14:34,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=470120.0, ans=0.0 2024-09-18 17:14:42,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=11.17 vs. limit=12.0 2024-09-18 17:15:03,630 INFO [train.py:1198] (1/2) Epoch 26, batch 3850, loss[loss=0.2428, simple_loss=0.2902, pruned_loss=0.07407, ctc_loss=0.1527, cr_loss=0.4174, over 23650.00 frames. ], tot_loss[loss=0.2276, simple_loss=0.2781, pruned_loss=0.06629, ctc_loss=0.1382, cr_loss=0.4189, over 6253377.61 frames. ], batch size: 244, lr: 4.57e-03, grad_scale: 4.0 2024-09-18 17:15:18,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=470260.0, ans=0.05 2024-09-18 17:15:30,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=470260.0, ans=0.125
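The grad_scale field is the dynamic loss scale used for float16 autocast training. It falls sharply when overflows are detected (from 32.0 down to 4.0 at batch 3850 above) and is grown back after a run of overflow-free steps (8.0 by the start of epoch 27 below). torch.cuda.amp.GradScaler implements this behavior; the sketch below shows the usual update rule with illustrative constants, not the exact values used by this run:

    class DynamicLossScale:
        """Grow-on-success, back-off-on-overflow loss scaling for fp16."""

        def __init__(self, scale: float = 16.0, growth_factor: float = 2.0,
                     backoff_factor: float = 0.5, growth_interval: int = 2000):
            self.scale = scale
            self.growth_factor = growth_factor
            self.backoff_factor = backoff_factor
            self.growth_interval = growth_interval
            self.good_steps = 0

        def update(self, found_inf: bool) -> None:
            if found_inf:
                # Overflow: skip the optimizer step and halve the scale.
                self.scale *= self.backoff_factor
                self.good_steps = 0
            else:
                self.good_steps += 1
                if self.good_steps == self.growth_interval:
                    self.scale *= self.growth_factor  # double after a clean run
                    self.good_steps = 0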
2024-09-18 17:16:36,799 INFO [train.py:1198] (1/2) Epoch 27, batch 0, loss[loss=0.1873, simple_loss=0.2403, pruned_loss=0.04879, ctc_loss=0.1091, cr_loss=0.3725, over 34498.00 frames. ], tot_loss[loss=0.1873, simple_loss=0.2403, pruned_loss=0.04879, ctc_loss=0.1091, cr_loss=0.3725, over 34498.00 frames. ], batch size: 85, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 17:16:36,800 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 17:16:45,342 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.5068, 4.7655, 5.2696, 4.9536], device='cuda:1') 2024-09-18 17:16:54,495 INFO [train.py:1230] (1/2) Epoch 27, validation: loss=0.1493, simple_loss=0.2455, pruned_loss=0.02259, ctc_loss=0.04012, cr_loss=1.993e-14, over 944034.00 frames. 2024-09-18 17:16:54,495 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 17:16:56,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=470339.3333333333, ans=0.0 2024-09-18 17:17:05,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.02 vs. limit=22.5 2024-09-18 17:17:18,384 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.647e+02 2.778e+02 3.145e+02 2.224e+03, threshold=5.555e+02, percent-clipped=7.0 2024-09-18 17:17:22,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=470386.0, ans=0.015 2024-09-18 17:17:22,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=470386.0, ans=0.2 2024-09-18 17:17:25,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=470386.0, ans=0.1 2024-09-18 17:17:28,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=470432.6666666667, ans=0.125 2024-09-18 17:17:50,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=470479.3333333333, ans=0.125 2024-09-18 17:17:51,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=470479.3333333333, ans=0.0 2024-09-18 17:18:03,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.25 vs. limit=15.0
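The validation block above ("Computing validation loss" through "Epoch 27, validation: ...") reports the same components averaged over the full dev set of 944034 frames. Note that cr_loss is numerically zero there (1.993e-14), consistent with the consistency-regularization term only being meaningful when training-time masking produces two differing views of each utterance. A minimal sketch of such a frame-weighted validation pass, with hypothetical helper names (the real train.py tracks every loss component separately and logs attention entropies as above):

    import torch

    @torch.no_grad()
    def compute_validation_loss(model, dev_loader, loss_fn, device):
        model.eval()  # disable dropout and other train-only behavior
        tot_loss, tot_frames = 0.0, 0.0
        for batch in dev_loader:
            feats = batch["inputs"].to(device)  # (N, T, C) fbank features
            supervisions = batch["supervisions"]
            # loss_fn is assumed to return (summed loss, number of frames).
            loss, num_frames = loss_fn(model, feats, supervisions)
            tot_loss += loss.item()
            tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames  # per-frame loss, comparable across sets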
2024-09-18 17:18:19,403 INFO [train.py:1198] (1/2) Epoch 27, batch 50, loss[loss=0.1888, simple_loss=0.2376, pruned_loss=0.05121, ctc_loss=0.1137, cr_loss=0.3698, over 34536.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2707, pruned_loss=0.06191, ctc_loss=0.1303, cr_loss=0.4076, over 1482138.16 frames. ], batch size: 82, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 17:18:19,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=470572.6666666667, ans=0.0 2024-09-18 17:18:56,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=470666.0, ans=0.125 2024-09-18 17:19:09,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=470712.6666666667, ans=0.125 2024-09-18 17:19:16,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=470712.6666666667, ans=0.125 2024-09-18 17:19:26,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=470759.3333333333, ans=0.2 2024-09-18 17:19:42,295 INFO [train.py:1198] (1/2) Epoch 27, batch 100, loss[loss=0.2149, simple_loss=0.2641, pruned_loss=0.062, ctc_loss=0.1284, cr_loss=0.4018, over 34602.00 frames. ], tot_loss[loss=0.2218, simple_loss=0.274, pruned_loss=0.06328, ctc_loss=0.1323, cr_loss=0.4127, over 2629417.16 frames. ], batch size: 89, lr: 4.48e-03, grad_scale: 8.0 2024-09-18 17:19:53,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=470806.0, ans=0.1 2024-09-18 17:20:07,666 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.421e+02 2.938e+02 3.779e+02 6.087e+02, threshold=5.876e+02, percent-clipped=4.0 2024-09-18 17:20:11,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=470852.6666666667, ans=0.5 2024-09-18 17:20:16,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=470899.3333333333, ans=0.125 2024-09-18 17:20:31,091 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.80 vs. limit=15.0 2024-09-18 17:20:45,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=470946.0, ans=0.125 2024-09-18 17:20:50,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=470992.6666666667, ans=0.0 2024-09-18 17:21:06,082 INFO [train.py:1198] (1/2) Epoch 27, batch 150, loss[loss=0.205, simple_loss=0.258, pruned_loss=0.05639, ctc_loss=0.1193, cr_loss=0.3824, over 34509.00 frames. ], tot_loss[loss=0.2195, simple_loss=0.2719, pruned_loss=0.06229, ctc_loss=0.1305, cr_loss=0.4081, over 3558419.37 frames. ], batch size: 82, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 17:21:08,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.54 vs. limit=15.0 2024-09-18 17:21:10,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.11 vs.
limit=15.0 2024-09-18 17:21:13,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=471039.3333333333, ans=0.09899494936611666 2024-09-18 17:21:28,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=471086.0, ans=0.05 2024-09-18 17:21:29,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=471086.0, ans=0.0 2024-09-18 17:21:36,809 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.58 vs. limit=15.0 2024-09-18 17:21:49,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=471132.6666666667, ans=0.0 2024-09-18 17:21:52,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=471132.6666666667, ans=0.0 2024-09-18 17:22:04,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=471179.3333333333, ans=0.125 2024-09-18 17:22:12,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=471226.0, ans=0.125 2024-09-18 17:22:30,336 INFO [train.py:1198] (1/2) Epoch 27, batch 200, loss[loss=0.231, simple_loss=0.284, pruned_loss=0.06645, ctc_loss=0.1407, cr_loss=0.4236, over 31953.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2707, pruned_loss=0.06182, ctc_loss=0.1298, cr_loss=0.4066, over 4273444.04 frames. ], batch size: 146, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 17:22:33,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=471272.6666666667, ans=0.125 2024-09-18 17:22:53,670 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.572e+02 2.843e+02 3.746e+02 6.842e+02, threshold=5.686e+02, percent-clipped=1.0 2024-09-18 17:22:54,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=471319.3333333333, ans=0.0 2024-09-18 17:22:55,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=471319.3333333333, ans=0.0 2024-09-18 17:23:08,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=471366.0, ans=0.5 2024-09-18 17:23:23,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=471412.6666666667, ans=0.125 2024-09-18 17:23:28,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471412.6666666667, ans=0.1 2024-09-18 17:23:28,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471412.6666666667, ans=0.1 2024-09-18 17:23:41,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=471459.3333333333, ans=0.125 2024-09-18 17:23:55,635 INFO [train.py:1198] (1/2) Epoch 27, batch 250, loss[loss=0.2306, simple_loss=0.2826, pruned_loss=0.06622, ctc_loss=0.1402, cr_loss=0.4495, over 34244.00 frames. 
], tot_loss[loss=0.2179, simple_loss=0.2705, pruned_loss=0.06159, ctc_loss=0.1293, cr_loss=0.4061, over 4834002.65 frames. ], batch size: 117, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 17:23:56,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=471506.0, ans=0.025 2024-09-18 17:24:02,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=471506.0, ans=0.2 2024-09-18 17:24:10,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=471552.6666666667, ans=0.0 2024-09-18 17:24:17,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=471552.6666666667, ans=0.0 2024-09-18 17:24:28,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=471599.3333333333, ans=0.2 2024-09-18 17:24:40,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=471599.3333333333, ans=0.0 2024-09-18 17:24:42,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=12.0 2024-09-18 17:24:51,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=471646.0, ans=0.2 2024-09-18 17:24:58,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=471646.0, ans=0.125 2024-09-18 17:25:00,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=471692.6666666667, ans=0.125 2024-09-18 17:25:11,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=471692.6666666667, ans=0.125 2024-09-18 17:25:16,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=471739.3333333333, ans=0.125 2024-09-18 17:25:17,632 INFO [train.py:1198] (1/2) Epoch 27, batch 300, loss[loss=0.2431, simple_loss=0.2908, pruned_loss=0.0735, ctc_loss=0.1497, cr_loss=0.4595, over 34378.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2701, pruned_loss=0.06143, ctc_loss=0.129, cr_loss=0.4061, over 5262274.25 frames. 
], batch size: 107, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 17:25:32,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=471786.0, ans=10.0 2024-09-18 17:25:34,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=471786.0, ans=0.0 2024-09-18 17:25:40,795 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.465e+02 2.802e+02 3.530e+02 7.974e+02, threshold=5.604e+02, percent-clipped=1.0 2024-09-18 17:26:12,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=471879.3333333333, ans=0.125 2024-09-18 17:26:14,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=471879.3333333333, ans=0.025 2024-09-18 17:26:41,873 INFO [train.py:1198] (1/2) Epoch 27, batch 350, loss[loss=0.1829, simple_loss=0.2395, pruned_loss=0.04651, ctc_loss=0.09927, cr_loss=0.3353, over 34274.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.27, pruned_loss=0.06125, ctc_loss=0.1287, cr_loss=0.4053, over 5598077.05 frames. ], batch size: 83, lr: 4.47e-03, grad_scale: 8.0 2024-09-18 17:26:56,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=472019.3333333333, ans=0.125 2024-09-18 17:27:01,891 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:27:02,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=22.5 2024-09-18 17:27:57,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=472159.3333333333, ans=0.2 2024-09-18 17:28:01,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=472159.3333333333, ans=0.125 2024-09-18 17:28:06,460 INFO [train.py:1198] (1/2) Epoch 27, batch 400, loss[loss=0.2093, simple_loss=0.2669, pruned_loss=0.05614, ctc_loss=0.1198, cr_loss=0.3866, over 34434.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2696, pruned_loss=0.06106, ctc_loss=0.1284, cr_loss=0.4043, over 5865717.87 frames. 
], batch size: 95, lr: 4.47e-03, grad_scale: 16.0 2024-09-18 17:28:30,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.399e+02 2.863e+02 3.463e+02 5.535e+02, threshold=5.725e+02, percent-clipped=0.0 2024-09-18 17:28:32,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=472252.6666666667, ans=0.0 2024-09-18 17:28:50,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=472299.3333333333, ans=0.2 2024-09-18 17:28:56,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=472346.0, ans=0.125 2024-09-18 17:29:01,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=472346.0, ans=0.125 2024-09-18 17:29:01,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=472346.0, ans=0.0 2024-09-18 17:29:05,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=472346.0, ans=0.05 2024-09-18 17:29:13,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=472392.6666666667, ans=0.0 2024-09-18 17:29:31,350 INFO [train.py:1198] (1/2) Epoch 27, batch 450, loss[loss=0.2321, simple_loss=0.2846, pruned_loss=0.0672, ctc_loss=0.1408, cr_loss=0.4257, over 34708.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2698, pruned_loss=0.0612, ctc_loss=0.1286, cr_loss=0.4051, over 6055613.67 frames. ], batch size: 97, lr: 4.47e-03, grad_scale: 16.0 2024-09-18 17:30:10,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=472532.6666666667, ans=0.125 2024-09-18 17:30:11,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=472532.6666666667, ans=0.2 2024-09-18 17:30:14,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=472532.6666666667, ans=0.125 2024-09-18 17:30:32,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=12.0 2024-09-18 17:30:38,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0 2024-09-18 17:30:47,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=472626.0, ans=0.125 2024-09-18 17:30:49,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=472626.0, ans=0.125 2024-09-18 17:30:52,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=472672.6666666667, ans=0.0 2024-09-18 17:30:54,039 INFO [train.py:1198] (1/2) Epoch 27, batch 500, loss[loss=0.2439, simple_loss=0.2916, pruned_loss=0.07385, ctc_loss=0.1501, cr_loss=0.4602, over 34407.00 frames. ], tot_loss[loss=0.2161, simple_loss=0.2689, pruned_loss=0.06082, ctc_loss=0.1278, cr_loss=0.4035, over 6222193.63 frames. 
], batch size: 110, lr: 4.47e-03, grad_scale: 16.0 2024-09-18 17:30:54,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.40 vs. limit=15.0 2024-09-18 17:31:06,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.53 vs. limit=12.0 2024-09-18 17:31:14,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=472719.3333333333, ans=0.2 2024-09-18 17:31:17,268 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.965e+02 2.400e+02 2.934e+02 3.663e+02 6.319e+02, threshold=5.868e+02, percent-clipped=2.0 2024-09-18 17:31:36,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=472766.0, ans=0.125 2024-09-18 17:31:39,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472766.0, ans=0.1 2024-09-18 17:31:42,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=472766.0, ans=0.0 2024-09-18 17:31:51,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=472812.6666666667, ans=0.2 2024-09-18 17:32:01,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=15.0 2024-09-18 17:32:09,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=472859.3333333333, ans=0.0 2024-09-18 17:32:16,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=472859.3333333333, ans=0.125 2024-09-18 17:32:19,141 INFO [train.py:1198] (1/2) Epoch 27, batch 550, loss[loss=0.2254, simple_loss=0.2798, pruned_loss=0.06316, ctc_loss=0.1372, cr_loss=0.4312, over 33882.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.2688, pruned_loss=0.06094, ctc_loss=0.128, cr_loss=0.4032, over 6331617.14 frames. ], batch size: 122, lr: 4.47e-03, grad_scale: 16.0 2024-09-18 17:32:21,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472906.0, ans=0.1 2024-09-18 17:32:31,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=472906.0, ans=0.1 2024-09-18 17:32:36,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=472952.6666666667, ans=0.0
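The recurring "Whitening: ... metric=X vs. limit=Y" lines track how far a module's activations are from having a white (scaled-identity) covariance; during training a corrective gradient is applied when the metric exceeds its limit. Under the assumption that the metric is the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the channel covariance (exactly 1.0 for perfectly white features; this matches the shape of the logged values but is not quoted from scaling.py), it can be computed as:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels) activations from one module."""
        metrics = []
        for g in x.chunk(num_groups, dim=1):   # channel groups, as in the log
            g = g - g.mean(dim=0)              # center each channel
            cov = (g.T @ g) / g.shape[0]       # channel covariance matrix
            eigs = torch.linalg.eigvalsh(cov)  # eigenvalues (real symmetric)
            metrics.append(((eigs ** 2).mean() / eigs.mean() ** 2).item())
        return max(metrics)

    x = torch.randn(10000, 64)                       # nearly white activations
    print(whitening_metric(x))                       # close to 1.0
    print(whitening_metric(x[:, :1].expand(-1, 64))) # fully correlated: 64.0

On that reading, values like metric=3.65 vs. limit=12.0 are unremarkable, while a value above its limit, such as metric=23.75 vs. limit=22.5 near batch 3800 earlier in the log, marks a layer being pushed back toward whiter features.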
2024-09-18 17:32:54,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.15 vs. limit=15.0 2024-09-18 17:32:57,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=472999.3333333333, ans=22.5 2024-09-18 17:33:08,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=473046.0, ans=0.1 2024-09-18 17:33:23,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=473046.0, ans=0.2 2024-09-18 17:33:28,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2024-09-18 17:33:28,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=473092.6666666667, ans=0.1 2024-09-18 17:33:38,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=473092.6666666667, ans=0.125 2024-09-18 17:33:43,472 INFO [train.py:1198] (1/2) Epoch 27, batch 600, loss[loss=0.243, simple_loss=0.2965, pruned_loss=0.07145, ctc_loss=0.145, cr_loss=0.4413, over 34251.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2693, pruned_loss=0.06101, ctc_loss=0.1281, cr_loss=0.4038, over 6431069.12 frames. ], batch size: 117, lr: 4.46e-03, grad_scale: 16.0 2024-09-18 17:33:48,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=473139.3333333333, ans=0.0 2024-09-18 17:33:55,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=473139.3333333333, ans=0.125 2024-09-18 17:34:03,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=473186.0, ans=0.025 2024-09-18 17:34:06,898 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.456e+02 2.896e+02 3.474e+02 5.742e+02, threshold=5.791e+02, percent-clipped=0.0 2024-09-18 17:34:08,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=473186.0, ans=0.125 2024-09-18 17:34:52,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=473326.0, ans=0.125 2024-09-18 17:34:54,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=473326.0, ans=0.05 2024-09-18 17:35:05,812 INFO [train.py:1198] (1/2) Epoch 27, batch 650, loss[loss=0.212, simple_loss=0.265, pruned_loss=0.05888, ctc_loss=0.1267, cr_loss=0.3955, over 34522.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2684, pruned_loss=0.06053, ctc_loss=0.1271, cr_loss=0.4018, over 6522285.68 frames. ], batch size: 94, lr: 4.46e-03, grad_scale: 16.0 2024-09-18 17:35:19,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=473372.6666666667, ans=0.0 2024-09-18 17:35:26,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=473419.3333333333, ans=0.0 2024-09-18 17:35:43,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.65 vs.
limit=12.0 2024-09-18 17:36:13,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=473559.3333333333, ans=0.025 2024-09-18 17:36:16,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.94 vs. limit=6.0 2024-09-18 17:36:23,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=473559.3333333333, ans=0.125 2024-09-18 17:36:29,983 INFO [train.py:1198] (1/2) Epoch 27, batch 700, loss[loss=0.2087, simple_loss=0.2592, pruned_loss=0.05869, ctc_loss=0.1239, cr_loss=0.4023, over 34577.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2693, pruned_loss=0.06081, ctc_loss=0.1277, cr_loss=0.4032, over 6577199.79 frames. ], batch size: 89, lr: 4.46e-03, grad_scale: 16.0 2024-09-18 17:36:38,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=473606.0, ans=0.125 2024-09-18 17:36:40,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=473606.0, ans=0.0 2024-09-18 17:36:46,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=473652.6666666667, ans=0.125 2024-09-18 17:36:48,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=473652.6666666667, ans=0.125 2024-09-18 17:36:52,732 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.924e+02 2.464e+02 3.170e+02 4.499e+02 7.340e+02, threshold=6.341e+02, percent-clipped=9.0 2024-09-18 17:36:59,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=473652.6666666667, ans=0.0 2024-09-18 17:37:48,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=473792.6666666667, ans=0.0 2024-09-18 17:37:54,367 INFO [train.py:1198] (1/2) Epoch 27, batch 750, loss[loss=0.2366, simple_loss=0.2904, pruned_loss=0.06872, ctc_loss=0.1405, cr_loss=0.4304, over 34439.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2689, pruned_loss=0.06075, ctc_loss=0.1275, cr_loss=0.4025, over 6621078.71 frames. 
], batch size: 95, lr: 4.46e-03, grad_scale: 16.0 2024-09-18 17:37:57,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=473839.3333333333, ans=0.125 2024-09-18 17:38:20,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=473886.0, ans=0.0 2024-09-18 17:38:27,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=473932.6666666667, ans=0.025 2024-09-18 17:38:34,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=473932.6666666667, ans=0.0 2024-09-18 17:38:57,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=473979.3333333333, ans=0.0 2024-09-18 17:39:17,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=474072.6666666667, ans=0.2 2024-09-18 17:39:19,162 INFO [train.py:1198] (1/2) Epoch 27, batch 800, loss[loss=0.179, simple_loss=0.2365, pruned_loss=0.04435, ctc_loss=0.09729, cr_loss=0.3352, over 34486.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2692, pruned_loss=0.06097, ctc_loss=0.1278, cr_loss=0.4035, over 6656589.32 frames. ], batch size: 85, lr: 4.46e-03, grad_scale: 32.0 2024-09-18 17:39:21,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=474072.6666666667, ans=0.125 2024-09-18 17:39:24,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=474072.6666666667, ans=0.125 2024-09-18 17:39:41,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=474119.3333333333, ans=0.0 2024-09-18 17:39:42,523 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.976e+02 2.492e+02 2.854e+02 3.396e+02 9.981e+02, threshold=5.708e+02, percent-clipped=1.0 2024-09-18 17:39:54,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=474166.0, ans=0.2 2024-09-18 17:39:56,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.96 vs. limit=22.5 2024-09-18 17:40:07,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=474212.6666666667, ans=0.0 2024-09-18 17:40:20,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=474212.6666666667, ans=0.0 2024-09-18 17:40:41,737 INFO [train.py:1198] (1/2) Epoch 27, batch 850, loss[loss=0.2157, simple_loss=0.2761, pruned_loss=0.0581, ctc_loss=0.1211, cr_loss=0.3727, over 34383.00 frames. ], tot_loss[loss=0.216, simple_loss=0.269, pruned_loss=0.06073, ctc_loss=0.1273, cr_loss=0.4027, over 6689297.24 frames. 
], batch size: 103, lr: 4.46e-03, grad_scale: 32.0 2024-09-18 17:41:05,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=474352.6666666667, ans=0.125 2024-09-18 17:41:18,586 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:41:26,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=474399.3333333333, ans=0.125 2024-09-18 17:41:29,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=474399.3333333333, ans=0.125 2024-09-18 17:41:30,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=474399.3333333333, ans=0.2 2024-09-18 17:41:36,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=474446.0, ans=0.125 2024-09-18 17:42:03,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=474492.6666666667, ans=0.2 2024-09-18 17:42:06,498 INFO [train.py:1198] (1/2) Epoch 27, batch 900, loss[loss=0.1999, simple_loss=0.2526, pruned_loss=0.05398, ctc_loss=0.1178, cr_loss=0.3914, over 34494.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2692, pruned_loss=0.06089, ctc_loss=0.1276, cr_loss=0.4029, over 6695921.54 frames. ], batch size: 85, lr: 4.46e-03, grad_scale: 32.0 2024-09-18 17:42:17,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=474539.3333333333, ans=0.025 2024-09-18 17:42:19,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=474539.3333333333, ans=0.125 2024-09-18 17:42:29,170 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.549e+02 2.994e+02 3.897e+02 5.845e+02, threshold=5.988e+02, percent-clipped=2.0 2024-09-18 17:42:41,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=474632.6666666667, ans=15.0 2024-09-18 17:42:41,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.47 vs. limit=15.0 2024-09-18 17:42:44,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=474632.6666666667, ans=0.0 2024-09-18 17:42:52,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=474632.6666666667, ans=0.0 2024-09-18 17:43:11,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2024-09-18 17:43:30,466 INFO [train.py:1198] (1/2) Epoch 27, batch 950, loss[loss=0.2069, simple_loss=0.2547, pruned_loss=0.05844, ctc_loss=0.1285, cr_loss=0.4103, over 34701.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2694, pruned_loss=0.06095, ctc_loss=0.1278, cr_loss=0.4029, over 6699477.09 frames. 
], batch size: 87, lr: 4.46e-03, grad_scale: 16.0 2024-09-18 17:43:35,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=474772.6666666667, ans=0.0 2024-09-18 17:43:55,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=474819.3333333333, ans=0.1 2024-09-18 17:43:58,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=474819.3333333333, ans=0.07 2024-09-18 17:44:13,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.06 vs. limit=8.0 2024-09-18 17:44:37,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=474959.3333333333, ans=0.0 2024-09-18 17:44:44,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=474959.3333333333, ans=0.2 2024-09-18 17:44:46,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.30 vs. limit=10.0 2024-09-18 17:44:53,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=475006.0, ans=0.125 2024-09-18 17:44:55,191 INFO [train.py:1198] (1/2) Epoch 27, batch 1000, loss[loss=0.2011, simple_loss=0.2564, pruned_loss=0.05401, ctc_loss=0.1147, cr_loss=0.371, over 34489.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.27, pruned_loss=0.06127, ctc_loss=0.1284, cr_loss=0.4035, over 6692832.44 frames. ], batch size: 90, lr: 4.46e-03, grad_scale: 16.0 2024-09-18 17:45:08,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=475006.0, ans=0.025 2024-09-18 17:45:20,397 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.198e+02 2.617e+02 3.382e+02 4.221e+02 6.638e+02, threshold=6.763e+02, percent-clipped=1.0 2024-09-18 17:45:39,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=475099.3333333333, ans=15.0 2024-09-18 17:45:40,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=475099.3333333333, ans=0.2 2024-09-18 17:45:40,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475099.3333333333, ans=0.1 2024-09-18 17:45:52,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=475146.0, ans=0.0 2024-09-18 17:45:57,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.17 vs. limit=22.5 2024-09-18 17:45:58,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=475146.0, ans=0.2 2024-09-18 17:46:00,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=475192.6666666667, ans=0.1 2024-09-18 17:46:17,792 INFO [train.py:1198] (1/2) Epoch 27, batch 1050, loss[loss=0.2166, simple_loss=0.2744, pruned_loss=0.05886, ctc_loss=0.1258, cr_loss=0.3976, over 34568.00 frames. 
], tot_loss[loss=0.2167, simple_loss=0.2695, pruned_loss=0.06113, ctc_loss=0.1281, cr_loss=0.4026, over 6701596.25 frames. ], batch size: 99, lr: 4.45e-03, grad_scale: 16.0 2024-09-18 17:46:26,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475239.3333333333, ans=0.1 2024-09-18 17:46:48,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=475286.0, ans=0.125 2024-09-18 17:46:55,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=475332.6666666667, ans=0.0 2024-09-18 17:47:11,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=22.5 2024-09-18 17:47:42,610 INFO [train.py:1198] (1/2) Epoch 27, batch 1100, loss[loss=0.2167, simple_loss=0.2673, pruned_loss=0.06206, ctc_loss=0.1287, cr_loss=0.4073, over 34364.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2691, pruned_loss=0.06088, ctc_loss=0.1278, cr_loss=0.4024, over 6714972.69 frames. ], batch size: 91, lr: 4.45e-03, grad_scale: 16.0 2024-09-18 17:47:52,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=475472.6666666667, ans=0.125 2024-09-18 17:48:01,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=475519.3333333333, ans=0.125 2024-09-18 17:48:07,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.370e+02 2.841e+02 3.538e+02 5.643e+02, threshold=5.682e+02, percent-clipped=0.0 2024-09-18 17:48:10,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2024-09-18 17:48:14,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=475519.3333333333, ans=22.5 2024-09-18 17:48:57,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=475659.3333333333, ans=0.0 2024-09-18 17:49:02,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=475659.3333333333, ans=0.125 2024-09-18 17:49:02,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=475659.3333333333, ans=0.125 2024-09-18 17:49:02,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=475659.3333333333, ans=0.125 2024-09-18 17:49:06,971 INFO [train.py:1198] (1/2) Epoch 27, batch 1150, loss[loss=0.2085, simple_loss=0.2589, pruned_loss=0.05836, ctc_loss=0.1241, cr_loss=0.4161, over 34711.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.269, pruned_loss=0.0609, ctc_loss=0.1278, cr_loss=0.4022, over 6712611.46 frames. 
], batch size: 92, lr: 4.45e-03, grad_scale: 16.0 2024-09-18 17:49:07,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=475706.0, ans=0.125 2024-09-18 17:49:25,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=475752.6666666667, ans=0.1 2024-09-18 17:49:37,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=475752.6666666667, ans=0.125 2024-09-18 17:49:38,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=475799.3333333333, ans=0.035 2024-09-18 17:49:45,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=15.0 2024-09-18 17:49:53,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=475799.3333333333, ans=0.95 2024-09-18 17:50:31,427 INFO [train.py:1198] (1/2) Epoch 27, batch 1200, loss[loss=0.2197, simple_loss=0.2782, pruned_loss=0.05986, ctc_loss=0.126, cr_loss=0.4068, over 34546.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.27, pruned_loss=0.06118, ctc_loss=0.1283, cr_loss=0.4033, over 6704491.32 frames. ], batch size: 99, lr: 4.45e-03, grad_scale: 32.0 2024-09-18 17:50:36,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=475939.3333333333, ans=0.125 2024-09-18 17:50:56,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.368e+02 2.683e+02 3.176e+02 5.715e+02, threshold=5.366e+02, percent-clipped=1.0 2024-09-18 17:51:21,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=476079.3333333333, ans=0.0 2024-09-18 17:51:36,734 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:51:56,454 INFO [train.py:1198] (1/2) Epoch 27, batch 1250, loss[loss=0.2432, simple_loss=0.2928, pruned_loss=0.07281, ctc_loss=0.1472, cr_loss=0.4645, over 34331.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.271, pruned_loss=0.06175, ctc_loss=0.1293, cr_loss=0.4055, over 6738922.97 frames. ], batch size: 107, lr: 4.45e-03, grad_scale: 16.0 2024-09-18 17:52:04,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. 
limit=15.0 2024-09-18 17:52:11,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=476219.3333333333, ans=0.025 2024-09-18 17:52:16,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=476219.3333333333, ans=0.1 2024-09-18 17:52:21,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=476219.3333333333, ans=0.0 2024-09-18 17:52:57,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476312.6666666667, ans=0.1 2024-09-18 17:52:59,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=476312.6666666667, ans=0.1 2024-09-18 17:53:04,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476359.3333333333, ans=0.1 2024-09-18 17:53:18,860 INFO [train.py:1198] (1/2) Epoch 27, batch 1300, loss[loss=0.232, simple_loss=0.287, pruned_loss=0.06622, ctc_loss=0.1404, cr_loss=0.4099, over 33057.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2699, pruned_loss=0.0614, ctc_loss=0.1285, cr_loss=0.4036, over 6742514.03 frames. ], batch size: 130, lr: 4.45e-03, grad_scale: 16.0 2024-09-18 17:53:24,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=476406.0, ans=0.125 2024-09-18 17:53:45,266 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.434e+02 2.858e+02 3.612e+02 5.948e+02, threshold=5.716e+02, percent-clipped=1.0 2024-09-18 17:54:02,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=476499.3333333333, ans=0.0 2024-09-18 17:54:10,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=476546.0, ans=0.0 2024-09-18 17:54:15,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=476546.0, ans=0.025 2024-09-18 17:54:27,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=476592.6666666667, ans=0.025 2024-09-18 17:54:42,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=476639.3333333333, ans=0.125 2024-09-18 17:54:43,618 INFO [train.py:1198] (1/2) Epoch 27, batch 1350, loss[loss=0.2136, simple_loss=0.2671, pruned_loss=0.05932, ctc_loss=0.128, cr_loss=0.3961, over 34508.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2695, pruned_loss=0.06109, ctc_loss=0.128, cr_loss=0.4028, over 6763506.68 frames. ], batch size: 94, lr: 4.45e-03, grad_scale: 16.0 2024-09-18 17:54:55,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=476639.3333333333, ans=0.125 2024-09-18 17:55:03,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476686.0, ans=0.1 2024-09-18 17:55:05,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.40 vs. 
limit=15.0 2024-09-18 17:55:27,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=476732.6666666667, ans=0.125 2024-09-18 17:55:31,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476779.3333333333, ans=0.1 2024-09-18 17:55:31,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.70 vs. limit=15.0 2024-09-18 17:55:39,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=476779.3333333333, ans=0.0 2024-09-18 17:55:54,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476826.0, ans=0.1 2024-09-18 17:56:07,135 INFO [train.py:1198] (1/2) Epoch 27, batch 1400, loss[loss=0.2, simple_loss=0.2506, pruned_loss=0.0559, ctc_loss=0.116, cr_loss=0.3626, over 34295.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2695, pruned_loss=0.06124, ctc_loss=0.1282, cr_loss=0.4036, over 6775783.47 frames. ], batch size: 80, lr: 4.45e-03, grad_scale: 16.0 2024-09-18 17:56:25,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=476919.3333333333, ans=0.125 2024-09-18 17:56:33,525 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.668e+02 3.162e+02 3.809e+02 7.476e+02, threshold=6.323e+02, percent-clipped=2.0 2024-09-18 17:56:57,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=477012.6666666667, ans=0.5 2024-09-18 17:57:02,963 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2024-09-18 17:57:05,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=477012.6666666667, ans=0.125 2024-09-18 17:57:29,541 INFO [train.py:1198] (1/2) Epoch 27, batch 1450, loss[loss=0.2411, simple_loss=0.2924, pruned_loss=0.07154, ctc_loss=0.1444, cr_loss=0.4468, over 34437.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.27, pruned_loss=0.06128, ctc_loss=0.1282, cr_loss=0.4038, over 6772650.58 frames. ], batch size: 110, lr: 4.45e-03, grad_scale: 16.0 2024-09-18 17:57:36,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=477106.0, ans=0.1 2024-09-18 17:57:39,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=477106.0, ans=0.05 2024-09-18 17:57:45,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.55 vs. 
limit=10.0 2024-09-18 17:58:02,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=477199.3333333333, ans=0.0 2024-09-18 17:58:42,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=477292.6666666667, ans=0.1 2024-09-18 17:58:42,872 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 17:58:54,040 INFO [train.py:1198] (1/2) Epoch 27, batch 1500, loss[loss=0.2354, simple_loss=0.2866, pruned_loss=0.06919, ctc_loss=0.1428, cr_loss=0.4329, over 34443.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2703, pruned_loss=0.06132, ctc_loss=0.1285, cr_loss=0.4045, over 6773852.05 frames. ], batch size: 100, lr: 4.44e-03, grad_scale: 16.0 2024-09-18 17:59:14,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=477386.0, ans=0.04949747468305833 2024-09-18 17:59:15,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0 2024-09-18 17:59:20,659 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.568e+02 2.758e+02 3.447e+02 7.211e+02, threshold=5.517e+02, percent-clipped=1.0 2024-09-18 17:59:44,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=477479.3333333333, ans=0.1 2024-09-18 17:59:46,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=477479.3333333333, ans=0.2 2024-09-18 18:00:03,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.42 vs. limit=22.5 2024-09-18 18:00:18,996 INFO [train.py:1198] (1/2) Epoch 27, batch 1550, loss[loss=0.2307, simple_loss=0.2833, pruned_loss=0.06644, ctc_loss=0.1395, cr_loss=0.4332, over 34419.00 frames. ], tot_loss[loss=0.2177, simple_loss=0.2702, pruned_loss=0.06159, ctc_loss=0.129, cr_loss=0.4053, over 6746114.52 frames. ], batch size: 105, lr: 4.44e-03, grad_scale: 16.0 2024-09-18 18:00:26,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=477572.6666666667, ans=0.0 2024-09-18 18:00:29,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=477572.6666666667, ans=10.0 2024-09-18 18:00:59,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=477666.0, ans=0.2 2024-09-18 18:01:02,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=477666.0, ans=0.0 2024-09-18 18:01:07,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. 
limit=15.0 2024-09-18 18:01:25,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=477759.3333333333, ans=0.125 2024-09-18 18:01:25,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=477759.3333333333, ans=0.0 2024-09-18 18:01:31,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.90 vs. limit=15.0 2024-09-18 18:01:35,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=477759.3333333333, ans=0.125 2024-09-18 18:01:41,634 INFO [train.py:1198] (1/2) Epoch 27, batch 1600, loss[loss=0.2118, simple_loss=0.269, pruned_loss=0.057, ctc_loss=0.1238, cr_loss=0.3961, over 34545.00 frames. ], tot_loss[loss=0.2181, simple_loss=0.2703, pruned_loss=0.06182, ctc_loss=0.1295, cr_loss=0.406, over 6723736.37 frames. ], batch size: 99, lr: 4.44e-03, grad_scale: 32.0 2024-09-18 18:01:44,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.50 vs. limit=15.0 2024-09-18 18:02:10,555 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.448e+02 2.802e+02 3.598e+02 1.073e+03, threshold=5.605e+02, percent-clipped=6.0 2024-09-18 18:03:08,115 INFO [train.py:1198] (1/2) Epoch 27, batch 1650, loss[loss=0.2197, simple_loss=0.2773, pruned_loss=0.05999, ctc_loss=0.128, cr_loss=0.4139, over 34377.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2701, pruned_loss=0.06167, ctc_loss=0.1293, cr_loss=0.4057, over 6715384.97 frames. ], batch size: 103, lr: 4.44e-03, grad_scale: 32.0 2024-09-18 18:03:16,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=478039.3333333333, ans=0.0 2024-09-18 18:03:41,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.73 vs. limit=10.0 2024-09-18 18:03:44,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=478132.6666666667, ans=0.125 2024-09-18 18:04:07,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=478179.3333333333, ans=0.1 2024-09-18 18:04:22,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=478226.0, ans=0.0 2024-09-18 18:04:25,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=478226.0, ans=0.0 2024-09-18 18:04:30,207 INFO [train.py:1198] (1/2) Epoch 27, batch 1700, loss[loss=0.1899, simple_loss=0.2444, pruned_loss=0.04999, ctc_loss=0.1067, cr_loss=0.3491, over 34284.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.27, pruned_loss=0.06145, ctc_loss=0.1289, cr_loss=0.4051, over 6740529.99 frames. 
], batch size: 80, lr: 4.44e-03, grad_scale: 16.0 2024-09-18 18:04:35,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=478272.6666666667, ans=0.125 2024-09-18 18:04:48,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=478319.3333333333, ans=0.1 2024-09-18 18:04:57,954 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.432e+02 2.950e+02 3.650e+02 6.753e+02, threshold=5.899e+02, percent-clipped=1.0 2024-09-18 18:05:02,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2024-09-18 18:05:29,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=478412.6666666667, ans=0.1 2024-09-18 18:05:32,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=478412.6666666667, ans=0.125 2024-09-18 18:05:41,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=478459.3333333333, ans=0.025 2024-09-18 18:05:51,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=478459.3333333333, ans=0.125 2024-09-18 18:05:54,629 INFO [train.py:1198] (1/2) Epoch 27, batch 1750, loss[loss=0.1927, simple_loss=0.2453, pruned_loss=0.05213, ctc_loss=0.1083, cr_loss=0.355, over 34152.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2696, pruned_loss=0.06135, ctc_loss=0.1285, cr_loss=0.4043, over 6750640.63 frames. ], batch size: 78, lr: 4.44e-03, grad_scale: 16.0 2024-09-18 18:05:59,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=15.0 2024-09-18 18:06:24,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=478552.6666666667, ans=0.125 2024-09-18 18:06:34,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=478599.3333333333, ans=0.0 2024-09-18 18:06:54,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=478646.0, ans=0.2 2024-09-18 18:07:14,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.27 vs. limit=22.5 2024-09-18 18:07:18,855 INFO [train.py:1198] (1/2) Epoch 27, batch 1800, loss[loss=0.2276, simple_loss=0.2826, pruned_loss=0.06443, ctc_loss=0.1362, cr_loss=0.4155, over 34711.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2699, pruned_loss=0.06143, ctc_loss=0.1288, cr_loss=0.4047, over 6754775.77 frames. 
], batch size: 97, lr: 4.44e-03, grad_scale: 16.0 2024-09-18 18:07:25,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=478739.3333333333, ans=0.0 2024-09-18 18:07:32,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478739.3333333333, ans=0.1 2024-09-18 18:07:47,405 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.382e+02 2.834e+02 3.493e+02 6.537e+02, threshold=5.669e+02, percent-clipped=2.0 2024-09-18 18:07:57,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=478832.6666666667, ans=0.2 2024-09-18 18:08:08,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=478879.3333333333, ans=0.0 2024-09-18 18:08:12,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=478879.3333333333, ans=0.1 2024-09-18 18:08:17,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=478879.3333333333, ans=0.125 2024-09-18 18:08:22,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=478879.3333333333, ans=0.2 2024-09-18 18:08:40,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=478972.6666666667, ans=0.025 2024-09-18 18:08:41,497 INFO [train.py:1198] (1/2) Epoch 27, batch 1850, loss[loss=0.2331, simple_loss=0.2896, pruned_loss=0.06601, ctc_loss=0.1385, cr_loss=0.4204, over 34478.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2696, pruned_loss=0.06124, ctc_loss=0.1285, cr_loss=0.4043, over 6762555.75 frames. ], batch size: 100, lr: 4.44e-03, grad_scale: 16.0 2024-09-18 18:08:41,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=478972.6666666667, ans=0.125 2024-09-18 18:08:56,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=479019.3333333333, ans=0.07 2024-09-18 18:09:04,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=479019.3333333333, ans=0.1 2024-09-18 18:09:28,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.37 vs. limit=10.0 2024-09-18 18:09:57,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2024-09-18 18:09:58,092 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:10:05,934 INFO [train.py:1198] (1/2) Epoch 27, batch 1900, loss[loss=0.2225, simple_loss=0.2794, pruned_loss=0.0612, ctc_loss=0.1305, cr_loss=0.4284, over 34396.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.27, pruned_loss=0.06128, ctc_loss=0.1285, cr_loss=0.4043, over 6771764.97 frames. 
], batch size: 103, lr: 4.44e-03, grad_scale: 16.0 2024-09-18 18:10:06,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479206.0, ans=0.1 2024-09-18 18:10:09,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=479206.0, ans=0.125 2024-09-18 18:10:12,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=479206.0, ans=0.0 2024-09-18 18:10:16,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=479206.0, ans=0.1 2024-09-18 18:10:35,754 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.493e+02 2.926e+02 4.054e+02 8.511e+02, threshold=5.852e+02, percent-clipped=4.0 2024-09-18 18:10:38,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.70 vs. limit=10.0 2024-09-18 18:10:57,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=479346.0, ans=0.125 2024-09-18 18:10:58,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=479346.0, ans=0.125 2024-09-18 18:11:26,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=479392.6666666667, ans=0.125 2024-09-18 18:11:29,760 INFO [train.py:1198] (1/2) Epoch 27, batch 1950, loss[loss=0.2137, simple_loss=0.2656, pruned_loss=0.0603, ctc_loss=0.1274, cr_loss=0.39, over 34357.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2712, pruned_loss=0.06162, ctc_loss=0.1292, cr_loss=0.406, over 6788553.33 frames. ], batch size: 91, lr: 4.44e-03, grad_scale: 8.0 2024-09-18 18:11:30,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=479439.3333333333, ans=0.125 2024-09-18 18:11:30,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=479439.3333333333, ans=0.025 2024-09-18 18:11:36,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=479439.3333333333, ans=0.125 2024-09-18 18:11:55,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=479486.0, ans=0.125 2024-09-18 18:12:06,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=479532.6666666667, ans=0.2 2024-09-18 18:12:46,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.74 vs. limit=22.5 2024-09-18 18:12:47,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.19 vs. 
limit=15.0 2024-09-18 18:12:49,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=479626.0, ans=0.0 2024-09-18 18:12:52,297 INFO [train.py:1198] (1/2) Epoch 27, batch 2000, loss[loss=0.1944, simple_loss=0.2442, pruned_loss=0.05338, ctc_loss=0.1151, cr_loss=0.3712, over 34171.00 frames. ], tot_loss[loss=0.2182, simple_loss=0.2712, pruned_loss=0.06161, ctc_loss=0.1292, cr_loss=0.4054, over 6763386.45 frames. ], batch size: 78, lr: 4.43e-03, grad_scale: 16.0 2024-09-18 18:13:24,801 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.163e+02 2.422e+02 2.799e+02 3.355e+02 9.341e+02, threshold=5.599e+02, percent-clipped=1.0 2024-09-18 18:13:36,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=479766.0, ans=0.125 2024-09-18 18:13:36,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=479766.0, ans=0.125 2024-09-18 18:13:38,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=479766.0, ans=0.125 2024-09-18 18:14:05,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.65 vs. limit=15.0 2024-09-18 18:14:19,450 INFO [train.py:1198] (1/2) Epoch 27, batch 2050, loss[loss=0.1986, simple_loss=0.2499, pruned_loss=0.05465, ctc_loss=0.1141, cr_loss=0.3759, over 34451.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.2704, pruned_loss=0.06143, ctc_loss=0.1288, cr_loss=0.4043, over 6754301.98 frames. ], batch size: 82, lr: 4.43e-03, grad_scale: 16.0 2024-09-18 18:14:25,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.79 vs. limit=22.5 2024-09-18 18:15:05,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=479999.3333333333, ans=0.125 2024-09-18 18:15:13,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=480046.0, ans=0.125 2024-09-18 18:15:22,456 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:15:24,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0 2024-09-18 18:15:37,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=480092.6666666667, ans=0.025 2024-09-18 18:15:41,955 INFO [train.py:1198] (1/2) Epoch 27, batch 2100, loss[loss=0.2199, simple_loss=0.2717, pruned_loss=0.06247, ctc_loss=0.1322, cr_loss=0.4194, over 34540.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2699, pruned_loss=0.06122, ctc_loss=0.1285, cr_loss=0.404, over 6769754.42 frames. ], batch size: 94, lr: 4.43e-03, grad_scale: 16.0 2024-09-18 18:15:44,477 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.58 vs. 
limit=6.0 2024-09-18 18:15:47,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=480139.3333333333, ans=0.125 2024-09-18 18:16:08,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=480186.0, ans=0.125 2024-09-18 18:16:11,575 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.379e+02 2.740e+02 3.461e+02 6.367e+02, threshold=5.479e+02, percent-clipped=3.0 2024-09-18 18:16:19,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.00 vs. limit=15.0 2024-09-18 18:16:20,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=480232.6666666667, ans=0.025 2024-09-18 18:16:26,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-09-18 18:16:35,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=480279.3333333333, ans=0.0 2024-09-18 18:16:40,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=480279.3333333333, ans=0.125 2024-09-18 18:16:46,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=480326.0, ans=0.0 2024-09-18 18:17:01,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=480326.0, ans=0.2 2024-09-18 18:17:04,479 INFO [train.py:1198] (1/2) Epoch 27, batch 2150, loss[loss=0.2162, simple_loss=0.2647, pruned_loss=0.06277, ctc_loss=0.1287, cr_loss=0.4112, over 34367.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2693, pruned_loss=0.06101, ctc_loss=0.1281, cr_loss=0.4031, over 6788377.51 frames. ], batch size: 91, lr: 4.43e-03, grad_scale: 16.0 2024-09-18 18:17:16,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=480372.6666666667, ans=0.0 2024-09-18 18:17:18,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=480372.6666666667, ans=0.125 2024-09-18 18:17:28,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=480419.3333333333, ans=0.2 2024-09-18 18:18:30,897 INFO [train.py:1198] (1/2) Epoch 27, batch 2200, loss[loss=0.2174, simple_loss=0.2717, pruned_loss=0.06117, ctc_loss=0.1261, cr_loss=0.3907, over 34453.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2692, pruned_loss=0.06094, ctc_loss=0.1279, cr_loss=0.4029, over 6783220.22 frames. ], batch size: 100, lr: 4.43e-03, grad_scale: 16.0 2024-09-18 18:18:46,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.72 vs. 
limit=15.0 2024-09-18 18:18:50,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=480652.6666666667, ans=0.125 2024-09-18 18:19:00,649 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 2.554e+02 3.190e+02 3.983e+02 6.835e+02, threshold=6.379e+02, percent-clipped=4.0 2024-09-18 18:19:22,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=480746.0, ans=0.2 2024-09-18 18:19:42,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=480792.6666666667, ans=0.125 2024-09-18 18:19:53,415 INFO [train.py:1198] (1/2) Epoch 27, batch 2250, loss[loss=0.2066, simple_loss=0.2647, pruned_loss=0.05509, ctc_loss=0.1183, cr_loss=0.3676, over 34441.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2688, pruned_loss=0.06075, ctc_loss=0.1275, cr_loss=0.4019, over 6780267.12 frames. ], batch size: 95, lr: 4.43e-03, grad_scale: 16.0 2024-09-18 18:19:53,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=480839.3333333333, ans=0.025 2024-09-18 18:20:00,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=480839.3333333333, ans=0.0 2024-09-18 18:20:43,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=480979.3333333333, ans=0.2 2024-09-18 18:20:43,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0 2024-09-18 18:21:06,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=481026.0, ans=0.125 2024-09-18 18:21:17,538 INFO [train.py:1198] (1/2) Epoch 27, batch 2300, loss[loss=0.1936, simple_loss=0.2484, pruned_loss=0.05084, ctc_loss=0.111, cr_loss=0.3715, over 34262.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2678, pruned_loss=0.06045, ctc_loss=0.1269, cr_loss=0.4003, over 6766113.44 frames. ], batch size: 83, lr: 4.43e-03, grad_scale: 16.0 2024-09-18 18:21:37,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=481119.3333333333, ans=0.0 2024-09-18 18:21:40,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=481119.3333333333, ans=0.1 2024-09-18 18:21:46,910 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.105e+02 2.445e+02 2.776e+02 3.518e+02 5.006e+02, threshold=5.552e+02, percent-clipped=0.0 2024-09-18 18:22:00,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=481166.0, ans=0.0 2024-09-18 18:22:28,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=481259.3333333333, ans=0.125 2024-09-18 18:22:30,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=481259.3333333333, ans=0.0 2024-09-18 18:22:41,458 INFO [train.py:1198] (1/2) Epoch 27, batch 2350, loss[loss=0.2328, simple_loss=0.2846, pruned_loss=0.06741, ctc_loss=0.1416, cr_loss=0.4461, over 34706.00 frames. 
], tot_loss[loss=0.2152, simple_loss=0.268, pruned_loss=0.06053, ctc_loss=0.1271, cr_loss=0.4006, over 6773824.56 frames. ], batch size: 97, lr: 4.43e-03, grad_scale: 16.0 2024-09-18 18:22:50,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.10 vs. limit=10.0 2024-09-18 18:22:54,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=481306.0, ans=0.95 2024-09-18 18:22:59,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=481352.6666666667, ans=0.1 2024-09-18 18:23:03,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=481352.6666666667, ans=0.125 2024-09-18 18:23:12,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481399.3333333333, ans=0.1 2024-09-18 18:23:16,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=481399.3333333333, ans=0.0 2024-09-18 18:23:42,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=481446.0, ans=0.125 2024-09-18 18:23:46,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.whiten.whitening_limit, batch_count=481492.6666666667, ans=12.0 2024-09-18 18:24:03,623 INFO [train.py:1198] (1/2) Epoch 27, batch 2400, loss[loss=0.1953, simple_loss=0.2508, pruned_loss=0.05148, ctc_loss=0.1116, cr_loss=0.3621, over 34599.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2682, pruned_loss=0.06059, ctc_loss=0.1273, cr_loss=0.4009, over 6777528.22 frames. ], batch size: 89, lr: 4.43e-03, grad_scale: 32.0 2024-09-18 18:24:09,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=481539.3333333333, ans=0.0 2024-09-18 18:24:34,049 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.453e+02 2.842e+02 3.572e+02 7.430e+02, threshold=5.683e+02, percent-clipped=2.0 2024-09-18 18:24:34,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=481586.0, ans=0.125 2024-09-18 18:24:49,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481632.6666666667, ans=0.1 2024-09-18 18:25:06,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=481679.3333333333, ans=0.2 2024-09-18 18:25:08,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.14 vs. limit=22.5 2024-09-18 18:25:30,753 INFO [train.py:1198] (1/2) Epoch 27, batch 2450, loss[loss=0.2191, simple_loss=0.2717, pruned_loss=0.06211, ctc_loss=0.1305, cr_loss=0.4045, over 34439.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2692, pruned_loss=0.06101, ctc_loss=0.128, cr_loss=0.4027, over 6753878.96 frames. 
], batch size: 95, lr: 4.42e-03, grad_scale: 32.0 2024-09-18 18:25:42,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=481772.6666666667, ans=0.125 2024-09-18 18:25:47,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=481819.3333333333, ans=0.125 2024-09-18 18:25:47,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=481819.3333333333, ans=0.0 2024-09-18 18:25:49,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=481819.3333333333, ans=0.125 2024-09-18 18:26:12,537 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.72 vs. limit=22.5 2024-09-18 18:26:27,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=481912.6666666667, ans=0.0 2024-09-18 18:26:30,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=481912.6666666667, ans=0.5 2024-09-18 18:26:31,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.63 vs. limit=15.0 2024-09-18 18:26:40,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=6.68 vs. limit=15.0 2024-09-18 18:26:43,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.33 vs. limit=15.0 2024-09-18 18:26:50,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=481959.3333333333, ans=0.2 2024-09-18 18:26:53,074 INFO [train.py:1198] (1/2) Epoch 27, batch 2500, loss[loss=0.2205, simple_loss=0.2803, pruned_loss=0.05929, ctc_loss=0.1283, cr_loss=0.4136, over 34430.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2695, pruned_loss=0.06129, ctc_loss=0.1284, cr_loss=0.404, over 6765150.89 frames. 
], batch size: 100, lr: 4.42e-03, grad_scale: 32.0 2024-09-18 18:26:56,864 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.545e-02 2024-09-18 18:27:08,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=482052.6666666667, ans=0.125 2024-09-18 18:27:14,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=482052.6666666667, ans=0.2 2024-09-18 18:27:14,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=482052.6666666667, ans=0.125 2024-09-18 18:27:22,469 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.929e+02 2.475e+02 2.729e+02 3.288e+02 5.857e+02, threshold=5.458e+02, percent-clipped=1.0 2024-09-18 18:27:29,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=482099.3333333333, ans=0.2 2024-09-18 18:27:46,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=482146.0, ans=0.1 2024-09-18 18:27:54,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=482146.0, ans=0.125 2024-09-18 18:27:56,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=482146.0, ans=0.125 2024-09-18 18:28:01,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=482192.6666666667, ans=0.125 2024-09-18 18:28:01,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.69 vs. limit=15.0 2024-09-18 18:28:15,839 INFO [train.py:1198] (1/2) Epoch 27, batch 2550, loss[loss=0.1843, simple_loss=0.242, pruned_loss=0.04664, ctc_loss=0.1012, cr_loss=0.328, over 34191.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2693, pruned_loss=0.06092, ctc_loss=0.1278, cr_loss=0.4029, over 6767683.83 frames. ], batch size: 78, lr: 4.42e-03, grad_scale: 32.0 2024-09-18 18:28:27,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=482239.3333333333, ans=0.0 2024-09-18 18:28:35,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.16 vs. 
limit=22.5 2024-09-18 18:28:42,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=482286.0, ans=0.125 2024-09-18 18:28:47,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=482286.0, ans=0.0 2024-09-18 18:28:54,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=482332.6666666667, ans=0.1 2024-09-18 18:28:56,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=482332.6666666667, ans=0.0 2024-09-18 18:28:57,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=482332.6666666667, ans=0.0 2024-09-18 18:29:42,656 INFO [train.py:1198] (1/2) Epoch 27, batch 2600, loss[loss=0.2093, simple_loss=0.2654, pruned_loss=0.05682, ctc_loss=0.1198, cr_loss=0.3905, over 34388.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2699, pruned_loss=0.06111, ctc_loss=0.1281, cr_loss=0.4035, over 6763036.87 frames. ], batch size: 91, lr: 4.42e-03, grad_scale: 32.0 2024-09-18 18:29:54,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=482472.6666666667, ans=0.1 2024-09-18 18:30:00,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482519.3333333333, ans=0.1 2024-09-18 18:30:04,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=482519.3333333333, ans=0.125 2024-09-18 18:30:06,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=482519.3333333333, ans=0.125 2024-09-18 18:30:07,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=482519.3333333333, ans=0.1 2024-09-18 18:30:12,385 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.027e+02 2.466e+02 2.940e+02 3.944e+02 7.033e+02, threshold=5.880e+02, percent-clipped=7.0 2024-09-18 18:30:27,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=482566.0, ans=0.1 2024-09-18 18:30:31,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5 2024-09-18 18:30:50,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=482659.3333333333, ans=0.125 2024-09-18 18:31:05,049 INFO [train.py:1198] (1/2) Epoch 27, batch 2650, loss[loss=0.2297, simple_loss=0.2839, pruned_loss=0.06544, ctc_loss=0.1387, cr_loss=0.4233, over 34208.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2705, pruned_loss=0.06121, ctc_loss=0.1284, cr_loss=0.4046, over 6769854.90 frames. 
], batch size: 117, lr: 4.42e-03, grad_scale: 32.0 2024-09-18 18:31:15,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=482706.0, ans=0.09899494936611666 2024-09-18 18:31:18,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482706.0, ans=0.1 2024-09-18 18:31:29,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=482752.6666666667, ans=0.2 2024-09-18 18:31:45,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.60 vs. limit=15.0 2024-09-18 18:31:51,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=482799.3333333333, ans=0.125 2024-09-18 18:32:14,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482892.6666666667, ans=0.1 2024-09-18 18:32:19,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=482892.6666666667, ans=0.2 2024-09-18 18:32:24,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=482892.6666666667, ans=0.025 2024-09-18 18:32:29,195 INFO [train.py:1198] (1/2) Epoch 27, batch 2700, loss[loss=0.2207, simple_loss=0.2766, pruned_loss=0.06152, ctc_loss=0.1283, cr_loss=0.4053, over 34622.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2711, pruned_loss=0.06149, ctc_loss=0.1289, cr_loss=0.4054, over 6762863.93 frames. ], batch size: 102, lr: 4.42e-03, grad_scale: 16.0 2024-09-18 18:32:32,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=482939.3333333333, ans=0.125 2024-09-18 18:32:42,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=482939.3333333333, ans=0.2 2024-09-18 18:32:43,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.55 vs. limit=15.0 2024-09-18 18:32:52,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. 
limit=15.0 2024-09-18 18:33:01,947 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.572e+02 3.193e+02 3.800e+02 5.774e+02, threshold=6.386e+02, percent-clipped=0.0 2024-09-18 18:33:02,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=483032.6666666667, ans=0.025 2024-09-18 18:33:17,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=483032.6666666667, ans=0.125 2024-09-18 18:33:17,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=483032.6666666667, ans=0.125 2024-09-18 18:33:29,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=483079.3333333333, ans=0.2 2024-09-18 18:33:44,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-09-18 18:33:53,737 INFO [train.py:1198] (1/2) Epoch 27, batch 2750, loss[loss=0.2048, simple_loss=0.2601, pruned_loss=0.0554, ctc_loss=0.1176, cr_loss=0.3767, over 34657.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2694, pruned_loss=0.06086, ctc_loss=0.1278, cr_loss=0.4031, over 6760823.02 frames. ], batch size: 88, lr: 4.42e-03, grad_scale: 16.0 2024-09-18 18:34:15,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=483219.3333333333, ans=0.5 2024-09-18 18:34:20,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=483219.3333333333, ans=0.125 2024-09-18 18:34:28,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=483266.0, ans=0.125 2024-09-18 18:34:34,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=483266.0, ans=0.0 2024-09-18 18:34:49,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=483312.6666666667, ans=0.025 2024-09-18 18:35:15,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=483406.0, ans=0.2 2024-09-18 18:35:16,326 INFO [train.py:1198] (1/2) Epoch 27, batch 2800, loss[loss=0.2639, simple_loss=0.3002, pruned_loss=0.08641, ctc_loss=0.1787, cr_loss=0.4793, over 23525.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2697, pruned_loss=0.06123, ctc_loss=0.1284, cr_loss=0.4045, over 6740634.60 frames. 
], batch size: 244, lr: 4.42e-03, grad_scale: 32.0 2024-09-18 18:35:49,930 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.070e+02 2.451e+02 3.018e+02 3.574e+02 5.865e+02, threshold=6.037e+02, percent-clipped=0.0 2024-09-18 18:35:58,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=483499.3333333333, ans=0.0 2024-09-18 18:35:58,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=483499.3333333333, ans=0.125 2024-09-18 18:36:00,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=483499.3333333333, ans=0.125 2024-09-18 18:36:01,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2024-09-18 18:36:05,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=483499.3333333333, ans=0.025 2024-09-18 18:36:05,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483499.3333333333, ans=0.1 2024-09-18 18:36:08,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=483546.0, ans=0.0 2024-09-18 18:36:29,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.08 vs. limit=15.0 2024-09-18 18:36:30,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=483592.6666666667, ans=0.125 2024-09-18 18:36:37,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=483592.6666666667, ans=0.125 2024-09-18 18:36:43,441 INFO [train.py:1198] (1/2) Epoch 27, batch 2850, loss[loss=0.214, simple_loss=0.2599, pruned_loss=0.06302, ctc_loss=0.1305, cr_loss=0.399, over 34465.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2699, pruned_loss=0.06152, ctc_loss=0.129, cr_loss=0.4056, over 6725837.66 frames. ], batch size: 90, lr: 4.42e-03, grad_scale: 16.0 2024-09-18 18:36:48,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483639.3333333333, ans=0.1 2024-09-18 18:36:51,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=483639.3333333333, ans=0.1 2024-09-18 18:36:53,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=483639.3333333333, ans=0.125 2024-09-18 18:37:22,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.49 vs. limit=12.0 2024-09-18 18:37:36,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=483779.3333333333, ans=0.0 2024-09-18 18:37:40,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.54 vs. 
limit=15.0 2024-09-18 18:37:46,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=483779.3333333333, ans=0.1 2024-09-18 18:37:49,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=483826.0, ans=0.125 2024-09-18 18:37:53,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.40 vs. limit=15.0 2024-09-18 18:38:05,662 INFO [train.py:1198] (1/2) Epoch 27, batch 2900, loss[loss=0.2189, simple_loss=0.2698, pruned_loss=0.06218, ctc_loss=0.133, cr_loss=0.4237, over 34534.00 frames. ], tot_loss[loss=0.2183, simple_loss=0.2709, pruned_loss=0.06177, ctc_loss=0.1295, cr_loss=0.4077, over 6756333.42 frames. ], batch size: 94, lr: 4.41e-03, grad_scale: 16.0 2024-09-18 18:38:06,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.82 vs. limit=15.0 2024-09-18 18:38:25,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=483919.3333333333, ans=0.2 2024-09-18 18:38:38,369 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.915e+02 2.486e+02 2.956e+02 3.770e+02 6.360e+02, threshold=5.912e+02, percent-clipped=1.0 2024-09-18 18:38:52,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.57 vs. limit=15.0 2024-09-18 18:39:26,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=484106.0, ans=0.0 2024-09-18 18:39:28,203 INFO [train.py:1198] (1/2) Epoch 27, batch 2950, loss[loss=0.211, simple_loss=0.2617, pruned_loss=0.06008, ctc_loss=0.1223, cr_loss=0.3948, over 34639.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2693, pruned_loss=0.06114, ctc_loss=0.1283, cr_loss=0.4051, over 6750515.11 frames. ], batch size: 88, lr: 4.41e-03, grad_scale: 16.0 2024-09-18 18:39:30,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484106.0, ans=0.1 2024-09-18 18:40:18,964 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 18:40:50,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.22 vs. limit=15.0 2024-09-18 18:40:54,611 INFO [train.py:1198] (1/2) Epoch 27, batch 3000, loss[loss=0.2118, simple_loss=0.263, pruned_loss=0.0599, ctc_loss=0.1249, cr_loss=0.3951, over 34554.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2692, pruned_loss=0.06106, ctc_loss=0.1282, cr_loss=0.4045, over 6751372.38 frames. ], batch size: 94, lr: 4.41e-03, grad_scale: 16.0 2024-09-18 18:40:54,611 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 18:41:01,510 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6519, 3.9723, 3.9944, 4.1577], device='cuda:1') 2024-09-18 18:41:11,561 INFO [train.py:1230] (1/2) Epoch 27, validation: loss=0.1483, simple_loss=0.2438, pruned_loss=0.02243, ctc_loss=0.03986, cr_loss=1.989e-14, over 944034.00 frames. 
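The `loss[...]` / `tot_loss[...]` pairs in the train.py:1198 entries above are frame-weighted averages: `loss[...]` describes only the current batch, while `tot_loss[...]` is a smoothed running total whose `over N frames` count saturates near roughly 200 batches' worth of frames (about 6.7e6 here, at ~34k frames per batch). Below is a minimal sketch of one way such a tracker can be maintained; the class name, the smoothing constant `1 - 1/200`, and the formatting are illustrative assumptions, not code quoted from train.py.

from collections import defaultdict

class MetricsTracker(defaultdict):
    # Frame-weighted accumulator: stores sums such as
    # {"frames": F, "loss": sum_i(loss_i * frames_i), ...} so that
    # printing can report per-frame averages.
    def __init__(self):
        super().__init__(float)

    def __add__(self, other):
        out = MetricsTracker()
        for k in set(self) | set(other):
            out[k] = self[k] + other[k]
        return out

    def __mul__(self, alpha: float):
        out = MetricsTracker()
        for k in self:
            out[k] = self[k] * alpha
        return out

    def __str__(self):
        frames = self["frames"]
        body = ", ".join(f"{k}={v / frames:.4g}"
                         for k, v in self.items() if k != "frames")
        return f"[{body}, over {frames:.2f} frames. ]"

def log_batch(tot_loss, batch_loss):
    # `loss[...]` in the log corresponds to the current batch; `tot_loss[...]`
    # decays the old total by 1 - 1/200 before adding the new batch, so the
    # accumulated frame count approaches ~200x the average batch size.
    tot_loss = tot_loss * (1 - 1 / 200) + batch_loss
    print(f"loss{batch_loss}, tot_loss{tot_loss}")
    return tot_loss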
2024-09-18 18:41:11,562 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 18:41:17,661 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.86 vs. limit=12.0 2024-09-18 18:41:22,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484339.3333333333, ans=0.1 2024-09-18 18:41:24,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484339.3333333333, ans=0.1 2024-09-18 18:41:45,151 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.416e+02 2.822e+02 3.609e+02 6.938e+02, threshold=5.644e+02, percent-clipped=4.0 2024-09-18 18:41:47,692 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.63 vs. limit=15.0 2024-09-18 18:41:51,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.30 vs. limit=12.0 2024-09-18 18:42:11,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=484479.3333333333, ans=0.2 2024-09-18 18:42:15,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs. limit=6.0 2024-09-18 18:42:20,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.76 vs. limit=15.0 2024-09-18 18:42:24,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=484526.0, ans=0.0 2024-09-18 18:42:27,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=484526.0, ans=0.0 2024-09-18 18:42:28,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=484526.0, ans=0.125 2024-09-18 18:42:33,943 INFO [train.py:1198] (1/2) Epoch 27, batch 3050, loss[loss=0.2197, simple_loss=0.2669, pruned_loss=0.06434, ctc_loss=0.1363, cr_loss=0.4122, over 34597.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2704, pruned_loss=0.0615, ctc_loss=0.1291, cr_loss=0.4063, over 6745027.00 frames. ], batch size: 89, lr: 4.41e-03, grad_scale: 16.0 2024-09-18 18:42:34,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=484572.6666666667, ans=0.125 2024-09-18 18:43:28,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=15.0 2024-09-18 18:43:39,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=484759.3333333333, ans=0.0 2024-09-18 18:43:54,874 INFO [train.py:1198] (1/2) Epoch 27, batch 3100, loss[loss=0.2293, simple_loss=0.2884, pruned_loss=0.0636, ctc_loss=0.132, cr_loss=0.4149, over 34236.00 frames. ], tot_loss[loss=0.2176, simple_loss=0.2702, pruned_loss=0.06146, ctc_loss=0.1289, cr_loss=0.4061, over 6744218.97 frames. 
], batch size: 117, lr: 4.41e-03, grad_scale: 16.0 2024-09-18 18:44:11,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=484852.6666666667, ans=0.0 2024-09-18 18:44:27,152 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.092e+02 2.464e+02 2.739e+02 3.519e+02 7.522e+02, threshold=5.479e+02, percent-clipped=3.0 2024-09-18 18:44:45,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=484946.0, ans=0.035 2024-09-18 18:44:48,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=484946.0, ans=0.125 2024-09-18 18:44:52,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=484946.0, ans=0.125 2024-09-18 18:44:58,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=484992.6666666667, ans=0.2 2024-09-18 18:45:13,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=484992.6666666667, ans=0.07 2024-09-18 18:45:14,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.12 vs. limit=22.5 2024-09-18 18:45:18,972 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2024-09-18 18:45:19,581 INFO [train.py:1198] (1/2) Epoch 27, batch 3150, loss[loss=0.2289, simple_loss=0.283, pruned_loss=0.06565, ctc_loss=0.1387, cr_loss=0.3949, over 33818.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2705, pruned_loss=0.06157, ctc_loss=0.1292, cr_loss=0.4064, over 6750714.04 frames. ], batch size: 122, lr: 4.41e-03, grad_scale: 16.0 2024-09-18 18:45:50,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=485132.6666666667, ans=0.125 2024-09-18 18:46:06,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=485179.3333333333, ans=0.2 2024-09-18 18:46:22,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=485226.0, ans=0.125 2024-09-18 18:46:40,316 INFO [train.py:1198] (1/2) Epoch 27, batch 3200, loss[loss=0.2066, simple_loss=0.2657, pruned_loss=0.05429, ctc_loss=0.116, cr_loss=0.3903, over 34532.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2698, pruned_loss=0.06122, ctc_loss=0.1285, cr_loss=0.4046, over 6761669.61 frames. 
], batch size: 94, lr: 4.41e-03, grad_scale: 32.0 2024-09-18 18:46:50,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=485272.6666666667, ans=0.0 2024-09-18 18:47:00,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=485319.3333333333, ans=0.5 2024-09-18 18:47:16,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=485319.3333333333, ans=0.125 2024-09-18 18:47:19,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.527e+02 2.907e+02 3.454e+02 5.110e+02, threshold=5.813e+02, percent-clipped=0.0 2024-09-18 18:47:23,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=485366.0, ans=0.125 2024-09-18 18:47:28,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=485366.0, ans=0.025 2024-09-18 18:47:38,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.24 vs. limit=12.0 2024-09-18 18:48:08,207 INFO [train.py:1198] (1/2) Epoch 27, batch 3250, loss[loss=0.2278, simple_loss=0.279, pruned_loss=0.06597, ctc_loss=0.137, cr_loss=0.4332, over 34643.00 frames. ], tot_loss[loss=0.2178, simple_loss=0.2704, pruned_loss=0.06151, ctc_loss=0.1291, cr_loss=0.4063, over 6770462.85 frames. ], batch size: 98, lr: 4.41e-03, grad_scale: 32.0 2024-09-18 18:48:25,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.50 vs. limit=10.0 2024-09-18 18:48:38,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=485599.3333333333, ans=0.1 2024-09-18 18:48:43,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=485599.3333333333, ans=0.025 2024-09-18 18:49:03,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.46 vs. limit=22.5 2024-09-18 18:49:22,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.71 vs. limit=15.0 2024-09-18 18:49:23,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=485692.6666666667, ans=0.125 2024-09-18 18:49:28,496 INFO [train.py:1198] (1/2) Epoch 27, batch 3300, loss[loss=0.2149, simple_loss=0.2759, pruned_loss=0.05693, ctc_loss=0.1233, cr_loss=0.3858, over 33199.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2694, pruned_loss=0.0611, ctc_loss=0.1283, cr_loss=0.4039, over 6769461.12 frames. ], batch size: 130, lr: 4.41e-03, grad_scale: 32.0 2024-09-18 18:49:28,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=485739.3333333333, ans=0.025 2024-09-18 18:49:31,261 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.37 vs. 
limit=6.0 2024-09-18 18:49:37,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.85 vs. limit=12.0 2024-09-18 18:49:42,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=485739.3333333333, ans=0.125 2024-09-18 18:49:51,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.40 vs. limit=22.5 2024-09-18 18:50:01,270 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.451e+02 2.793e+02 3.415e+02 5.227e+02, threshold=5.586e+02, percent-clipped=0.0 2024-09-18 18:50:06,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=485832.6666666667, ans=0.1 2024-09-18 18:50:12,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0 2024-09-18 18:50:15,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=485879.3333333333, ans=0.125 2024-09-18 18:50:20,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=485879.3333333333, ans=0.2 2024-09-18 18:50:26,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=485879.3333333333, ans=0.0 2024-09-18 18:50:27,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2024-09-18 18:50:51,893 INFO [train.py:1198] (1/2) Epoch 27, batch 3350, loss[loss=0.2318, simple_loss=0.2887, pruned_loss=0.06488, ctc_loss=0.1376, cr_loss=0.4395, over 33831.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2699, pruned_loss=0.06144, ctc_loss=0.1289, cr_loss=0.405, over 6742904.11 frames. ], batch size: 122, lr: 4.41e-03, grad_scale: 32.0 2024-09-18 18:51:10,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=486019.3333333333, ans=0.125 2024-09-18 18:51:43,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=486112.6666666667, ans=0.125 2024-09-18 18:52:12,573 INFO [train.py:1198] (1/2) Epoch 27, batch 3400, loss[loss=0.1913, simple_loss=0.2417, pruned_loss=0.05193, ctc_loss=0.1113, cr_loss=0.3719, over 34155.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2699, pruned_loss=0.06142, ctc_loss=0.1288, cr_loss=0.4048, over 6732838.80 frames. 
], batch size: 78, lr: 4.40e-03, grad_scale: 32.0 2024-09-18 18:52:16,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=486206.0, ans=0.0 2024-09-18 18:52:24,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=486206.0, ans=0.125 2024-09-18 18:52:27,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=486252.6666666667, ans=0.0 2024-09-18 18:52:28,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=486252.6666666667, ans=0.125 2024-09-18 18:52:35,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=486252.6666666667, ans=0.0 2024-09-18 18:52:37,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=486252.6666666667, ans=0.09899494936611666 2024-09-18 18:52:38,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=486252.6666666667, ans=0.125 2024-09-18 18:52:43,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=486299.3333333333, ans=0.04949747468305833 2024-09-18 18:52:44,632 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.979e+02 2.423e+02 2.911e+02 3.677e+02 5.627e+02, threshold=5.822e+02, percent-clipped=1.0 2024-09-18 18:52:57,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=486299.3333333333, ans=0.1 2024-09-18 18:53:01,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=12.0 2024-09-18 18:53:28,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.85 vs. limit=22.5 2024-09-18 18:53:32,782 INFO [train.py:1198] (1/2) Epoch 27, batch 3450, loss[loss=0.2338, simple_loss=0.2881, pruned_loss=0.0669, ctc_loss=0.1421, cr_loss=0.4309, over 33026.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.27, pruned_loss=0.06141, ctc_loss=0.1286, cr_loss=0.4042, over 6745795.86 frames. ], batch size: 130, lr: 4.40e-03, grad_scale: 32.0 2024-09-18 18:53:33,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=486439.3333333333, ans=0.125 2024-09-18 18:54:11,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=486532.6666666667, ans=0.0 2024-09-18 18:54:21,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486579.3333333333, ans=0.1 2024-09-18 18:54:32,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=486579.3333333333, ans=0.2 2024-09-18 18:54:52,903 INFO [train.py:1198] (1/2) Epoch 27, batch 3500, loss[loss=0.1975, simple_loss=0.2521, pruned_loss=0.05289, ctc_loss=0.1137, cr_loss=0.3608, over 34466.00 frames. ], tot_loss[loss=0.2172, simple_loss=0.2696, pruned_loss=0.06142, ctc_loss=0.1286, cr_loss=0.404, over 6747087.79 frames. 
], batch size: 85, lr: 4.40e-03, grad_scale: 32.0 2024-09-18 18:54:59,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=486672.6666666667, ans=0.125 2024-09-18 18:55:04,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486672.6666666667, ans=0.1 2024-09-18 18:55:27,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.413e+02 2.657e+02 3.261e+02 5.726e+02, threshold=5.314e+02, percent-clipped=0.0 2024-09-18 18:55:52,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.62 vs. limit=15.0 2024-09-18 18:55:54,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=486812.6666666667, ans=0.125 2024-09-18 18:56:15,386 INFO [train.py:1198] (1/2) Epoch 27, batch 3550, loss[loss=0.2371, simple_loss=0.2915, pruned_loss=0.06807, ctc_loss=0.1427, cr_loss=0.4512, over 34394.00 frames. ], tot_loss[loss=0.2171, simple_loss=0.2696, pruned_loss=0.06135, ctc_loss=0.1286, cr_loss=0.4043, over 6756080.35 frames. ], batch size: 103, lr: 4.40e-03, grad_scale: 16.0 2024-09-18 18:56:16,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.27 vs. limit=15.0 2024-09-18 18:56:34,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=486952.6666666667, ans=0.1 2024-09-18 18:56:36,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=486952.6666666667, ans=0.0 2024-09-18 18:57:08,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=487046.0, ans=0.0 2024-09-18 18:57:12,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=487046.0, ans=0.1 2024-09-18 18:57:17,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=487046.0, ans=0.125 2024-09-18 18:57:19,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=487092.6666666667, ans=0.025 2024-09-18 18:57:34,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=487139.3333333333, ans=0.0 2024-09-18 18:57:35,737 INFO [train.py:1198] (1/2) Epoch 27, batch 3600, loss[loss=0.197, simple_loss=0.2514, pruned_loss=0.05262, ctc_loss=0.1124, cr_loss=0.3723, over 34472.00 frames. ], tot_loss[loss=0.2173, simple_loss=0.2698, pruned_loss=0.06145, ctc_loss=0.1288, cr_loss=0.4049, over 6764856.73 frames. ], batch size: 90, lr: 4.40e-03, grad_scale: 32.0 2024-09-18 18:58:00,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.02 vs. 
limit=15.0 2024-09-18 18:58:06,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=487232.6666666667, ans=0.0 2024-09-18 18:58:09,718 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.549e+02 2.966e+02 3.612e+02 6.006e+02, threshold=5.932e+02, percent-clipped=3.0 2024-09-18 18:58:39,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=487326.0, ans=0.035 2024-09-18 18:58:47,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=487326.0, ans=0.0 2024-09-18 18:58:47,703 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.17 vs. limit=22.5 2024-09-18 18:58:56,381 INFO [train.py:1198] (1/2) Epoch 27, batch 3650, loss[loss=0.2316, simple_loss=0.2824, pruned_loss=0.06777, ctc_loss=0.1403, cr_loss=0.4304, over 34472.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.269, pruned_loss=0.06113, ctc_loss=0.1281, cr_loss=0.4035, over 6768387.59 frames. ], batch size: 110, lr: 4.40e-03, grad_scale: 32.0 2024-09-18 18:59:22,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=487419.3333333333, ans=0.125 2024-09-18 18:59:28,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=487466.0, ans=0.1 2024-09-18 18:59:35,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2024-09-18 18:59:40,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=487466.0, ans=0.0 2024-09-18 18:59:52,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=487512.6666666667, ans=0.0 2024-09-18 19:00:10,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=487559.3333333333, ans=0.125 2024-09-18 19:00:18,197 INFO [train.py:1198] (1/2) Epoch 27, batch 3700, loss[loss=0.2121, simple_loss=0.271, pruned_loss=0.05687, ctc_loss=0.1215, cr_loss=0.3794, over 34650.00 frames. ], tot_loss[loss=0.2158, simple_loss=0.2688, pruned_loss=0.06061, ctc_loss=0.1273, cr_loss=0.4017, over 6784027.10 frames. ], batch size: 102, lr: 4.40e-03, grad_scale: 32.0 2024-09-18 19:00:18,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=487606.0, ans=0.125 2024-09-18 19:00:26,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=487606.0, ans=0.0 2024-09-18 19:00:41,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=487652.6666666667, ans=0.1 2024-09-18 19:00:48,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.38 vs. 
limit=15.0 2024-09-18 19:00:52,177 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.026e+02 2.441e+02 2.904e+02 4.270e+02 9.166e+02, threshold=5.807e+02, percent-clipped=9.0 2024-09-18 19:00:55,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=487699.3333333333, ans=0.125 2024-09-18 19:01:02,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=487699.3333333333, ans=0.1 2024-09-18 19:01:13,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=487746.0, ans=0.125 2024-09-18 19:01:39,119 INFO [train.py:1198] (1/2) Epoch 27, batch 3750, loss[loss=0.2322, simple_loss=0.284, pruned_loss=0.0681, ctc_loss=0.138, cr_loss=0.418, over 34362.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2721, pruned_loss=0.0621, ctc_loss=0.13, cr_loss=0.4081, over 6785831.41 frames. ], batch size: 113, lr: 4.40e-03, grad_scale: 32.0 2024-09-18 19:01:53,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0 2024-09-18 19:02:02,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=487886.0, ans=0.125 2024-09-18 19:02:02,738 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:02:11,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=487932.6666666667, ans=0.0 2024-09-18 19:02:25,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=487932.6666666667, ans=0.125 2024-09-18 19:02:38,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=487979.3333333333, ans=0.0 2024-09-18 19:02:38,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=487979.3333333333, ans=0.025 2024-09-18 19:03:00,860 INFO [train.py:1198] (1/2) Epoch 27, batch 3800, loss[loss=0.2464, simple_loss=0.2897, pruned_loss=0.07704, ctc_loss=0.1582, cr_loss=0.4361, over 29765.00 frames. ], tot_loss[loss=0.2227, simple_loss=0.2749, pruned_loss=0.06364, ctc_loss=0.1329, cr_loss=0.4141, over 6674250.13 frames. 
], batch size: 175, lr: 4.40e-03, grad_scale: 32.0 2024-09-18 19:03:07,894 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:03:19,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=488119.3333333333, ans=0.1 2024-09-18 19:03:36,237 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.153e+02 2.364e+02 2.535e+02 2.880e+02 8.086e+02, threshold=5.070e+02, percent-clipped=1.0 2024-09-18 19:03:50,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=488212.6666666667, ans=0.2 2024-09-18 19:03:57,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=488212.6666666667, ans=0.0 2024-09-18 19:03:57,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=488212.6666666667, ans=0.125 2024-09-18 19:04:22,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=488259.3333333333, ans=0.0 2024-09-18 19:04:24,909 INFO [train.py:1198] (1/2) Epoch 27, batch 3850, loss[loss=0.245, simple_loss=0.2868, pruned_loss=0.07704, ctc_loss=0.1596, cr_loss=0.4303, over 23760.00 frames. ], tot_loss[loss=0.2262, simple_loss=0.2772, pruned_loss=0.06554, ctc_loss=0.137, cr_loss=0.4174, over 6249891.79 frames. ], batch size: 244, lr: 4.39e-03, grad_scale: 32.0 2024-09-18 19:04:47,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=13.33 vs. limit=15.0 2024-09-18 19:05:50,932 INFO [train.py:1198] (1/2) Epoch 28, batch 0, loss[loss=0.1823, simple_loss=0.241, pruned_loss=0.04553, ctc_loss=0.09657, cr_loss=0.331, over 34481.00 frames. ], tot_loss[loss=0.1823, simple_loss=0.241, pruned_loss=0.04553, ctc_loss=0.09657, cr_loss=0.331, over 34481.00 frames. ], batch size: 85, lr: 4.31e-03, grad_scale: 32.0 2024-09-18 19:05:50,933 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 19:06:07,798 INFO [train.py:1230] (1/2) Epoch 28, validation: loss=0.1487, simple_loss=0.2451, pruned_loss=0.02213, ctc_loss=0.03974, cr_loss=2.192e-14, over 944034.00 frames. 2024-09-18 19:06:07,799 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 19:06:33,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=488474.0, ans=0.1 2024-09-18 19:06:37,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=15.0 2024-09-18 19:06:51,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.84 vs. 
limit=22.5 2024-09-18 19:07:02,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=488567.3333333333, ans=0.125 2024-09-18 19:07:24,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=488614.0, ans=0.0 2024-09-18 19:07:25,858 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.649e+02 2.800e+02 3.212e+02 7.113e+02, threshold=5.600e+02, percent-clipped=5.0 2024-09-18 19:07:34,295 INFO [train.py:1198] (1/2) Epoch 28, batch 50, loss[loss=0.1923, simple_loss=0.2424, pruned_loss=0.05293, ctc_loss=0.111, cr_loss=0.3519, over 34522.00 frames. ], tot_loss[loss=0.2189, simple_loss=0.2711, pruned_loss=0.06213, ctc_loss=0.1302, cr_loss=0.4073, over 1479620.74 frames. ], batch size: 82, lr: 4.31e-03, grad_scale: 32.0 2024-09-18 19:07:52,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=488707.3333333333, ans=0.07 2024-09-18 19:08:02,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=488707.3333333333, ans=0.125 2024-09-18 19:08:27,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=488800.6666666667, ans=0.125 2024-09-18 19:08:32,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=488800.6666666667, ans=0.5 2024-09-18 19:08:33,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=488800.6666666667, ans=0.125 2024-09-18 19:08:44,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=488847.3333333333, ans=0.0 2024-09-18 19:08:56,810 INFO [train.py:1198] (1/2) Epoch 28, batch 100, loss[loss=0.2064, simple_loss=0.2587, pruned_loss=0.05692, ctc_loss=0.1239, cr_loss=0.3854, over 34585.00 frames. ], tot_loss[loss=0.2204, simple_loss=0.2731, pruned_loss=0.06253, ctc_loss=0.131, cr_loss=0.409, over 2629973.89 frames. ], batch size: 89, lr: 4.31e-03, grad_scale: 32.0 2024-09-18 19:09:00,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=488894.0, ans=0.125 2024-09-18 19:10:00,067 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2024-09-18 19:10:12,293 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.427e+02 2.702e+02 3.171e+02 6.172e+02, threshold=5.404e+02, percent-clipped=2.0 2024-09-18 19:10:20,483 INFO [train.py:1198] (1/2) Epoch 28, batch 150, loss[loss=0.1954, simple_loss=0.2482, pruned_loss=0.05283, ctc_loss=0.1107, cr_loss=0.3711, over 34482.00 frames. ], tot_loss[loss=0.2175, simple_loss=0.2703, pruned_loss=0.06135, ctc_loss=0.1286, cr_loss=0.4052, over 3557541.78 frames. 
], batch size: 82, lr: 4.31e-03, grad_scale: 32.0 2024-09-18 19:10:41,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489174.0, ans=0.1 2024-09-18 19:10:41,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=489174.0, ans=0.0 2024-09-18 19:10:49,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=489174.0, ans=0.0 2024-09-18 19:10:52,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.13 vs. limit=15.0 2024-09-18 19:10:56,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=489220.6666666667, ans=0.1 2024-09-18 19:11:16,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=489267.3333333333, ans=0.0 2024-09-18 19:11:29,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=489314.0, ans=0.2 2024-09-18 19:11:33,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=489314.0, ans=0.0 2024-09-18 19:11:45,854 INFO [train.py:1198] (1/2) Epoch 28, batch 200, loss[loss=0.237, simple_loss=0.2903, pruned_loss=0.06904, ctc_loss=0.141, cr_loss=0.4328, over 31851.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2698, pruned_loss=0.06119, ctc_loss=0.1282, cr_loss=0.4043, over 4271265.03 frames. ], batch size: 145, lr: 4.31e-03, grad_scale: 32.0 2024-09-18 19:11:47,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=489360.6666666667, ans=0.125 2024-09-18 19:11:54,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=489360.6666666667, ans=0.125 2024-09-18 19:12:03,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.23 vs. limit=15.0 2024-09-18 19:12:22,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=489454.0, ans=0.0 2024-09-18 19:12:37,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=489500.6666666667, ans=0.0 2024-09-18 19:12:54,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.59 vs. limit=12.0 2024-09-18 19:12:59,941 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.539e+02 3.305e+02 4.726e+02 8.782e+02, threshold=6.610e+02, percent-clipped=18.0 2024-09-18 19:13:08,338 INFO [train.py:1198] (1/2) Epoch 28, batch 250, loss[loss=0.2497, simple_loss=0.3002, pruned_loss=0.07458, ctc_loss=0.1539, cr_loss=0.4811, over 34249.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2696, pruned_loss=0.06091, ctc_loss=0.1279, cr_loss=0.4043, over 4833804.23 frames. 
], batch size: 117, lr: 4.31e-03, grad_scale: 32.0 2024-09-18 19:13:10,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=489594.0, ans=0.1 2024-09-18 19:13:13,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=489594.0, ans=0.0 2024-09-18 19:13:16,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=489594.0, ans=0.125 2024-09-18 19:13:23,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=489640.6666666667, ans=10.0 2024-09-18 19:13:32,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=489640.6666666667, ans=0.125 2024-09-18 19:13:43,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=489687.3333333333, ans=0.125 2024-09-18 19:13:46,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=489687.3333333333, ans=0.2 2024-09-18 19:13:55,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.39 vs. limit=22.5 2024-09-18 19:14:32,670 INFO [train.py:1198] (1/2) Epoch 28, batch 300, loss[loss=0.2436, simple_loss=0.2957, pruned_loss=0.07196, ctc_loss=0.1496, cr_loss=0.4402, over 34366.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2692, pruned_loss=0.0609, ctc_loss=0.1278, cr_loss=0.4036, over 5261504.80 frames. ], batch size: 107, lr: 4.31e-03, grad_scale: 32.0 2024-09-18 19:14:34,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=489827.3333333333, ans=0.125 2024-09-18 19:14:46,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=489827.3333333333, ans=0.1 2024-09-18 19:14:47,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=489874.0, ans=0.125 2024-09-18 19:14:52,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0 2024-09-18 19:14:57,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=489874.0, ans=0.2 2024-09-18 19:15:10,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.29 vs. 
limit=15.0 2024-09-18 19:15:32,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=489967.3333333333, ans=0.0 2024-09-18 19:15:34,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=489967.3333333333, ans=0.1 2024-09-18 19:15:49,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.171e+02 2.460e+02 2.786e+02 3.523e+02 6.223e+02, threshold=5.572e+02, percent-clipped=0.0 2024-09-18 19:15:57,660 INFO [train.py:1198] (1/2) Epoch 28, batch 350, loss[loss=0.1856, simple_loss=0.2417, pruned_loss=0.04736, ctc_loss=0.1032, cr_loss=0.3542, over 34297.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2698, pruned_loss=0.06112, ctc_loss=0.1284, cr_loss=0.404, over 5596923.87 frames. ], batch size: 83, lr: 4.31e-03, grad_scale: 32.0 2024-09-18 19:16:09,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=490060.6666666667, ans=0.0 2024-09-18 19:17:02,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=490247.3333333333, ans=0.125 2024-09-18 19:17:02,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=490247.3333333333, ans=0.0 2024-09-18 19:17:10,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=490247.3333333333, ans=0.125 2024-09-18 19:17:10,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=490247.3333333333, ans=0.2 2024-09-18 19:17:21,779 INFO [train.py:1198] (1/2) Epoch 28, batch 400, loss[loss=0.2195, simple_loss=0.2753, pruned_loss=0.06041, ctc_loss=0.1297, cr_loss=0.4207, over 34400.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.2693, pruned_loss=0.06076, ctc_loss=0.1276, cr_loss=0.4026, over 5863587.05 frames. ], batch size: 95, lr: 4.31e-03, grad_scale: 32.0 2024-09-18 19:17:32,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=490294.0, ans=0.125 2024-09-18 19:17:45,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=490340.6666666667, ans=0.125 2024-09-18 19:18:30,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.29 vs. limit=15.0 2024-09-18 19:18:31,878 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:18:36,384 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.418e+02 2.918e+02 3.909e+02 9.124e+02, threshold=5.835e+02, percent-clipped=6.0 2024-09-18 19:18:44,564 INFO [train.py:1198] (1/2) Epoch 28, batch 450, loss[loss=0.2268, simple_loss=0.279, pruned_loss=0.06571, ctc_loss=0.135, cr_loss=0.4069, over 34682.00 frames. ], tot_loss[loss=0.2167, simple_loss=0.2697, pruned_loss=0.06101, ctc_loss=0.1279, cr_loss=0.4034, over 6050960.55 frames. 
], batch size: 97, lr: 4.30e-03, grad_scale: 32.0 2024-09-18 19:19:10,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=490574.0, ans=0.125 2024-09-18 19:19:26,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=490620.6666666667, ans=0.1 2024-09-18 19:19:43,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490667.3333333333, ans=0.1 2024-09-18 19:20:04,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=490714.0, ans=0.0 2024-09-18 19:20:07,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=490760.6666666667, ans=0.125 2024-09-18 19:20:09,216 INFO [train.py:1198] (1/2) Epoch 28, batch 500, loss[loss=0.2349, simple_loss=0.2881, pruned_loss=0.06818, ctc_loss=0.1407, cr_loss=0.4305, over 34443.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2691, pruned_loss=0.06087, ctc_loss=0.1276, cr_loss=0.4027, over 6218261.93 frames. ], batch size: 110, lr: 4.30e-03, grad_scale: 32.0 2024-09-18 19:20:30,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=490807.3333333333, ans=0.125 2024-09-18 19:20:57,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=490900.6666666667, ans=0.125 2024-09-18 19:20:58,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=490900.6666666667, ans=0.1 2024-09-18 19:21:21,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.68 vs. limit=22.5 2024-09-18 19:21:24,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=490947.3333333333, ans=0.0 2024-09-18 19:21:25,811 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.429e+02 2.911e+02 3.781e+02 6.475e+02, threshold=5.822e+02, percent-clipped=2.0 2024-09-18 19:21:34,036 INFO [train.py:1198] (1/2) Epoch 28, batch 550, loss[loss=0.2232, simple_loss=0.2804, pruned_loss=0.06148, ctc_loss=0.1324, cr_loss=0.4113, over 33848.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2692, pruned_loss=0.06088, ctc_loss=0.1277, cr_loss=0.4024, over 6327865.86 frames. 
], batch size: 122, lr: 4.30e-03, grad_scale: 32.0 2024-09-18 19:21:37,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=490994.0, ans=0.2 2024-09-18 19:21:57,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=491040.6666666667, ans=0.125 2024-09-18 19:22:20,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=491087.3333333333, ans=0.125 2024-09-18 19:22:20,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=491087.3333333333, ans=0.0 2024-09-18 19:22:47,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=491180.6666666667, ans=0.125 2024-09-18 19:22:54,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0 2024-09-18 19:22:59,194 INFO [train.py:1198] (1/2) Epoch 28, batch 600, loss[loss=0.2264, simple_loss=0.2813, pruned_loss=0.0644, ctc_loss=0.1315, cr_loss=0.4108, over 34207.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2696, pruned_loss=0.06083, ctc_loss=0.1276, cr_loss=0.4029, over 6430169.02 frames. ], batch size: 117, lr: 4.30e-03, grad_scale: 32.0 2024-09-18 19:23:20,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=491274.0, ans=0.0 2024-09-18 19:23:36,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=491320.6666666667, ans=0.125 2024-09-18 19:23:40,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=491320.6666666667, ans=0.125 2024-09-18 19:23:48,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=491367.3333333333, ans=0.025 2024-09-18 19:23:54,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=491367.3333333333, ans=0.125 2024-09-18 19:24:04,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491414.0, ans=0.1 2024-09-18 19:24:12,471 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.066e+02 2.405e+02 2.940e+02 3.990e+02 7.424e+02, threshold=5.881e+02, percent-clipped=3.0 2024-09-18 19:24:12,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=491414.0, ans=0.125 2024-09-18 19:24:12,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=491414.0, ans=0.125 2024-09-18 19:24:20,616 INFO [train.py:1198] (1/2) Epoch 28, batch 650, loss[loss=0.2212, simple_loss=0.2812, pruned_loss=0.06015, ctc_loss=0.1262, cr_loss=0.3918, over 34546.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2687, pruned_loss=0.06034, ctc_loss=0.1267, cr_loss=0.4003, over 6521210.02 frames. 
], batch size: 94, lr: 4.30e-03, grad_scale: 32.0 2024-09-18 19:24:21,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=15.0 2024-09-18 19:24:25,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=491460.6666666667, ans=0.125 2024-09-18 19:24:33,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491460.6666666667, ans=0.1 2024-09-18 19:24:42,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=491507.3333333333, ans=10.0 2024-09-18 19:25:05,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=491554.0, ans=0.09899494936611666 2024-09-18 19:25:13,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=491600.6666666667, ans=0.0 2024-09-18 19:25:18,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=491600.6666666667, ans=0.5 2024-09-18 19:25:32,013 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-03 2024-09-18 19:25:42,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. limit=6.0 2024-09-18 19:25:43,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=491694.0, ans=0.125 2024-09-18 19:25:44,799 INFO [train.py:1198] (1/2) Epoch 28, batch 700, loss[loss=0.2115, simple_loss=0.2596, pruned_loss=0.06087, ctc_loss=0.1278, cr_loss=0.4005, over 34561.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2689, pruned_loss=0.06033, ctc_loss=0.1268, cr_loss=0.4011, over 6578650.79 frames. ], batch size: 89, lr: 4.30e-03, grad_scale: 32.0 2024-09-18 19:25:48,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.64 vs. limit=15.0 2024-09-18 19:25:50,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=6.65 vs. limit=12.0 2024-09-18 19:25:58,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=491694.0, ans=0.125 2024-09-18 19:26:00,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.32 vs. 
limit=15.0 2024-09-18 19:26:03,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=491740.6666666667, ans=0.95 2024-09-18 19:26:11,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=491740.6666666667, ans=0.0 2024-09-18 19:26:19,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=491787.3333333333, ans=0.125 2024-09-18 19:26:29,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=491787.3333333333, ans=0.1 2024-09-18 19:26:51,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=491880.6666666667, ans=0.025 2024-09-18 19:26:57,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2024-09-18 19:27:01,603 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.177e+02 2.509e+02 3.061e+02 3.547e+02 5.809e+02, threshold=6.123e+02, percent-clipped=0.0 2024-09-18 19:27:09,614 INFO [train.py:1198] (1/2) Epoch 28, batch 750, loss[loss=0.2151, simple_loss=0.2704, pruned_loss=0.05938, ctc_loss=0.1253, cr_loss=0.4004, over 34407.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2684, pruned_loss=0.06016, ctc_loss=0.1266, cr_loss=0.4011, over 6620762.96 frames. ], batch size: 95, lr: 4.30e-03, grad_scale: 32.0 2024-09-18 19:27:21,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=491927.3333333333, ans=0.1 2024-09-18 19:27:33,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.67 vs. limit=22.5 2024-09-18 19:27:40,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=492020.6666666667, ans=0.125 2024-09-18 19:28:25,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=492114.0, ans=0.0 2024-09-18 19:28:31,372 INFO [train.py:1198] (1/2) Epoch 28, batch 800, loss[loss=0.1886, simple_loss=0.2416, pruned_loss=0.0493, ctc_loss=0.1101, cr_loss=0.3749, over 34477.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2683, pruned_loss=0.06012, ctc_loss=0.1266, cr_loss=0.4013, over 6657264.27 frames. ], batch size: 85, lr: 4.30e-03, grad_scale: 32.0 2024-09-18 19:28:31,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=492160.6666666667, ans=0.125 2024-09-18 19:28:38,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=492160.6666666667, ans=0.125 2024-09-18 19:28:42,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.51 vs. limit=6.0 2024-09-18 19:29:03,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.04 vs. 
limit=15.0 2024-09-18 19:29:22,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=492300.6666666667, ans=0.125 2024-09-18 19:29:23,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-09-18 19:29:47,386 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.491e+02 2.954e+02 3.532e+02 5.575e+02, threshold=5.908e+02, percent-clipped=0.0 2024-09-18 19:29:55,519 INFO [train.py:1198] (1/2) Epoch 28, batch 850, loss[loss=0.2192, simple_loss=0.2805, pruned_loss=0.05872, ctc_loss=0.1228, cr_loss=0.3953, over 34395.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.268, pruned_loss=0.05976, ctc_loss=0.1258, cr_loss=0.3998, over 6691904.45 frames. ], batch size: 103, lr: 4.30e-03, grad_scale: 32.0 2024-09-18 19:30:05,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=492394.0, ans=0.125 2024-09-18 19:30:13,763 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:30:25,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=492440.6666666667, ans=0.05 2024-09-18 19:30:49,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=15.0 2024-09-18 19:31:09,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=492580.6666666667, ans=0.125 2024-09-18 19:31:09,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.24 vs. limit=10.0 2024-09-18 19:31:10,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=492580.6666666667, ans=0.025 2024-09-18 19:31:14,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=492580.6666666667, ans=0.125 2024-09-18 19:31:14,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492580.6666666667, ans=0.1 2024-09-18 19:31:20,393 INFO [train.py:1198] (1/2) Epoch 28, batch 900, loss[loss=0.1878, simple_loss=0.2417, pruned_loss=0.04908, ctc_loss=0.1083, cr_loss=0.3496, over 34482.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.268, pruned_loss=0.05976, ctc_loss=0.1258, cr_loss=0.3996, over 6697062.55 frames. 
], batch size: 85, lr: 4.30e-03, grad_scale: 16.0 2024-09-18 19:31:20,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=492627.3333333333, ans=0.0 2024-09-18 19:31:37,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=492674.0, ans=0.125 2024-09-18 19:31:42,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=492674.0, ans=0.09899494936611666 2024-09-18 19:31:52,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.41 vs. limit=6.0 2024-09-18 19:31:53,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=492720.6666666667, ans=0.125 2024-09-18 19:32:38,100 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.149e+02 2.643e+02 3.037e+02 3.887e+02 6.540e+02, threshold=6.074e+02, percent-clipped=3.0 2024-09-18 19:32:44,669 INFO [train.py:1198] (1/2) Epoch 28, batch 950, loss[loss=0.1828, simple_loss=0.2353, pruned_loss=0.04804, ctc_loss=0.1037, cr_loss=0.3378, over 34715.00 frames. ], tot_loss[loss=0.2147, simple_loss=0.2681, pruned_loss=0.06, ctc_loss=0.1263, cr_loss=0.4001, over 6701105.38 frames. ], batch size: 87, lr: 4.29e-03, grad_scale: 16.0 2024-09-18 19:32:58,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.30 vs. limit=15.0 2024-09-18 19:33:06,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.24 vs. limit=15.0 2024-09-18 19:33:13,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=492907.3333333333, ans=0.09899494936611666 2024-09-18 19:33:31,399 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.132e-02 2024-09-18 19:33:52,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=493047.3333333333, ans=0.125 2024-09-18 19:34:06,764 INFO [train.py:1198] (1/2) Epoch 28, batch 1000, loss[loss=0.1992, simple_loss=0.2497, pruned_loss=0.05522, ctc_loss=0.1164, cr_loss=0.3745, over 34508.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.269, pruned_loss=0.06034, ctc_loss=0.1268, cr_loss=0.401, over 6695553.78 frames. ], batch size: 90, lr: 4.29e-03, grad_scale: 16.0 2024-09-18 19:34:08,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=493094.0, ans=0.125 2024-09-18 19:34:10,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=493094.0, ans=0.1 2024-09-18 19:34:11,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=493094.0, ans=0.125 2024-09-18 19:34:59,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.24 vs. 
limit=10.0 2024-09-18 19:34:59,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.06 vs. limit=22.5 2024-09-18 19:35:13,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=493280.6666666667, ans=0.0 2024-09-18 19:35:20,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=493280.6666666667, ans=0.125 2024-09-18 19:35:25,117 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.050e+02 2.565e+02 3.043e+02 4.244e+02 5.970e+02, threshold=6.087e+02, percent-clipped=0.0 2024-09-18 19:35:25,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=493280.6666666667, ans=0.1 2024-09-18 19:35:31,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2024-09-18 19:35:31,729 INFO [train.py:1198] (1/2) Epoch 28, batch 1050, loss[loss=0.2167, simple_loss=0.2744, pruned_loss=0.05901, ctc_loss=0.1252, cr_loss=0.3992, over 34577.00 frames. ], tot_loss[loss=0.2151, simple_loss=0.2683, pruned_loss=0.06026, ctc_loss=0.1268, cr_loss=0.401, over 6705434.68 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 16.0 2024-09-18 19:35:36,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=493327.3333333333, ans=0.1 2024-09-18 19:35:45,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=493327.3333333333, ans=0.2 2024-09-18 19:36:02,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=493374.0, ans=0.125 2024-09-18 19:36:05,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=493420.6666666667, ans=0.0 2024-09-18 19:36:23,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=493467.3333333333, ans=0.07 2024-09-18 19:36:28,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=493467.3333333333, ans=0.125 2024-09-18 19:36:41,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=493514.0, ans=0.0 2024-09-18 19:36:55,925 INFO [train.py:1198] (1/2) Epoch 28, batch 1100, loss[loss=0.2095, simple_loss=0.26, pruned_loss=0.05874, ctc_loss=0.1249, cr_loss=0.4102, over 34366.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2685, pruned_loss=0.06013, ctc_loss=0.1266, cr_loss=0.4008, over 6717867.33 frames. 
], batch size: 91, lr: 4.29e-03, grad_scale: 16.0 2024-09-18 19:37:01,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493560.6666666667, ans=0.125 2024-09-18 19:37:04,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=493560.6666666667, ans=0.2 2024-09-18 19:37:08,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.69 vs. limit=6.0 2024-09-18 19:37:11,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=493607.3333333333, ans=0.125 2024-09-18 19:37:19,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=493607.3333333333, ans=0.1 2024-09-18 19:37:51,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=493700.6666666667, ans=0.0 2024-09-18 19:37:52,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=493700.6666666667, ans=0.125 2024-09-18 19:38:02,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=493747.3333333333, ans=0.0 2024-09-18 19:38:07,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.08 vs. limit=15.0 2024-09-18 19:38:14,597 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.399e+02 2.642e+02 3.446e+02 6.488e+02, threshold=5.283e+02, percent-clipped=1.0 2024-09-18 19:38:15,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=493747.3333333333, ans=0.0 2024-09-18 19:38:21,329 INFO [train.py:1198] (1/2) Epoch 28, batch 1150, loss[loss=0.2048, simple_loss=0.2615, pruned_loss=0.05476, ctc_loss=0.1166, cr_loss=0.38, over 34364.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2683, pruned_loss=0.06018, ctc_loss=0.1268, cr_loss=0.4011, over 6717006.34 frames. ], batch size: 91, lr: 4.29e-03, grad_scale: 16.0 2024-09-18 19:38:25,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=493794.0, ans=0.125 2024-09-18 19:38:58,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=493887.3333333333, ans=0.125 2024-09-18 19:39:02,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.06 vs. 
limit=22.5 2024-09-18 19:39:07,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=493887.3333333333, ans=0.125 2024-09-18 19:39:31,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=493980.6666666667, ans=0.2 2024-09-18 19:39:31,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=493980.6666666667, ans=0.125 2024-09-18 19:39:46,354 INFO [train.py:1198] (1/2) Epoch 28, batch 1200, loss[loss=0.2223, simple_loss=0.2776, pruned_loss=0.06211, ctc_loss=0.1301, cr_loss=0.4196, over 34565.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2689, pruned_loss=0.06026, ctc_loss=0.1271, cr_loss=0.4019, over 6710679.07 frames. ], batch size: 99, lr: 4.29e-03, grad_scale: 32.0 2024-09-18 19:39:49,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=494027.3333333333, ans=0.2 2024-09-18 19:40:06,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=494074.0, ans=0.1 2024-09-18 19:40:26,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494120.6666666667, ans=0.1 2024-09-18 19:40:37,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=494167.3333333333, ans=0.125 2024-09-18 19:40:43,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0 2024-09-18 19:40:59,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=494214.0, ans=0.125 2024-09-18 19:41:02,299 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.090e+02 2.456e+02 2.764e+02 3.477e+02 6.005e+02, threshold=5.528e+02, percent-clipped=2.0 2024-09-18 19:41:08,805 INFO [train.py:1198] (1/2) Epoch 28, batch 1250, loss[loss=0.2357, simple_loss=0.2873, pruned_loss=0.06937, ctc_loss=0.1397, cr_loss=0.4366, over 34357.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2692, pruned_loss=0.06029, ctc_loss=0.127, cr_loss=0.4024, over 6743758.05 frames. ], batch size: 107, lr: 4.29e-03, grad_scale: 32.0 2024-09-18 19:41:12,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=494260.6666666667, ans=0.0 2024-09-18 19:41:19,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.11 vs. limit=15.0 2024-09-18 19:41:24,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=494307.3333333333, ans=0.04949747468305833 2024-09-18 19:41:28,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=494307.3333333333, ans=0.2 2024-09-18 19:41:37,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.73 vs. 
limit=22.5 2024-09-18 19:41:40,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=494354.0, ans=0.0 2024-09-18 19:41:43,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=494354.0, ans=0.1 2024-09-18 19:41:50,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=494354.0, ans=0.0 2024-09-18 19:42:05,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=494400.6666666667, ans=0.125 2024-09-18 19:42:31,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=494494.0, ans=0.0 2024-09-18 19:42:33,091 INFO [train.py:1198] (1/2) Epoch 28, batch 1300, loss[loss=0.2197, simple_loss=0.2772, pruned_loss=0.06018, ctc_loss=0.1295, cr_loss=0.4014, over 33118.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2683, pruned_loss=0.05999, ctc_loss=0.1265, cr_loss=0.4013, over 6748530.06 frames. ], batch size: 130, lr: 4.29e-03, grad_scale: 32.0 2024-09-18 19:42:38,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=494494.0, ans=0.125 2024-09-18 19:42:48,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=494540.6666666667, ans=0.0 2024-09-18 19:42:51,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=494540.6666666667, ans=0.0 2024-09-18 19:42:51,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.62 vs. limit=15.0 2024-09-18 19:42:55,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.39 vs. limit=12.0 2024-09-18 19:42:57,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.40 vs. limit=15.0 2024-09-18 19:43:50,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=494680.6666666667, ans=0.0 2024-09-18 19:43:51,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.974e+02 2.417e+02 2.834e+02 3.755e+02 6.369e+02, threshold=5.668e+02, percent-clipped=3.0 2024-09-18 19:43:51,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=494680.6666666667, ans=0.025 2024-09-18 19:43:57,961 INFO [train.py:1198] (1/2) Epoch 28, batch 1350, loss[loss=0.2008, simple_loss=0.2568, pruned_loss=0.05344, ctc_loss=0.1131, cr_loss=0.38, over 34558.00 frames. ], tot_loss[loss=0.2147, simple_loss=0.2683, pruned_loss=0.05992, ctc_loss=0.1263, cr_loss=0.4007, over 6770173.69 frames. 
], batch size: 94, lr: 4.29e-03, grad_scale: 32.0 2024-09-18 19:44:01,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=494727.3333333333, ans=0.0 2024-09-18 19:44:01,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=494727.3333333333, ans=0.09899494936611666 2024-09-18 19:44:04,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=494727.3333333333, ans=0.125 2024-09-18 19:44:11,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.15 vs. limit=22.5 2024-09-18 19:44:42,425 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=22.5 2024-09-18 19:45:12,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0 2024-09-18 19:45:15,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=494914.0, ans=0.0 2024-09-18 19:45:19,971 INFO [train.py:1198] (1/2) Epoch 28, batch 1400, loss[loss=0.1869, simple_loss=0.2406, pruned_loss=0.04903, ctc_loss=0.106, cr_loss=0.3482, over 34268.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.2678, pruned_loss=0.05977, ctc_loss=0.126, cr_loss=0.4006, over 6779900.87 frames. ], batch size: 80, lr: 4.29e-03, grad_scale: 32.0 2024-09-18 19:45:27,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-09-18 19:45:42,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.89 vs. limit=15.0 2024-09-18 19:45:45,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=495007.3333333333, ans=0.025 2024-09-18 19:46:06,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=495054.0, ans=0.035 2024-09-18 19:46:07,120 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:46:15,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=495100.6666666667, ans=0.0 2024-09-18 19:46:36,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=495147.3333333333, ans=0.2 2024-09-18 19:46:39,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.072e+02 2.457e+02 2.860e+02 3.430e+02 5.187e+02, threshold=5.720e+02, percent-clipped=0.0 2024-09-18 19:46:44,374 INFO [train.py:1198] (1/2) Epoch 28, batch 1450, loss[loss=0.2296, simple_loss=0.2779, pruned_loss=0.06767, ctc_loss=0.1383, cr_loss=0.4572, over 34465.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2686, pruned_loss=0.05994, ctc_loss=0.1264, cr_loss=0.4019, over 6775006.06 frames. ], batch size: 110, lr: 4.28e-03, grad_scale: 16.0 2024-09-18 19:46:48,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.51 vs. 
limit=22.5 2024-09-18 19:47:07,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=495240.6666666667, ans=0.125 2024-09-18 19:47:21,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=495287.3333333333, ans=0.125 2024-09-18 19:47:27,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=495287.3333333333, ans=0.025 2024-09-18 19:47:39,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495334.0, ans=0.1 2024-09-18 19:47:52,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.82 vs. limit=15.0 2024-09-18 19:48:06,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=495427.3333333333, ans=0.05 2024-09-18 19:48:06,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=495427.3333333333, ans=0.125 2024-09-18 19:48:08,285 INFO [train.py:1198] (1/2) Epoch 28, batch 1500, loss[loss=0.2343, simple_loss=0.2848, pruned_loss=0.06897, ctc_loss=0.1414, cr_loss=0.4375, over 34443.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2689, pruned_loss=0.06007, ctc_loss=0.1267, cr_loss=0.4024, over 6775971.31 frames. ], batch size: 100, lr: 4.28e-03, grad_scale: 16.0 2024-09-18 19:48:19,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.66 vs. limit=22.5 2024-09-18 19:48:20,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=495427.3333333333, ans=0.0 2024-09-18 19:48:38,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=495474.0, ans=0.0 2024-09-18 19:48:58,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=495567.3333333333, ans=0.125 2024-09-18 19:48:58,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=495567.3333333333, ans=0.125 2024-09-18 19:49:08,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=495567.3333333333, ans=0.125 2024-09-18 19:49:14,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=495614.0, ans=0.125 2024-09-18 19:49:26,703 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.042e+02 2.467e+02 2.715e+02 3.668e+02 1.325e+03, threshold=5.431e+02, percent-clipped=3.0 2024-09-18 19:49:33,619 INFO [train.py:1198] (1/2) Epoch 28, batch 1550, loss[loss=0.2209, simple_loss=0.2815, pruned_loss=0.05939, ctc_loss=0.1251, cr_loss=0.411, over 34446.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.269, pruned_loss=0.06034, ctc_loss=0.1272, cr_loss=0.4031, over 6746608.08 frames. 
], batch size: 105, lr: 4.28e-03, grad_scale: 16.0 2024-09-18 19:49:44,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0 2024-09-18 19:49:56,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495707.3333333333, ans=0.1 2024-09-18 19:50:48,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.92 vs. limit=22.5 2024-09-18 19:50:52,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.48 vs. limit=22.5 2024-09-18 19:50:55,447 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.27 vs. limit=15.0 2024-09-18 19:50:57,697 INFO [train.py:1198] (1/2) Epoch 28, batch 1600, loss[loss=0.2313, simple_loss=0.2852, pruned_loss=0.06643, ctc_loss=0.1367, cr_loss=0.4285, over 34573.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2687, pruned_loss=0.0604, ctc_loss=0.1273, cr_loss=0.4025, over 6724405.21 frames. ], batch size: 99, lr: 4.28e-03, grad_scale: 32.0 2024-09-18 19:51:14,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=495940.6666666667, ans=0.0 2024-09-18 19:51:14,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495940.6666666667, ans=0.1 2024-09-18 19:51:19,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=495940.6666666667, ans=0.125 2024-09-18 19:51:24,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=495940.6666666667, ans=0.025 2024-09-18 19:51:27,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=495940.6666666667, ans=0.0 2024-09-18 19:51:36,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0 2024-09-18 19:51:50,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2024-09-18 19:52:04,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. limit=6.0 2024-09-18 19:52:06,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496080.6666666667, ans=0.1 2024-09-18 19:52:15,772 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.111e+02 2.452e+02 2.810e+02 3.509e+02 6.601e+02, threshold=5.620e+02, percent-clipped=2.0 2024-09-18 19:52:19,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=496127.3333333333, ans=0.2 2024-09-18 19:52:20,562 INFO [train.py:1198] (1/2) Epoch 28, batch 1650, loss[loss=0.2255, simple_loss=0.2851, pruned_loss=0.06153, ctc_loss=0.131, cr_loss=0.417, over 34374.00 frames. 
], tot_loss[loss=0.2155, simple_loss=0.2686, pruned_loss=0.06038, ctc_loss=0.1272, cr_loss=0.4026, over 6717539.37 frames. ], batch size: 103, lr: 4.28e-03, grad_scale: 32.0 2024-09-18 19:52:20,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=496127.3333333333, ans=0.125 2024-09-18 19:52:27,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=496127.3333333333, ans=0.125 2024-09-18 19:53:25,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=496267.3333333333, ans=0.2 2024-09-18 19:53:41,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=496314.0, ans=0.125 2024-09-18 19:53:44,435 INFO [train.py:1198] (1/2) Epoch 28, batch 1700, loss[loss=0.1853, simple_loss=0.2385, pruned_loss=0.04866, ctc_loss=0.1043, cr_loss=0.3502, over 34319.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2682, pruned_loss=0.06, ctc_loss=0.1265, cr_loss=0.401, over 6743729.79 frames. ], batch size: 80, lr: 4.28e-03, grad_scale: 32.0 2024-09-18 19:54:08,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=496407.3333333333, ans=0.0 2024-09-18 19:54:34,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=496500.6666666667, ans=0.2 2024-09-18 19:55:03,735 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.407e+02 2.795e+02 3.615e+02 7.772e+02, threshold=5.590e+02, percent-clipped=4.0 2024-09-18 19:55:08,651 INFO [train.py:1198] (1/2) Epoch 28, batch 1750, loss[loss=0.1941, simple_loss=0.2423, pruned_loss=0.05365, ctc_loss=0.1158, cr_loss=0.3868, over 34181.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.268, pruned_loss=0.05997, ctc_loss=0.1264, cr_loss=0.4008, over 6751027.12 frames. ], batch size: 78, lr: 4.28e-03, grad_scale: 32.0 2024-09-18 19:55:15,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=496594.0, ans=0.0 2024-09-18 19:55:31,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=496640.6666666667, ans=0.125 2024-09-18 19:55:44,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=496687.3333333333, ans=0.125 2024-09-18 19:56:30,481 INFO [train.py:1198] (1/2) Epoch 28, batch 1800, loss[loss=0.2311, simple_loss=0.2887, pruned_loss=0.06516, ctc_loss=0.1338, cr_loss=0.4097, over 34703.00 frames. ], tot_loss[loss=0.2147, simple_loss=0.2682, pruned_loss=0.05993, ctc_loss=0.1264, cr_loss=0.4005, over 6754116.14 frames. ], batch size: 97, lr: 4.28e-03, grad_scale: 32.0 2024-09-18 19:56:35,835 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 19:56:35,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=496827.3333333333, ans=0.125 2024-09-18 19:56:51,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.55 vs. 
limit=12.0 2024-09-18 19:57:03,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=496920.6666666667, ans=0.125 2024-09-18 19:57:21,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=15.0 2024-09-18 19:57:24,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=496967.3333333333, ans=0.1 2024-09-18 19:57:30,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=496967.3333333333, ans=0.125 2024-09-18 19:57:42,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=497014.0, ans=0.0 2024-09-18 19:57:50,009 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.125e+02 2.591e+02 3.179e+02 4.410e+02 7.801e+02, threshold=6.359e+02, percent-clipped=12.0 2024-09-18 19:57:55,034 INFO [train.py:1198] (1/2) Epoch 28, batch 1850, loss[loss=0.2222, simple_loss=0.2784, pruned_loss=0.0625, ctc_loss=0.1268, cr_loss=0.3917, over 34460.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.2678, pruned_loss=0.05979, ctc_loss=0.1262, cr_loss=0.4003, over 6762074.79 frames. ], batch size: 100, lr: 4.28e-03, grad_scale: 32.0 2024-09-18 19:57:55,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=497060.6666666667, ans=0.0 2024-09-18 19:58:30,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=497154.0, ans=0.2 2024-09-18 19:58:46,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=497200.6666666667, ans=0.125 2024-09-18 19:58:58,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=497200.6666666667, ans=0.025 2024-09-18 19:59:13,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=497247.3333333333, ans=0.125 2024-09-18 19:59:13,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=497247.3333333333, ans=0.125 2024-09-18 19:59:19,149 INFO [train.py:1198] (1/2) Epoch 28, batch 1900, loss[loss=0.2089, simple_loss=0.2693, pruned_loss=0.05508, ctc_loss=0.1178, cr_loss=0.3679, over 34396.00 frames. ], tot_loss[loss=0.2151, simple_loss=0.2687, pruned_loss=0.0601, ctc_loss=0.1267, cr_loss=0.4011, over 6771609.67 frames. 
], batch size: 103, lr: 4.28e-03, grad_scale: 32.0 2024-09-18 19:59:35,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=497340.6666666667, ans=0.0 2024-09-18 20:00:02,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=497387.3333333333, ans=0.125 2024-09-18 20:00:05,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=497387.3333333333, ans=0.0 2024-09-18 20:00:15,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=497434.0, ans=0.0 2024-09-18 20:00:17,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=497434.0, ans=0.0 2024-09-18 20:00:37,091 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.153e+02 2.467e+02 2.972e+02 3.760e+02 5.959e+02, threshold=5.945e+02, percent-clipped=0.0 2024-09-18 20:00:42,093 INFO [train.py:1198] (1/2) Epoch 28, batch 1950, loss[loss=0.2035, simple_loss=0.2587, pruned_loss=0.05587, ctc_loss=0.1125, cr_loss=0.355, over 34369.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.2698, pruned_loss=0.06029, ctc_loss=0.1271, cr_loss=0.4023, over 6789163.78 frames. ], batch size: 91, lr: 4.27e-03, grad_scale: 32.0 2024-09-18 20:01:28,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=497620.6666666667, ans=0.125 2024-09-18 20:01:28,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=497620.6666666667, ans=0.0 2024-09-18 20:01:36,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=497667.3333333333, ans=0.125 2024-09-18 20:01:44,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=497667.3333333333, ans=0.0 2024-09-18 20:01:58,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=497714.0, ans=0.0 2024-09-18 20:02:05,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0 2024-09-18 20:02:08,047 INFO [train.py:1198] (1/2) Epoch 28, batch 2000, loss[loss=0.1954, simple_loss=0.2433, pruned_loss=0.05448, ctc_loss=0.115, cr_loss=0.3881, over 34142.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2701, pruned_loss=0.06049, ctc_loss=0.1275, cr_loss=0.4032, over 6765179.34 frames. 
], batch size: 78, lr: 4.27e-03, grad_scale: 32.0 2024-09-18 20:02:40,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=497854.0, ans=0.125 2024-09-18 20:02:51,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=497854.0, ans=0.0 2024-09-18 20:03:03,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=497900.6666666667, ans=0.125 2024-09-18 20:03:25,950 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.462e+02 3.048e+02 3.610e+02 5.700e+02, threshold=6.096e+02, percent-clipped=0.0 2024-09-18 20:03:30,917 INFO [train.py:1198] (1/2) Epoch 28, batch 2050, loss[loss=0.1881, simple_loss=0.242, pruned_loss=0.0492, ctc_loss=0.1064, cr_loss=0.3655, over 34479.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2692, pruned_loss=0.06035, ctc_loss=0.1272, cr_loss=0.4029, over 6756933.32 frames. ], batch size: 82, lr: 4.27e-03, grad_scale: 32.0 2024-09-18 20:03:32,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=497994.0, ans=0.125 2024-09-18 20:03:45,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.91 vs. limit=15.0 2024-09-18 20:03:51,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=498040.6666666667, ans=0.2 2024-09-18 20:03:53,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.07 vs. limit=15.0 2024-09-18 20:03:54,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=498040.6666666667, ans=0.125 2024-09-18 20:04:19,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498134.0, ans=0.1 2024-09-18 20:04:25,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=498134.0, ans=0.0 2024-09-18 20:04:38,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=498180.6666666667, ans=0.2 2024-09-18 20:04:49,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=498180.6666666667, ans=0.0 2024-09-18 20:04:55,408 INFO [train.py:1198] (1/2) Epoch 28, batch 2100, loss[loss=0.2076, simple_loss=0.2645, pruned_loss=0.05574, ctc_loss=0.1175, cr_loss=0.3948, over 34548.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2685, pruned_loss=0.06003, ctc_loss=0.1266, cr_loss=0.4017, over 6770283.23 frames. ], batch size: 94, lr: 4.27e-03, grad_scale: 32.0 2024-09-18 20:05:31,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=498320.6666666667, ans=0.035 2024-09-18 20:05:33,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=498320.6666666667, ans=0.125 2024-09-18 20:06:04,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.47 vs. 
limit=15.0 2024-09-18 20:06:07,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=498414.0, ans=0.0 2024-09-18 20:06:15,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.015e+02 2.414e+02 2.872e+02 3.347e+02 6.855e+02, threshold=5.745e+02, percent-clipped=2.0 2024-09-18 20:06:17,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498414.0, ans=0.1 2024-09-18 20:06:20,180 INFO [train.py:1198] (1/2) Epoch 28, batch 2150, loss[loss=0.2076, simple_loss=0.2607, pruned_loss=0.05747, ctc_loss=0.1193, cr_loss=0.3917, over 34361.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.2678, pruned_loss=0.05981, ctc_loss=0.1261, cr_loss=0.4006, over 6788277.78 frames. ], batch size: 91, lr: 4.27e-03, grad_scale: 32.0 2024-09-18 20:06:23,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=498460.6666666667, ans=0.125 2024-09-18 20:06:27,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=498460.6666666667, ans=0.1 2024-09-18 20:06:46,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=498507.3333333333, ans=0.125 2024-09-18 20:07:13,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=498600.6666666667, ans=0.0 2024-09-18 20:07:42,416 INFO [train.py:1198] (1/2) Epoch 28, batch 2200, loss[loss=0.2209, simple_loss=0.2792, pruned_loss=0.06077, ctc_loss=0.1255, cr_loss=0.3984, over 34438.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2681, pruned_loss=0.05989, ctc_loss=0.1262, cr_loss=0.4004, over 6783024.51 frames. ], batch size: 100, lr: 4.27e-03, grad_scale: 32.0 2024-09-18 20:08:00,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=498740.6666666667, ans=0.0 2024-09-18 20:08:02,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=498740.6666666667, ans=0.0 2024-09-18 20:08:10,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=498740.6666666667, ans=0.2 2024-09-18 20:08:11,142 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0 2024-09-18 20:08:35,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0 2024-09-18 20:08:54,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=498880.6666666667, ans=0.125 2024-09-18 20:08:56,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.79 vs. 
limit=15.0 2024-09-18 20:09:01,721 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.966e+02 2.506e+02 3.133e+02 4.307e+02 6.025e+02, threshold=6.265e+02, percent-clipped=5.0 2024-09-18 20:09:05,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=498927.3333333333, ans=0.125 2024-09-18 20:09:06,781 INFO [train.py:1198] (1/2) Epoch 28, batch 2250, loss[loss=0.2106, simple_loss=0.2671, pruned_loss=0.05672, ctc_loss=0.1229, cr_loss=0.4027, over 34421.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.2679, pruned_loss=0.05985, ctc_loss=0.1261, cr_loss=0.4001, over 6778124.45 frames. ], batch size: 95, lr: 4.27e-03, grad_scale: 32.0 2024-09-18 20:09:31,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=498974.0, ans=0.0 2024-09-18 20:09:45,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=499020.6666666667, ans=0.125 2024-09-18 20:09:57,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.92 vs. limit=6.0 2024-09-18 20:10:26,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=499114.0, ans=0.125 2024-09-18 20:10:31,163 INFO [train.py:1198] (1/2) Epoch 28, batch 2300, loss[loss=0.1849, simple_loss=0.2441, pruned_loss=0.04578, ctc_loss=0.1019, cr_loss=0.3424, over 34247.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2671, pruned_loss=0.05956, ctc_loss=0.1255, cr_loss=0.3987, over 6764665.68 frames. ], batch size: 83, lr: 4.27e-03, grad_scale: 32.0 2024-09-18 20:11:15,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=499254.0, ans=0.125 2024-09-18 20:11:27,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=499300.6666666667, ans=0.125 2024-09-18 20:11:31,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0 2024-09-18 20:11:37,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=499347.3333333333, ans=0.125 2024-09-18 20:11:48,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=19.27 vs. limit=22.5 2024-09-18 20:11:48,905 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.853e+02 2.484e+02 2.892e+02 3.602e+02 6.035e+02, threshold=5.785e+02, percent-clipped=0.0 2024-09-18 20:11:53,902 INFO [train.py:1198] (1/2) Epoch 28, batch 2350, loss[loss=0.222, simple_loss=0.2746, pruned_loss=0.06353, ctc_loss=0.1313, cr_loss=0.4039, over 34704.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.2676, pruned_loss=0.05988, ctc_loss=0.1261, cr_loss=0.4003, over 6772805.20 frames. 
], batch size: 97, lr: 4.27e-03, grad_scale: 16.0 2024-09-18 20:11:54,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=499394.0, ans=0.0 2024-09-18 20:11:59,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=499394.0, ans=0.125 2024-09-18 20:12:14,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499440.6666666667, ans=0.1 2024-09-18 20:12:21,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=499440.6666666667, ans=15.0 2024-09-18 20:12:39,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=499487.3333333333, ans=0.0 2024-09-18 20:12:39,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=499487.3333333333, ans=0.04949747468305833 2024-09-18 20:13:05,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499580.6666666667, ans=0.1 2024-09-18 20:13:05,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=499580.6666666667, ans=0.125 2024-09-18 20:13:20,487 INFO [train.py:1198] (1/2) Epoch 28, batch 2400, loss[loss=0.2106, simple_loss=0.2618, pruned_loss=0.05906, ctc_loss=0.125, cr_loss=0.4083, over 34584.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2682, pruned_loss=0.06012, ctc_loss=0.1265, cr_loss=0.401, over 6777061.18 frames. ], batch size: 89, lr: 4.27e-03, grad_scale: 32.0 2024-09-18 20:13:20,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=499627.3333333333, ans=0.1 2024-09-18 20:13:28,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=499627.3333333333, ans=0.125 2024-09-18 20:13:33,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=499627.3333333333, ans=0.125 2024-09-18 20:13:43,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=499674.0, ans=0.125 2024-09-18 20:13:55,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=499720.6666666667, ans=0.0 2024-09-18 20:14:00,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=499720.6666666667, ans=0.125 2024-09-18 20:14:11,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.63 vs. limit=15.0 2024-09-18 20:14:39,991 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.554e+02 3.117e+02 4.031e+02 7.007e+02, threshold=6.235e+02, percent-clipped=5.0 2024-09-18 20:14:43,314 INFO [train.py:1198] (1/2) Epoch 28, batch 2450, loss[loss=0.2161, simple_loss=0.2708, pruned_loss=0.06007, ctc_loss=0.1268, cr_loss=0.3966, over 34433.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2689, pruned_loss=0.06031, ctc_loss=0.127, cr_loss=0.4019, over 6751155.34 frames. 
], batch size: 95, lr: 4.26e-03, grad_scale: 32.0 2024-09-18 20:14:43,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=499860.6666666667, ans=0.0 2024-09-18 20:15:19,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=499954.0, ans=0.1 2024-09-18 20:15:34,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=500000.6666666667, ans=0.125 2024-09-18 20:15:56,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=500047.3333333333, ans=0.09899494936611666 2024-09-18 20:16:05,892 INFO [train.py:1198] (1/2) Epoch 28, batch 2500, loss[loss=0.2307, simple_loss=0.2882, pruned_loss=0.0639, ctc_loss=0.1394, cr_loss=0.4379, over 34464.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2691, pruned_loss=0.06061, ctc_loss=0.1274, cr_loss=0.4032, over 6763679.61 frames. ], batch size: 100, lr: 4.26e-03, grad_scale: 32.0 2024-09-18 20:16:21,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=500094.0, ans=0.125 2024-09-18 20:17:30,304 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.398e+02 2.706e+02 3.397e+02 6.558e+02, threshold=5.412e+02, percent-clipped=1.0 2024-09-18 20:17:33,693 INFO [train.py:1198] (1/2) Epoch 28, batch 2550, loss[loss=0.1848, simple_loss=0.237, pruned_loss=0.04886, ctc_loss=0.1057, cr_loss=0.3462, over 34169.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2687, pruned_loss=0.06048, ctc_loss=0.1272, cr_loss=0.4025, over 6766773.55 frames. ], batch size: 78, lr: 4.26e-03, grad_scale: 32.0 2024-09-18 20:17:36,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-09-18 20:17:48,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=500374.0, ans=0.05 2024-09-18 20:18:08,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=500420.6666666667, ans=0.125 2024-09-18 20:18:34,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.13 vs. limit=15.0 2024-09-18 20:18:39,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=500514.0, ans=0.015 2024-09-18 20:18:56,259 INFO [train.py:1198] (1/2) Epoch 28, batch 2600, loss[loss=0.2133, simple_loss=0.2665, pruned_loss=0.05904, ctc_loss=0.1265, cr_loss=0.4148, over 34372.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.269, pruned_loss=0.06054, ctc_loss=0.1273, cr_loss=0.4029, over 6762925.35 frames. 
], batch size: 91, lr: 4.26e-03, grad_scale: 32.0 2024-09-18 20:19:20,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=500607.3333333333, ans=0.0 2024-09-18 20:19:57,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=500700.6666666667, ans=0.2 2024-09-18 20:20:16,943 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.464e+02 2.889e+02 3.729e+02 6.258e+02, threshold=5.778e+02, percent-clipped=3.0 2024-09-18 20:20:19,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.25 vs. limit=10.0 2024-09-18 20:20:20,189 INFO [train.py:1198] (1/2) Epoch 28, batch 2650, loss[loss=0.2241, simple_loss=0.2806, pruned_loss=0.06219, ctc_loss=0.1303, cr_loss=0.4303, over 34182.00 frames. ], tot_loss[loss=0.2161, simple_loss=0.2695, pruned_loss=0.06059, ctc_loss=0.1273, cr_loss=0.4031, over 6770448.47 frames. ], batch size: 117, lr: 4.26e-03, grad_scale: 32.0 2024-09-18 20:20:20,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=500794.0, ans=0.125 2024-09-18 20:20:25,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500794.0, ans=0.1 2024-09-18 20:20:25,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=500794.0, ans=0.09899494936611666 2024-09-18 20:21:15,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.38 vs. limit=22.5 2024-09-18 20:21:22,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.23 vs. limit=15.0 2024-09-18 20:21:24,937 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=500934.0, ans=0.125 2024-09-18 20:21:26,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=500980.6666666667, ans=0.2 2024-09-18 20:21:39,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=500980.6666666667, ans=0.125 2024-09-18 20:21:44,398 INFO [train.py:1198] (1/2) Epoch 28, batch 2700, loss[loss=0.213, simple_loss=0.2716, pruned_loss=0.05707, ctc_loss=0.123, cr_loss=0.3898, over 34590.00 frames. ], tot_loss[loss=0.2165, simple_loss=0.2698, pruned_loss=0.06075, ctc_loss=0.1277, cr_loss=0.4039, over 6765338.52 frames. 
], batch size: 102, lr: 4.26e-03, grad_scale: 32.0 2024-09-18 20:21:44,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=501027.3333333333, ans=0.125 2024-09-18 20:21:46,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=501027.3333333333, ans=0.125 2024-09-18 20:21:46,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=501027.3333333333, ans=0.0 2024-09-18 20:21:49,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=501027.3333333333, ans=0.125 2024-09-18 20:22:06,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=501074.0, ans=0.2 2024-09-18 20:22:15,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501120.6666666667, ans=0.1 2024-09-18 20:22:22,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=501120.6666666667, ans=0.0 2024-09-18 20:22:22,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=501120.6666666667, ans=0.2 2024-09-18 20:22:37,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=501167.3333333333, ans=0.2 2024-09-18 20:22:53,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=501214.0, ans=0.125 2024-09-18 20:22:58,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-09-18 20:23:04,126 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.550e+02 3.103e+02 3.694e+02 7.326e+02, threshold=6.205e+02, percent-clipped=3.0 2024-09-18 20:23:06,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=501260.6666666667, ans=0.0 2024-09-18 20:23:07,557 INFO [train.py:1198] (1/2) Epoch 28, batch 2750, loss[loss=0.203, simple_loss=0.2556, pruned_loss=0.05543, ctc_loss=0.1189, cr_loss=0.3917, over 34625.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2685, pruned_loss=0.06023, ctc_loss=0.1267, cr_loss=0.4018, over 6761639.66 frames. 
], batch size: 88, lr: 4.26e-03, grad_scale: 32.0 2024-09-18 20:23:14,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=501260.6666666667, ans=0.125 2024-09-18 20:23:24,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=501307.3333333333, ans=0.125 2024-09-18 20:23:37,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=501307.3333333333, ans=0.125 2024-09-18 20:24:25,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=501447.3333333333, ans=0.0 2024-09-18 20:24:26,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=501447.3333333333, ans=0.125 2024-09-18 20:24:27,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.80 vs. limit=15.0 2024-09-18 20:24:30,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=501447.3333333333, ans=0.125 2024-09-18 20:24:33,939 INFO [train.py:1198] (1/2) Epoch 28, batch 2800, loss[loss=0.2408, simple_loss=0.2919, pruned_loss=0.07199, ctc_loss=0.1491, cr_loss=0.3978, over 24293.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2686, pruned_loss=0.06033, ctc_loss=0.1269, cr_loss=0.4016, over 6740595.87 frames. ], batch size: 244, lr: 4.26e-03, grad_scale: 32.0 2024-09-18 20:24:35,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.66 vs. limit=12.0 2024-09-18 20:24:36,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.12 vs. limit=22.5 2024-09-18 20:24:40,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=501494.0, ans=0.025 2024-09-18 20:24:51,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.24 vs. limit=22.5 2024-09-18 20:24:57,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=501540.6666666667, ans=0.125 2024-09-18 20:25:18,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=501587.3333333333, ans=0.1 2024-09-18 20:25:24,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.81 vs. limit=22.5 2024-09-18 20:25:52,930 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.612e+02 3.080e+02 3.736e+02 6.277e+02, threshold=6.161e+02, percent-clipped=2.0 2024-09-18 20:25:56,237 INFO [train.py:1198] (1/2) Epoch 28, batch 2850, loss[loss=0.2179, simple_loss=0.2698, pruned_loss=0.06242, ctc_loss=0.1263, cr_loss=0.399, over 34487.00 frames. ], tot_loss[loss=0.216, simple_loss=0.2692, pruned_loss=0.06066, ctc_loss=0.1275, cr_loss=0.4026, over 6725422.48 frames. 
], batch size: 90, lr: 4.26e-03, grad_scale: 32.0 2024-09-18 20:26:21,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.43 vs. limit=22.5 2024-09-18 20:26:28,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=501820.6666666667, ans=0.5 2024-09-18 20:26:39,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=501820.6666666667, ans=0.125 2024-09-18 20:26:46,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=501867.3333333333, ans=0.2 2024-09-18 20:27:02,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.93 vs. limit=22.5 2024-09-18 20:27:18,535 INFO [train.py:1198] (1/2) Epoch 28, batch 2900, loss[loss=0.2241, simple_loss=0.277, pruned_loss=0.06404, ctc_loss=0.1325, cr_loss=0.4145, over 34544.00 frames. ], tot_loss[loss=0.217, simple_loss=0.2703, pruned_loss=0.06096, ctc_loss=0.1281, cr_loss=0.4042, over 6755487.81 frames. ], batch size: 94, lr: 4.26e-03, grad_scale: 32.0 2024-09-18 20:27:23,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=501960.6666666667, ans=0.0 2024-09-18 20:27:24,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2024-09-18 20:28:07,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=502054.0, ans=0.04949747468305833 2024-09-18 20:28:09,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=502054.0, ans=0.2 2024-09-18 20:28:17,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=502100.6666666667, ans=0.125 2024-09-18 20:28:19,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=502100.6666666667, ans=0.125 2024-09-18 20:28:31,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=502147.3333333333, ans=0.1 2024-09-18 20:28:32,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.29 vs. limit=12.0 2024-09-18 20:28:42,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.993e+02 2.432e+02 2.911e+02 3.684e+02 6.182e+02, threshold=5.822e+02, percent-clipped=2.0 2024-09-18 20:28:45,833 INFO [train.py:1198] (1/2) Epoch 28, batch 2950, loss[loss=0.2051, simple_loss=0.2584, pruned_loss=0.05635, ctc_loss=0.1183, cr_loss=0.3874, over 34640.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.2692, pruned_loss=0.06054, ctc_loss=0.1273, cr_loss=0.4024, over 6749876.75 frames. 
], batch size: 88, lr: 4.25e-03, grad_scale: 32.0 2024-09-18 20:28:46,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=502194.0, ans=0.025 2024-09-18 20:29:01,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=502240.6666666667, ans=0.05 2024-09-18 20:29:04,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=502240.6666666667, ans=0.125 2024-09-18 20:29:05,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=502240.6666666667, ans=0.0 2024-09-18 20:29:09,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=502240.6666666667, ans=0.0 2024-09-18 20:29:16,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=502240.6666666667, ans=0.125 2024-09-18 20:29:34,328 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:29:52,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=502380.6666666667, ans=0.2 2024-09-18 20:29:59,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=502380.6666666667, ans=0.0 2024-09-18 20:30:08,674 INFO [train.py:1198] (1/2) Epoch 28, batch 3000, loss[loss=0.2099, simple_loss=0.2641, pruned_loss=0.05744, ctc_loss=0.1228, cr_loss=0.4102, over 34548.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2688, pruned_loss=0.06045, ctc_loss=0.1271, cr_loss=0.4025, over 6750863.23 frames. ], batch size: 94, lr: 4.25e-03, grad_scale: 32.0 2024-09-18 20:30:08,674 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 20:30:15,833 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.8810, 3.0257, 2.8885, 3.1327, 2.9358, 2.1514, 2.9102, 3.0675], device='cuda:1') 2024-09-18 20:30:25,708 INFO [train.py:1230] (1/2) Epoch 28, validation: loss=0.1484, simple_loss=0.244, pruned_loss=0.02245, ctc_loss=0.03987, cr_loss=2.005e-14, over 944034.00 frames. 2024-09-18 20:30:25,708 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 20:30:42,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=502474.0, ans=0.0 2024-09-18 20:30:50,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=502474.0, ans=0.025 2024-09-18 20:30:57,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0
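Note the validation record above: cr_loss=2.005e-14, versus roughly 0.40 on training batches. The consistency-regularization (CR) term compares the CTC posteriors of two differently time-masked views of each utterance; at validation no masking is applied, so the two views coincide and the term collapses to floating-point noise. A minimal sketch of such a consistency loss follows; it is an illustrative stand-in, not the actual icefall implementation, which differs in details (e.g. which branch is detached and how masked frames are weighted):

```python
import torch
import torch.nn.functional as F

def cr_consistency_loss(log_probs_a: torch.Tensor,
                        log_probs_b: torch.Tensor) -> torch.Tensor:
    # log_probs_*: (N, T, V) log-softmax CTC outputs for two views of the
    # same batch. Symmetric KL divergence between the two posteriors.
    kl_ab = F.kl_div(log_probs_b, log_probs_a.exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_probs_a, log_probs_b.exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

x = torch.randn(4, 50, 500).log_softmax(dim=-1)
y = torch.randn(4, 50, 500).log_softmax(dim=-1)
print(cr_consistency_loss(x, x))  # 0 up to numerics, as in the validation lines
print(cr_consistency_loss(x, y))  # clearly positive, as on training batches
```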
2024-09-18 20:31:26,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=502567.3333333333, ans=0.2 2024-09-18 20:31:44,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=502614.0, ans=0.125 2024-09-18 20:31:46,180 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.466e+02 2.735e+02 3.274e+02 6.075e+02, threshold=5.470e+02, percent-clipped=2.0 2024-09-18 20:31:49,591 INFO [train.py:1198] (1/2) Epoch 28, batch 3050, loss[loss=0.2008, simple_loss=0.2558, pruned_loss=0.05374, ctc_loss=0.1154, cr_loss=0.3813, over 34573.00 frames. ], tot_loss[loss=0.2168, simple_loss=0.2699, pruned_loss=0.06092, ctc_loss=0.1281, cr_loss=0.4047, over 6743020.70 frames. ], batch size: 89, lr: 4.25e-03, grad_scale: 32.0 2024-09-18 20:31:54,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-09-18 20:32:27,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.78 vs. limit=10.0 2024-09-18 20:32:43,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=502800.6666666667, ans=0.0 2024-09-18 20:33:00,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=502847.3333333333, ans=0.2 2024-09-18 20:33:02,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.14 vs. limit=12.0 2024-09-18 20:33:10,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=502894.0, ans=0.05 2024-09-18 20:33:11,696 INFO [train.py:1198] (1/2) Epoch 28, batch 3100, loss[loss=0.2256, simple_loss=0.2853, pruned_loss=0.0615, ctc_loss=0.131, cr_loss=0.4179, over 34210.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2696, pruned_loss=0.06092, ctc_loss=0.128, cr_loss=0.4043, over 6742469.94 frames. ], batch size: 117, lr: 4.25e-03, grad_scale: 32.0 2024-09-18 20:33:24,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=502894.0, ans=0.035 2024-09-18 20:33:42,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=502987.3333333333, ans=0.09899494936611666 2024-09-18 20:33:56,016 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.20 vs.
limit=15.0 2024-09-18 20:34:02,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503034.0, ans=0.1 2024-09-18 20:34:14,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=503034.0, ans=0.125 2024-09-18 20:34:17,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=503080.6666666667, ans=0.0 2024-09-18 20:34:20,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=503080.6666666667, ans=0.125 2024-09-18 20:34:21,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=503080.6666666667, ans=0.125 2024-09-18 20:34:29,998 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.971e+02 2.465e+02 2.968e+02 3.612e+02 8.233e+02, threshold=5.936e+02, percent-clipped=3.0 2024-09-18 20:34:33,261 INFO [train.py:1198] (1/2) Epoch 28, batch 3150, loss[loss=0.2311, simple_loss=0.2911, pruned_loss=0.06375, ctc_loss=0.1351, cr_loss=0.4128, over 33889.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2694, pruned_loss=0.06075, ctc_loss=0.1278, cr_loss=0.4035, over 6748405.81 frames. ], batch size: 122, lr: 4.25e-03, grad_scale: 32.0 2024-09-18 20:34:33,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=503127.3333333333, ans=0.1 2024-09-18 20:34:40,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503127.3333333333, ans=0.1 2024-09-18 20:34:48,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=503174.0, ans=0.2 2024-09-18 20:34:52,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=503174.0, ans=0.125 2024-09-18 20:34:56,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=503174.0, ans=0.05 2024-09-18 20:34:59,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=503174.0, ans=0.0 2024-09-18 20:35:01,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.76 vs. limit=15.0
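The ScheduledFloat entries that dominate these lines record hyperparameters (balancer probabilities, skip rates, bypass scale floors) whose current value `ans` is a function of `batch_count`. A minimal sketch of the idea, assuming a piecewise-linear schedule over batch count; the class of that name in icefall's scaling.py carries more machinery (default values, arithmetic operators, etc.):

```python
import bisect

class ScheduledFloatSketch:
    """A float that follows a piecewise-linear schedule in batch_count."""

    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.x = [p[0] for p in points]
        self.y = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.x[0]:
            return self.y[0]
        if batch_count >= self.x[-1]:
            return self.y[-1]
        i = bisect.bisect_right(self.x, batch_count)
        x0, x1 = self.x[i - 1], self.x[i]
        y0, y1 = self.y[i - 1], self.y[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A dropout-like rate that decays to a floor and then stays flat, which is
# why many entries above report a constant ans (e.g. 0.125) this late in
# training (batch_count is around 503000 here):
prob = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.125))
print(prob.value(503080.6666666667))  # -> 0.125
```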
2024-09-18 20:35:02,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=503174.0, ans=0.5 2024-09-18 20:35:07,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503220.6666666667, ans=0.1 2024-09-18 20:35:12,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=503220.6666666667, ans=0.0 2024-09-18 20:35:23,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=503267.3333333333, ans=0.0 2024-09-18 20:35:31,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=503267.3333333333, ans=0.1 2024-09-18 20:35:38,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=503314.0, ans=0.125 2024-09-18 20:35:53,959 INFO [train.py:1198] (1/2) Epoch 28, batch 3200, loss[loss=0.2151, simple_loss=0.2677, pruned_loss=0.0601, ctc_loss=0.129, cr_loss=0.4098, over 34563.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2687, pruned_loss=0.0604, ctc_loss=0.127, cr_loss=0.4023, over 6762230.10 frames. ], batch size: 94, lr: 4.25e-03, grad_scale: 32.0 2024-09-18 20:35:58,257 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.72 vs. limit=22.5 2024-09-18 20:35:59,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=503360.6666666667, ans=0.2 2024-09-18 20:36:18,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503407.3333333333, ans=0.1 2024-09-18 20:36:44,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=503500.6666666667, ans=0.125 2024-09-18 20:36:44,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=503500.6666666667, ans=0.2 2024-09-18 20:36:50,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0 2024-09-18 20:37:02,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=503547.3333333333, ans=0.125 2024-09-18 20:37:10,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.84 vs. limit=10.0 2024-09-18 20:37:11,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.448e+02 2.825e+02 3.149e+02 5.428e+02, threshold=5.649e+02, percent-clipped=0.0 2024-09-18 20:37:15,157 INFO [train.py:1198] (1/2) Epoch 28, batch 3250, loss[loss=0.2322, simple_loss=0.2845, pruned_loss=0.06722, ctc_loss=0.14, cr_loss=0.4366, over 34648.00 frames. ], tot_loss[loss=0.2158, simple_loss=0.269, pruned_loss=0.06049, ctc_loss=0.1271, cr_loss=0.4025, over 6771412.47 frames.
], batch size: 98, lr: 4.25e-03, grad_scale: 32.0 2024-09-18 20:37:24,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0 2024-09-18 20:37:31,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=503640.6666666667, ans=0.125 2024-09-18 20:37:33,066 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:37:35,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.05 vs. limit=15.0 2024-09-18 20:37:58,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. limit=15.0 2024-09-18 20:38:03,450 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:38:11,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=503734.0, ans=0.125 2024-09-18 20:38:16,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=503734.0, ans=0.125 2024-09-18 20:38:17,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=22.5 2024-09-18 20:38:21,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=503780.6666666667, ans=0.125 2024-09-18 20:38:35,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=503780.6666666667, ans=0.125 2024-09-18 20:38:38,562 INFO [train.py:1198] (1/2) Epoch 28, batch 3300, loss[loss=0.2354, simple_loss=0.2898, pruned_loss=0.06759, ctc_loss=0.1415, cr_loss=0.4397, over 33083.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.268, pruned_loss=0.06017, ctc_loss=0.1266, cr_loss=0.4014, over 6770665.59 frames. ], batch size: 130, lr: 4.25e-03, grad_scale: 32.0 2024-09-18 20:38:43,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=503827.3333333333, ans=0.125 2024-09-18 20:39:13,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.23 vs. 
limit=15.0 2024-09-18 20:39:17,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=503920.6666666667, ans=0.2 2024-09-18 20:39:25,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=503967.3333333333, ans=0.0 2024-09-18 20:39:30,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=503967.3333333333, ans=0.125 2024-09-18 20:40:00,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=504014.0, ans=15.0 2024-09-18 20:40:02,384 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.426e+02 2.800e+02 3.363e+02 6.620e+02, threshold=5.600e+02, percent-clipped=3.0 2024-09-18 20:40:02,889 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.015e-02 2024-09-18 20:40:05,586 INFO [train.py:1198] (1/2) Epoch 28, batch 3350, loss[loss=0.2326, simple_loss=0.2906, pruned_loss=0.06493, ctc_loss=0.1379, cr_loss=0.4268, over 33900.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.2691, pruned_loss=0.06053, ctc_loss=0.1274, cr_loss=0.4036, over 6745306.29 frames. ], batch size: 122, lr: 4.25e-03, grad_scale: 32.0 2024-09-18 20:40:14,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2024-09-18 20:40:43,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=504154.0, ans=0.2 2024-09-18 20:40:51,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504154.0, ans=0.1 2024-09-18 20:41:06,770 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.04 vs. limit=15.0 2024-09-18 20:41:09,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=504247.3333333333, ans=0.025 2024-09-18 20:41:26,549 INFO [train.py:1198] (1/2) Epoch 28, batch 3400, loss[loss=0.1822, simple_loss=0.2348, pruned_loss=0.04801, ctc_loss=0.1019, cr_loss=0.3302, over 34162.00 frames. ], tot_loss[loss=0.2161, simple_loss=0.2691, pruned_loss=0.06071, ctc_loss=0.1277, cr_loss=0.4042, over 6736080.17 frames. ], batch size: 78, lr: 4.25e-03, grad_scale: 32.0 2024-09-18 20:42:07,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=504387.3333333333, ans=0.125 2024-09-18 20:42:20,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=504434.0, ans=0.1 2024-09-18 20:42:39,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=504480.6666666667, ans=0.1 2024-09-18 20:42:43,953 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.191e+02 2.509e+02 2.967e+02 3.489e+02 5.585e+02, threshold=5.935e+02, percent-clipped=0.0 2024-09-18 20:42:48,552 INFO [train.py:1198] (1/2) Epoch 28, batch 3450, loss[loss=0.2136, simple_loss=0.2718, pruned_loss=0.05772, ctc_loss=0.122, cr_loss=0.3924, over 33147.00 frames. 
], tot_loss[loss=0.2162, simple_loss=0.2692, pruned_loss=0.06079, ctc_loss=0.1277, cr_loss=0.4038, over 6747794.37 frames. ], batch size: 130, lr: 4.24e-03, grad_scale: 32.0 2024-09-18 20:42:52,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=504527.3333333333, ans=0.125 2024-09-18 20:43:01,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0 2024-09-18 20:43:18,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=504574.0, ans=0.025 2024-09-18 20:43:21,975 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0 2024-09-18 20:43:37,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=504667.3333333333, ans=0.0 2024-09-18 20:44:05,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=504714.0, ans=0.2 2024-09-18 20:44:09,541 INFO [train.py:1198] (1/2) Epoch 28, batch 3500, loss[loss=0.1908, simple_loss=0.2428, pruned_loss=0.05136, ctc_loss=0.1103, cr_loss=0.3523, over 34473.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2686, pruned_loss=0.06051, ctc_loss=0.1272, cr_loss=0.4025, over 6749238.82 frames. ], batch size: 85, lr: 4.24e-03, grad_scale: 32.0 2024-09-18 20:44:13,655 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.68 vs. limit=10.0 2024-09-18 20:44:14,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=504760.6666666667, ans=0.125 2024-09-18 20:44:18,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.06 vs. limit=22.5 2024-09-18 20:44:31,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=504807.3333333333, ans=0.125 2024-09-18 20:44:36,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=504807.3333333333, ans=0.2 2024-09-18 20:44:39,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=504807.3333333333, ans=0.2 2024-09-18 20:44:59,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.28 vs. 
limit=15.0 2024-09-18 20:45:03,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=504900.6666666667, ans=0.07 2024-09-18 20:45:18,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=504947.3333333333, ans=0.0 2024-09-18 20:45:23,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=504947.3333333333, ans=0.125 2024-09-18 20:45:27,569 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.457e+02 2.762e+02 3.412e+02 8.712e+02, threshold=5.523e+02, percent-clipped=2.0 2024-09-18 20:45:30,783 INFO [train.py:1198] (1/2) Epoch 28, batch 3550, loss[loss=0.2203, simple_loss=0.2764, pruned_loss=0.06104, ctc_loss=0.1296, cr_loss=0.4063, over 34396.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2683, pruned_loss=0.06028, ctc_loss=0.1268, cr_loss=0.4019, over 6758531.16 frames. ], batch size: 103, lr: 4.24e-03, grad_scale: 32.0 2024-09-18 20:45:32,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=504994.0, ans=0.125 2024-09-18 20:45:40,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=504994.0, ans=0.0 2024-09-18 20:45:50,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=505040.6666666667, ans=0.125 2024-09-18 20:46:02,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=505087.3333333333, ans=0.125 2024-09-18 20:46:16,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.43 vs. limit=10.0 2024-09-18 20:46:41,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=505180.6666666667, ans=0.1 2024-09-18 20:46:43,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=22.5 2024-09-18 20:46:51,351 INFO [train.py:1198] (1/2) Epoch 28, batch 3600, loss[loss=0.2089, simple_loss=0.2599, pruned_loss=0.05853, ctc_loss=0.1236, cr_loss=0.4015, over 34495.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2686, pruned_loss=0.06023, ctc_loss=0.1267, cr_loss=0.4025, over 6767701.92 frames. ], batch size: 90, lr: 4.24e-03, grad_scale: 32.0 2024-09-18 20:46:55,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. 
limit=6.0 2024-09-18 20:47:07,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=505274.0, ans=0.09899494936611666 2024-09-18 20:47:23,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=505320.6666666667, ans=0.95 2024-09-18 20:47:32,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=505320.6666666667, ans=0.0 2024-09-18 20:47:48,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=505367.3333333333, ans=0.05 2024-09-18 20:47:48,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=505367.3333333333, ans=0.025 2024-09-18 20:47:51,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=505367.3333333333, ans=0.0 2024-09-18 20:48:02,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=505414.0, ans=0.125 2024-09-18 20:48:09,098 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.541e+02 3.383e+02 4.467e+02 7.321e+02, threshold=6.765e+02, percent-clipped=6.0 2024-09-18 20:48:09,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=505414.0, ans=0.0 2024-09-18 20:48:12,311 INFO [train.py:1198] (1/2) Epoch 28, batch 3650, loss[loss=0.2328, simple_loss=0.2833, pruned_loss=0.06836, ctc_loss=0.1409, cr_loss=0.4379, over 34435.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2679, pruned_loss=0.05991, ctc_loss=0.1261, cr_loss=0.4014, over 6770344.77 frames. ], batch size: 110, lr: 4.24e-03, grad_scale: 32.0 2024-09-18 20:48:12,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=505460.6666666667, ans=0.0 2024-09-18 20:48:15,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=505460.6666666667, ans=0.125 2024-09-18 20:48:16,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=505460.6666666667, ans=15.0 2024-09-18 20:48:33,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=505507.3333333333, ans=0.125 2024-09-18 20:48:35,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=505507.3333333333, ans=0.0 2024-09-18 20:48:38,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=505507.3333333333, ans=0.2 2024-09-18 20:48:40,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=505507.3333333333, ans=0.125 2024-09-18 20:49:25,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5
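The recurring WARNING lines from optim.py summarize the optimizer's gradient-clipping state: the five numbers after "grad-norm quartiles" are the min/25%/median/75%/max of recently observed gradient norms, `threshold` is derived from that distribution, and `percent-clipped` reports how often recent steps exceeded it. A rough sketch of that bookkeeping, under the assumption (not verified against the real ScaledAdam code) that the threshold is clipping_scale times the median:

```python
import torch

class GradNormClipperSketch:
    """Track recent grad norms; clip to clipping_scale * median (assumed rule)."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 400):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms: list[float] = []
        self.num_seen = 0
        self.num_clipped = 0

    def clip_(self, parameters) -> None:
        grads = [p.grad for p in parameters if p.grad is not None]
        # Overall L2 norm across all parameter gradients.
        norm = torch.linalg.vector_norm(
            torch.stack([torch.linalg.vector_norm(g) for g in grads])
        ).item()
        self.norms = (self.norms + [norm])[-self.window:]
        q = torch.quantile(
            torch.tensor(self.norms), torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
        )
        threshold = self.clipping_scale * q[2].item()  # assumed: 2 x median
        self.num_seen += 1
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)  # scale gradients down in place
        pct = 100.0 * self.num_clipped / self.num_seen
        print(
            f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
            f"{' '.join(f'{v:.3e}' for v in q.tolist())}, "
            f"threshold={threshold:.3e}, percent-clipped={pct:.1f}"
        )
```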
2024-09-18 20:49:26,389 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:49:29,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=505647.3333333333, ans=0.07 2024-09-18 20:49:32,392 INFO [train.py:1198] (1/2) Epoch 28, batch 3700, loss[loss=0.2285, simple_loss=0.284, pruned_loss=0.06449, ctc_loss=0.1339, cr_loss=0.4295, over 34626.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.2681, pruned_loss=0.05975, ctc_loss=0.1259, cr_loss=0.4009, over 6785229.51 frames. ], batch size: 102, lr: 4.24e-03, grad_scale: 32.0 2024-09-18 20:49:54,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.64 vs. limit=15.0 2024-09-18 20:50:08,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=505787.3333333333, ans=0.125 2024-09-18 20:50:12,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=505787.3333333333, ans=0.0 2024-09-18 20:50:30,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=505834.0, ans=0.1 2024-09-18 20:50:36,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=505834.0, ans=0.125 2024-09-18 20:50:45,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=505880.6666666667, ans=0.125 2024-09-18 20:50:51,655 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.103e+02 2.346e+02 2.538e+02 3.136e+02 5.312e+02, threshold=5.077e+02, percent-clipped=0.0 2024-09-18 20:50:54,981 INFO [train.py:1198] (1/2) Epoch 28, batch 3750, loss[loss=0.2327, simple_loss=0.284, pruned_loss=0.06797, ctc_loss=0.1394, cr_loss=0.4396, over 34344.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2717, pruned_loss=0.06113, ctc_loss=0.1285, cr_loss=0.4074, over 6786714.45 frames. ], batch size: 113, lr: 4.24e-03, grad_scale: 32.0 2024-09-18 20:51:11,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=505974.0, ans=0.125 2024-09-18 20:51:30,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=506020.6666666667, ans=0.125 2024-09-18 20:51:40,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=506020.6666666667, ans=0.0 2024-09-18 20:51:42,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=506067.3333333333, ans=0.125 2024-09-18 20:51:43,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=506067.3333333333, ans=0.0 2024-09-18 20:51:53,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=506067.3333333333, ans=0.1 2024-09-18 20:52:15,672 INFO [train.py:1198] (1/2) Epoch 28, batch 3800, loss[loss=0.2403, simple_loss=0.2824, pruned_loss=0.07487, ctc_loss=0.1512, cr_loss=0.4545, over 29918.00 frames.
], tot_loss[loss=0.2209, simple_loss=0.274, pruned_loss=0.06249, ctc_loss=0.1311, cr_loss=0.4123, over 6676144.51 frames. ], batch size: 175, lr: 4.24e-03, grad_scale: 32.0 2024-09-18 20:52:30,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=506160.6666666667, ans=0.025 2024-09-18 20:52:44,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=506207.3333333333, ans=0.1 2024-09-18 20:52:47,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=15.96 vs. limit=15.0 2024-09-18 20:53:00,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=506254.0, ans=0.0 2024-09-18 20:53:06,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=12.0 2024-09-18 20:53:36,500 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.524e+02 2.741e+02 3.000e+02 4.331e+02, threshold=5.482e+02, percent-clipped=0.0 2024-09-18 20:53:37,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=506347.3333333333, ans=0.025 2024-09-18 20:53:40,173 INFO [train.py:1198] (1/2) Epoch 28, batch 3850, loss[loss=0.2526, simple_loss=0.2919, pruned_loss=0.08042, ctc_loss=0.1698, cr_loss=0.4621, over 23060.00 frames. ], tot_loss[loss=0.2249, simple_loss=0.2766, pruned_loss=0.06469, ctc_loss=0.1357, cr_loss=0.4165, over 6249156.28 frames. ], batch size: 244, lr: 4.24e-03, grad_scale: 32.0 2024-09-18 20:53:41,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.93 vs. limit=15.0 2024-09-18 20:53:48,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=506394.0, ans=0.125 2024-09-18 20:53:50,521 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:54:00,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=506440.6666666667, ans=0.1 2024-09-18 20:55:07,304 INFO [train.py:1198] (1/2) Epoch 29, batch 0, loss[loss=0.2028, simple_loss=0.2576, pruned_loss=0.05412, ctc_loss=0.1197, cr_loss=0.3967, over 34490.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.2576, pruned_loss=0.05412, ctc_loss=0.1197, cr_loss=0.3967, over 34490.00 frames. ], batch size: 85, lr: 4.16e-03, grad_scale: 32.0 2024-09-18 20:55:07,304 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 20:55:12,152 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.8083, 1.8957, 3.5822, 3.6153], device='cuda:1') 2024-09-18 20:55:24,290 INFO [train.py:1230] (1/2) Epoch 29, validation: loss=0.1488, simple_loss=0.2449, pruned_loss=0.02233, ctc_loss=0.04003, cr_loss=2.046e-14, over 944034.00 frames. 2024-09-18 20:55:24,290 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 20:55:40,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.76 vs. 
limit=15.0 2024-09-18 20:56:19,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=506655.3333333333, ans=0.125 2024-09-18 20:56:34,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=506702.0, ans=0.0 2024-09-18 20:56:48,739 INFO [train.py:1198] (1/2) Epoch 29, batch 50, loss[loss=0.1987, simple_loss=0.2518, pruned_loss=0.05398, ctc_loss=0.1134, cr_loss=0.3715, over 34462.00 frames. ], tot_loss[loss=0.2169, simple_loss=0.2698, pruned_loss=0.06093, ctc_loss=0.1284, cr_loss=0.409, over 1482605.12 frames. ], batch size: 82, lr: 4.16e-03, grad_scale: 32.0 2024-09-18 20:56:53,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=506748.6666666667, ans=0.0 2024-09-18 20:57:00,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=506748.6666666667, ans=0.0 2024-09-18 20:57:00,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=506748.6666666667, ans=0.125 2024-09-18 20:57:12,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=22.5 2024-09-18 20:57:25,252 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.489e+02 2.754e+02 3.275e+02 6.568e+02, threshold=5.508e+02, percent-clipped=3.0 2024-09-18 20:57:50,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=506888.6666666667, ans=10.0 2024-09-18 20:57:55,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.76 vs. limit=6.0 2024-09-18 20:57:58,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=506935.3333333333, ans=0.125 2024-09-18 20:58:13,391 INFO [train.py:1198] (1/2) Epoch 29, batch 100, loss[loss=0.1981, simple_loss=0.2505, pruned_loss=0.05366, ctc_loss=0.1155, cr_loss=0.3816, over 34600.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2719, pruned_loss=0.06167, ctc_loss=0.1297, cr_loss=0.4115, over 2630041.04 frames. ], batch size: 89, lr: 4.16e-03, grad_scale: 32.0 2024-09-18 20:59:00,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.18 vs. limit=15.0 2024-09-18 20:59:11,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=10.20 vs. limit=10.0 2024-09-18 20:59:29,742 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 20:59:37,435 INFO [train.py:1198] (1/2) Epoch 29, batch 150, loss[loss=0.1908, simple_loss=0.2435, pruned_loss=0.05106, ctc_loss=0.108, cr_loss=0.3584, over 34488.00 frames. ], tot_loss[loss=0.2151, simple_loss=0.2687, pruned_loss=0.06001, ctc_loss=0.1264, cr_loss=0.4032, over 3557816.13 frames. 
], batch size: 82, lr: 4.16e-03, grad_scale: 32.0 2024-09-18 20:59:37,792 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.156e-02 2024-09-18 20:59:37,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=507215.3333333333, ans=0.0 2024-09-18 20:59:50,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=507215.3333333333, ans=0.09899494936611666 2024-09-18 20:59:57,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=507262.0, ans=0.125 2024-09-18 21:00:07,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=507262.0, ans=0.025 2024-09-18 21:00:09,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=507308.6666666667, ans=0.0 2024-09-18 21:00:13,377 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.390e+02 2.954e+02 3.842e+02 6.210e+02, threshold=5.907e+02, percent-clipped=5.0 2024-09-18 21:00:26,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=507355.3333333333, ans=0.125 2024-09-18 21:00:56,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=507402.0, ans=0.05 2024-09-18 21:00:56,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=507402.0, ans=0.0 2024-09-18 21:00:59,123 INFO [train.py:1198] (1/2) Epoch 29, batch 200, loss[loss=0.2501, simple_loss=0.2976, pruned_loss=0.07636, ctc_loss=0.1547, cr_loss=0.4707, over 31958.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2681, pruned_loss=0.06005, ctc_loss=0.1265, cr_loss=0.4027, over 4270588.87 frames. ], batch size: 145, lr: 4.16e-03, grad_scale: 32.0 2024-09-18 21:01:14,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507448.6666666667, ans=0.1 2024-09-18 21:01:46,617 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=9.59 vs. limit=15.0 2024-09-18 21:01:47,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=507542.0, ans=0.125 2024-09-18 21:01:55,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=507588.6666666667, ans=0.1 2024-09-18 21:01:56,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.02 vs. 
limit=12.0 2024-09-18 21:01:59,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=507588.6666666667, ans=0.0 2024-09-18 21:01:59,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507588.6666666667, ans=0.1 2024-09-18 21:02:12,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=507635.3333333333, ans=0.0 2024-09-18 21:02:23,676 INFO [train.py:1198] (1/2) Epoch 29, batch 250, loss[loss=0.2426, simple_loss=0.2944, pruned_loss=0.07125, ctc_loss=0.1478, cr_loss=0.4664, over 34253.00 frames. ], tot_loss[loss=0.2147, simple_loss=0.2682, pruned_loss=0.05994, ctc_loss=0.1262, cr_loss=0.4022, over 4832491.59 frames. ], batch size: 117, lr: 4.16e-03, grad_scale: 32.0 2024-09-18 21:02:25,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=507682.0, ans=0.0 2024-09-18 21:02:40,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=507728.6666666667, ans=0.125 2024-09-18 21:02:47,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=507728.6666666667, ans=0.1 2024-09-18 21:03:02,638 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.021e+02 2.525e+02 3.272e+02 4.056e+02 9.231e+02, threshold=6.544e+02, percent-clipped=11.0 2024-09-18 21:03:06,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=507775.3333333333, ans=0.1 2024-09-18 21:03:39,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=507868.6666666667, ans=0.0 2024-09-18 21:03:48,817 INFO [train.py:1198] (1/2) Epoch 29, batch 300, loss[loss=0.2414, simple_loss=0.2882, pruned_loss=0.07321, ctc_loss=0.1489, cr_loss=0.4591, over 34368.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2675, pruned_loss=0.05964, ctc_loss=0.1257, cr_loss=0.4007, over 5262385.97 frames. ], batch size: 107, lr: 4.16e-03, grad_scale: 32.0 2024-09-18 21:03:55,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=507915.3333333333, ans=0.125 2024-09-18 21:04:25,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=508008.6666666667, ans=0.1 2024-09-18 21:04:26,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=508008.6666666667, ans=0.0 2024-09-18 21:04:30,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=508008.6666666667, ans=0.09899494936611666 2024-09-18 21:05:03,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=508102.0, ans=0.015 2024-09-18 21:05:13,265 INFO [train.py:1198] (1/2) Epoch 29, batch 350, loss[loss=0.1835, simple_loss=0.2393, pruned_loss=0.04685, ctc_loss=0.1014, cr_loss=0.3442, over 34269.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2681, pruned_loss=0.05985, ctc_loss=0.126, cr_loss=0.4009, over 5598492.23 frames. 
], batch size: 83, lr: 4.15e-03, grad_scale: 32.0 2024-09-18 21:05:25,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=508148.6666666667, ans=0.1 2024-09-18 21:05:26,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=508148.6666666667, ans=0.125 2024-09-18 21:05:39,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=508195.3333333333, ans=0.125 2024-09-18 21:05:44,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=508242.0, ans=0.1 2024-09-18 21:05:49,289 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.089e+02 2.496e+02 2.933e+02 3.495e+02 6.313e+02, threshold=5.865e+02, percent-clipped=0.0 2024-09-18 21:06:35,686 INFO [train.py:1198] (1/2) Epoch 29, batch 400, loss[loss=0.2297, simple_loss=0.2816, pruned_loss=0.06654, ctc_loss=0.1377, cr_loss=0.4313, over 34406.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.2678, pruned_loss=0.05968, ctc_loss=0.1257, cr_loss=0.3998, over 5864881.90 frames. ], batch size: 95, lr: 4.15e-03, grad_scale: 32.0 2024-09-18 21:06:58,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=508428.6666666667, ans=0.125 2024-09-18 21:07:03,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=508428.6666666667, ans=0.2 2024-09-18 21:07:03,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=508428.6666666667, ans=0.025 2024-09-18 21:07:51,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=508568.6666666667, ans=0.125 2024-09-18 21:07:51,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=508568.6666666667, ans=0.0 2024-09-18 21:08:00,821 INFO [train.py:1198] (1/2) Epoch 29, batch 450, loss[loss=0.2293, simple_loss=0.2797, pruned_loss=0.06722, ctc_loss=0.1376, cr_loss=0.4235, over 34693.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.268, pruned_loss=0.05979, ctc_loss=0.126, cr_loss=0.4006, over 6053368.70 frames. ], batch size: 97, lr: 4.15e-03, grad_scale: 16.0 2024-09-18 21:08:39,357 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.473e+02 2.844e+02 3.512e+02 6.613e+02, threshold=5.688e+02, percent-clipped=2.0 2024-09-18 21:08:49,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=508708.6666666667, ans=0.0 2024-09-18 21:08:51,945 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=12.0 2024-09-18 21:09:11,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=508802.0, ans=0.2 2024-09-18 21:09:26,001 INFO [train.py:1198] (1/2) Epoch 29, batch 500, loss[loss=0.2231, simple_loss=0.279, pruned_loss=0.06189, ctc_loss=0.1313, cr_loss=0.4293, over 34410.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2674, pruned_loss=0.05947, ctc_loss=0.1253, cr_loss=0.3998, over 6220731.87 frames. 
], batch size: 110, lr: 4.15e-03, grad_scale: 16.0 2024-09-18 21:09:32,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=508848.6666666667, ans=0.1 2024-09-18 21:09:36,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=508848.6666666667, ans=0.0 2024-09-18 21:09:47,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=508895.3333333333, ans=0.125 2024-09-18 21:09:52,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=508895.3333333333, ans=0.025 2024-09-18 21:10:22,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=508988.6666666667, ans=0.125 2024-09-18 21:10:51,292 INFO [train.py:1198] (1/2) Epoch 29, batch 550, loss[loss=0.2229, simple_loss=0.2786, pruned_loss=0.06221, ctc_loss=0.1301, cr_loss=0.4207, over 33841.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2674, pruned_loss=0.05948, ctc_loss=0.1254, cr_loss=0.4005, over 6330554.93 frames. ], batch size: 122, lr: 4.15e-03, grad_scale: 16.0 2024-09-18 21:11:11,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.03 vs. limit=22.5 2024-09-18 21:11:14,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=509128.6666666667, ans=0.125 2024-09-18 21:11:16,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-09-18 21:11:21,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.22 vs. limit=10.0 2024-09-18 21:11:29,032 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.393e+02 2.714e+02 3.583e+02 8.431e+02, threshold=5.428e+02, percent-clipped=1.0 2024-09-18 21:11:51,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.14 vs. limit=10.0 2024-09-18 21:11:52,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=509222.0, ans=0.0 2024-09-18 21:12:13,724 INFO [train.py:1198] (1/2) Epoch 29, batch 600, loss[loss=0.2295, simple_loss=0.2835, pruned_loss=0.06569, ctc_loss=0.1367, cr_loss=0.4168, over 34240.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.2678, pruned_loss=0.05963, ctc_loss=0.1255, cr_loss=0.4004, over 6433194.90 frames. ], batch size: 117, lr: 4.15e-03, grad_scale: 16.0 2024-09-18 21:12:34,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.66 vs. 
limit=15.0 2024-09-18 21:12:41,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=509362.0, ans=0.125 2024-09-18 21:12:45,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=509362.0, ans=0.125 2024-09-18 21:12:54,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.11 vs. limit=15.0 2024-09-18 21:12:56,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=509408.6666666667, ans=0.0 2024-09-18 21:13:11,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=509455.3333333333, ans=0.0 2024-09-18 21:13:30,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.62 vs. limit=10.0 2024-09-18 21:13:33,385 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2024-09-18 21:13:37,459 INFO [train.py:1198] (1/2) Epoch 29, batch 650, loss[loss=0.2208, simple_loss=0.2704, pruned_loss=0.06422, ctc_loss=0.1316, cr_loss=0.4127, over 34532.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2671, pruned_loss=0.05927, ctc_loss=0.1247, cr_loss=0.3988, over 6524359.66 frames. ], batch size: 94, lr: 4.15e-03, grad_scale: 16.0 2024-09-18 21:13:44,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=509548.6666666667, ans=0.025 2024-09-18 21:14:03,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=509595.3333333333, ans=0.0 2024-09-18 21:14:07,025 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5 2024-09-18 21:14:07,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=509595.3333333333, ans=0.125 2024-09-18 21:14:16,098 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.073e+02 2.386e+02 2.785e+02 3.854e+02 7.805e+02, threshold=5.569e+02, percent-clipped=5.0 2024-09-18 21:14:18,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=509642.0, ans=0.125 2024-09-18 21:14:23,545 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.50 vs. limit=12.0 2024-09-18 21:14:29,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=509688.6666666667, ans=0.0 2024-09-18 21:14:46,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=509735.3333333333, ans=0.0 2024-09-18 21:15:03,142 INFO [train.py:1198] (1/2) Epoch 29, batch 700, loss[loss=0.1915, simple_loss=0.2441, pruned_loss=0.05107, ctc_loss=0.1097, cr_loss=0.3704, over 34574.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2676, pruned_loss=0.05939, ctc_loss=0.1251, cr_loss=0.3995, over 6582249.51 frames. 
], batch size: 89, lr: 4.15e-03, grad_scale: 16.0 2024-09-18 21:15:09,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=509782.0, ans=0.0 2024-09-18 21:15:16,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=509782.0, ans=0.125 2024-09-18 21:15:23,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0 2024-09-18 21:15:24,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=509828.6666666667, ans=0.125 2024-09-18 21:15:48,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=509875.3333333333, ans=0.125 2024-09-18 21:16:13,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=15.0 2024-09-18 21:16:27,630 INFO [train.py:1198] (1/2) Epoch 29, batch 750, loss[loss=0.2146, simple_loss=0.2689, pruned_loss=0.05945, ctc_loss=0.1251, cr_loss=0.4098, over 34424.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2672, pruned_loss=0.05922, ctc_loss=0.1248, cr_loss=0.3991, over 6625297.64 frames. ], batch size: 95, lr: 4.15e-03, grad_scale: 16.0 2024-09-18 21:16:29,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=510015.3333333333, ans=0.125 2024-09-18 21:16:31,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=510015.3333333333, ans=15.0 2024-09-18 21:16:47,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=510062.0, ans=0.0 2024-09-18 21:16:55,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=510062.0, ans=0.0 2024-09-18 21:16:58,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=510108.6666666667, ans=0.0 2024-09-18 21:17:05,067 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 2.598e+02 3.109e+02 3.915e+02 6.619e+02, threshold=6.217e+02, percent-clipped=4.0 2024-09-18 21:17:10,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510108.6666666667, ans=0.1 2024-09-18 21:17:21,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.46 vs. limit=15.0 2024-09-18 21:17:50,230 INFO [train.py:1198] (1/2) Epoch 29, batch 800, loss[loss=0.1874, simple_loss=0.2419, pruned_loss=0.04912, ctc_loss=0.1048, cr_loss=0.3413, over 34451.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2673, pruned_loss=0.05933, ctc_loss=0.1251, cr_loss=0.3992, over 6661413.52 frames. ], batch size: 85, lr: 4.15e-03, grad_scale: 32.0 2024-09-18 21:18:02,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.10 vs. 
limit=15.0 2024-09-18 21:18:56,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=510435.3333333333, ans=0.2 2024-09-18 21:19:14,016 INFO [train.py:1198] (1/2) Epoch 29, batch 850, loss[loss=0.225, simple_loss=0.2815, pruned_loss=0.06287, ctc_loss=0.1315, cr_loss=0.4135, over 34385.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2671, pruned_loss=0.05929, ctc_loss=0.1251, cr_loss=0.3996, over 6693945.83 frames. ], batch size: 103, lr: 4.14e-03, grad_scale: 32.0 2024-09-18 21:19:14,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=510482.0, ans=0.0 2024-09-18 21:19:25,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=510482.0, ans=0.0 2024-09-18 21:19:28,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=510528.6666666667, ans=0.2 2024-09-18 21:19:49,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=510575.3333333333, ans=0.125 2024-09-18 21:19:52,112 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.422e+02 2.948e+02 3.627e+02 6.741e+02, threshold=5.896e+02, percent-clipped=3.0 2024-09-18 21:19:59,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=510575.3333333333, ans=0.0 2024-09-18 21:20:38,925 INFO [train.py:1198] (1/2) Epoch 29, batch 900, loss[loss=0.184, simple_loss=0.2399, pruned_loss=0.0463, ctc_loss=0.1074, cr_loss=0.3481, over 34457.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2672, pruned_loss=0.05933, ctc_loss=0.1252, cr_loss=0.3993, over 6699193.80 frames. ], batch size: 85, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 21:21:00,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff3.min_abs, batch_count=510762.0, ans=0.2 2024-09-18 21:21:05,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=510762.0, ans=0.5 2024-09-18 21:21:30,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=510855.3333333333, ans=0.125 2024-09-18 21:21:31,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=510855.3333333333, ans=0.2 2024-09-18 21:22:00,951 INFO [train.py:1198] (1/2) Epoch 29, batch 950, loss[loss=0.1855, simple_loss=0.2426, pruned_loss=0.04731, ctc_loss=0.1008, cr_loss=0.3396, over 34692.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2675, pruned_loss=0.0595, ctc_loss=0.1256, cr_loss=0.4, over 6701583.33 frames. ], batch size: 87, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 21:22:06,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=510948.6666666667, ans=0.025 2024-09-18 21:22:11,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.85 vs. 
limit=15.0 2024-09-18 21:22:19,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=510995.3333333333, ans=0.125 2024-09-18 21:22:37,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=511042.0, ans=0.0 2024-09-18 21:22:42,615 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.428e+02 2.765e+02 3.403e+02 1.200e+03, threshold=5.530e+02, percent-clipped=2.0 2024-09-18 21:22:56,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=511088.6666666667, ans=0.0 2024-09-18 21:23:21,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=511135.3333333333, ans=0.0 2024-09-18 21:23:25,986 INFO [train.py:1198] (1/2) Epoch 29, batch 1000, loss[loss=0.2154, simple_loss=0.2669, pruned_loss=0.06116, ctc_loss=0.1267, cr_loss=0.4063, over 34503.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2683, pruned_loss=0.06001, ctc_loss=0.1264, cr_loss=0.4017, over 6695773.26 frames. ], batch size: 90, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 21:23:26,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=511182.0, ans=0.025 2024-09-18 21:23:34,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=511182.0, ans=0.125 2024-09-18 21:23:37,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=511182.0, ans=0.0 2024-09-18 21:23:46,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=511228.6666666667, ans=0.125 2024-09-18 21:24:29,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=511322.0, ans=0.125 2024-09-18 21:24:46,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.09 vs. limit=22.5 2024-09-18 21:24:50,241 INFO [train.py:1198] (1/2) Epoch 29, batch 1050, loss[loss=0.2218, simple_loss=0.2793, pruned_loss=0.06078, ctc_loss=0.1304, cr_loss=0.4186, over 34542.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2679, pruned_loss=0.05989, ctc_loss=0.1263, cr_loss=0.4013, over 6703995.64 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 21:24:53,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=511415.3333333333, ans=0.1 2024-09-18 21:25:15,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=511462.0, ans=0.0 2024-09-18 21:25:30,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.314e+02 2.606e+02 3.284e+02 6.582e+02, threshold=5.211e+02, percent-clipped=1.0 2024-09-18 21:25:37,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=511508.6666666667, ans=0.0 2024-09-18 21:25:40,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. 
limit=6.0 2024-09-18 21:25:57,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2024-09-18 21:26:08,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=511602.0, ans=0.1 2024-09-18 21:26:15,110 INFO [train.py:1198] (1/2) Epoch 29, batch 1100, loss[loss=0.2014, simple_loss=0.2557, pruned_loss=0.05443, ctc_loss=0.1172, cr_loss=0.371, over 34376.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.2676, pruned_loss=0.05985, ctc_loss=0.1263, cr_loss=0.4014, over 6717314.97 frames. ], batch size: 91, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 21:26:30,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=511695.3333333333, ans=0.125 2024-09-18 21:26:38,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=15.0 2024-09-18 21:26:56,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=511742.0, ans=0.125 2024-09-18 21:26:58,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=511742.0, ans=0.125 2024-09-18 21:27:17,042 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.57 vs. limit=10.0 2024-09-18 21:27:39,532 INFO [train.py:1198] (1/2) Epoch 29, batch 1150, loss[loss=0.2025, simple_loss=0.2565, pruned_loss=0.05531, ctc_loss=0.1138, cr_loss=0.3759, over 34348.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2673, pruned_loss=0.05967, ctc_loss=0.1259, cr_loss=0.4009, over 6716568.32 frames. 
], batch size: 91, lr: 4.14e-03, grad_scale: 16.0 2024-09-18 21:27:48,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=511882.0, ans=0.0 2024-09-18 21:27:58,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=511928.6666666667, ans=0.0 2024-09-18 21:27:59,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=511928.6666666667, ans=0.125 2024-09-18 21:28:11,627 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:28:11,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=511975.3333333333, ans=0.125 2024-09-18 21:28:17,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=511975.3333333333, ans=0.125 2024-09-18 21:28:19,305 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.372e+02 2.795e+02 3.528e+02 6.136e+02, threshold=5.589e+02, percent-clipped=3.0 2024-09-18 21:28:34,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=512022.0, ans=0.025 2024-09-18 21:29:02,054 INFO [train.py:1198] (1/2) Epoch 29, batch 1200, loss[loss=0.2201, simple_loss=0.2787, pruned_loss=0.06014, ctc_loss=0.1262, cr_loss=0.3976, over 34569.00 frames. ], tot_loss[loss=0.2151, simple_loss=0.2685, pruned_loss=0.06012, ctc_loss=0.1267, cr_loss=0.403, over 6708156.92 frames. ], batch size: 99, lr: 4.14e-03, grad_scale: 32.0 2024-09-18 21:29:18,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=512162.0, ans=0.025 2024-09-18 21:29:23,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=512162.0, ans=0.0 2024-09-18 21:29:43,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=512208.6666666667, ans=0.0 2024-09-18 21:30:00,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0 2024-09-18 21:30:20,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=512302.0, ans=0.125 2024-09-18 21:30:27,155 INFO [train.py:1198] (1/2) Epoch 29, batch 1250, loss[loss=0.2315, simple_loss=0.2821, pruned_loss=0.06748, ctc_loss=0.1397, cr_loss=0.4513, over 34332.00 frames. ], tot_loss[loss=0.2151, simple_loss=0.2686, pruned_loss=0.06012, ctc_loss=0.1266, cr_loss=0.4036, over 6741797.06 frames. ], batch size: 107, lr: 4.14e-03, grad_scale: 32.0 2024-09-18 21:30:29,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.49 vs. 
limit=15.0 2024-09-18 21:30:37,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=512348.6666666667, ans=0.125 2024-09-18 21:31:07,178 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.422e+02 2.798e+02 3.449e+02 6.152e+02, threshold=5.596e+02, percent-clipped=2.0 2024-09-18 21:31:19,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=512488.6666666667, ans=0.0 2024-09-18 21:31:32,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=512488.6666666667, ans=0.025 2024-09-18 21:31:37,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=512535.3333333333, ans=0.0 2024-09-18 21:31:38,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.74 vs. limit=15.0 2024-09-18 21:31:39,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=512535.3333333333, ans=0.1 2024-09-18 21:31:52,113 INFO [train.py:1198] (1/2) Epoch 29, batch 1300, loss[loss=0.2197, simple_loss=0.2768, pruned_loss=0.06054, ctc_loss=0.1279, cr_loss=0.3977, over 33159.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.2679, pruned_loss=0.05976, ctc_loss=0.1259, cr_loss=0.402, over 6743561.38 frames. ], batch size: 130, lr: 4.14e-03, grad_scale: 32.0 2024-09-18 21:31:54,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.78 vs. limit=22.5 2024-09-18 21:32:00,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=512582.0, ans=0.1 2024-09-18 21:32:14,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=512628.6666666667, ans=0.0 2024-09-18 21:32:20,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=512628.6666666667, ans=0.125 2024-09-18 21:32:27,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=512675.3333333333, ans=0.125 2024-09-18 21:32:32,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=512675.3333333333, ans=10.0 2024-09-18 21:32:37,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=512675.3333333333, ans=0.125 2024-09-18 21:32:42,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=512722.0, ans=0.1 2024-09-18 21:32:42,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=512722.0, ans=0.2 2024-09-18 21:32:47,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.49 vs. 
limit=22.5 2024-09-18 21:32:48,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=512722.0, ans=0.025 2024-09-18 21:33:03,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=512768.6666666667, ans=0.025 2024-09-18 21:33:10,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=512768.6666666667, ans=0.125 2024-09-18 21:33:14,907 INFO [train.py:1198] (1/2) Epoch 29, batch 1350, loss[loss=0.2122, simple_loss=0.2655, pruned_loss=0.05896, ctc_loss=0.1246, cr_loss=0.4034, over 34539.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.2677, pruned_loss=0.05966, ctc_loss=0.1256, cr_loss=0.4009, over 6762630.08 frames. ], batch size: 94, lr: 4.14e-03, grad_scale: 32.0 2024-09-18 21:33:36,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=512862.0, ans=0.0 2024-09-18 21:33:51,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=512908.6666666667, ans=0.0 2024-09-18 21:33:54,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=512908.6666666667, ans=0.125 2024-09-18 21:33:56,176 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.050e+02 2.474e+02 2.905e+02 3.736e+02 6.462e+02, threshold=5.809e+02, percent-clipped=2.0 2024-09-18 21:34:13,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=512955.3333333333, ans=0.0 2024-09-18 21:34:39,557 INFO [train.py:1198] (1/2) Epoch 29, batch 1400, loss[loss=0.1914, simple_loss=0.245, pruned_loss=0.05057, ctc_loss=0.1088, cr_loss=0.3729, over 34291.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.2677, pruned_loss=0.0597, ctc_loss=0.1258, cr_loss=0.4016, over 6775486.31 frames. ], batch size: 80, lr: 4.13e-03, grad_scale: 32.0 2024-09-18 21:34:49,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=513048.6666666667, ans=0.125 2024-09-18 21:35:03,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=513095.3333333333, ans=0.0 2024-09-18 21:35:09,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=513095.3333333333, ans=0.125 2024-09-18 21:35:19,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=513142.0, ans=0.2 2024-09-18 21:35:20,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.53 vs. 
limit=15.0 2024-09-18 21:35:37,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=513188.6666666667, ans=0.0 2024-09-18 21:35:49,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=513235.3333333333, ans=0.0 2024-09-18 21:35:57,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=513235.3333333333, ans=0.0 2024-09-18 21:35:59,349 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:35:59,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=513235.3333333333, ans=0.125 2024-09-18 21:36:03,994 INFO [train.py:1198] (1/2) Epoch 29, batch 1450, loss[loss=0.2327, simple_loss=0.2877, pruned_loss=0.06647, ctc_loss=0.1386, cr_loss=0.4259, over 34456.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2683, pruned_loss=0.05982, ctc_loss=0.1261, cr_loss=0.4021, over 6773796.70 frames. ], batch size: 110, lr: 4.13e-03, grad_scale: 32.0 2024-09-18 21:36:07,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513282.0, ans=0.1 2024-09-18 21:36:10,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=513282.0, ans=0.025 2024-09-18 21:36:26,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=513328.6666666667, ans=0.0 2024-09-18 21:36:28,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2024-09-18 21:36:32,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=513328.6666666667, ans=0.125 2024-09-18 21:36:34,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.06 vs. limit=6.0 2024-09-18 21:36:43,470 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.161e+02 2.493e+02 2.846e+02 3.357e+02 6.529e+02, threshold=5.692e+02, percent-clipped=1.0 2024-09-18 21:37:26,241 INFO [train.py:1198] (1/2) Epoch 29, batch 1500, loss[loss=0.2211, simple_loss=0.278, pruned_loss=0.06071, ctc_loss=0.1301, cr_loss=0.4167, over 34452.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2687, pruned_loss=0.05993, ctc_loss=0.1263, cr_loss=0.4028, over 6775067.88 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 32.0 2024-09-18 21:37:42,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=513515.3333333333, ans=0.0 2024-09-18 21:37:47,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=513562.0, ans=0.0 2024-09-18 21:38:21,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.99 vs. 
limit=22.5 2024-09-18 21:38:23,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=513655.3333333333, ans=0.125 2024-09-18 21:38:52,687 INFO [train.py:1198] (1/2) Epoch 29, batch 1550, loss[loss=0.2224, simple_loss=0.2746, pruned_loss=0.06341, ctc_loss=0.1331, cr_loss=0.4178, over 34411.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2685, pruned_loss=0.05999, ctc_loss=0.1264, cr_loss=0.4025, over 6745113.14 frames. ], batch size: 105, lr: 4.13e-03, grad_scale: 16.0 2024-09-18 21:39:14,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=513795.3333333333, ans=0.125 2024-09-18 21:39:18,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.71 vs. limit=15.0 2024-09-18 21:39:29,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=513842.0, ans=0.0 2024-09-18 21:39:33,871 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.442e+02 2.907e+02 3.593e+02 6.320e+02, threshold=5.814e+02, percent-clipped=4.0 2024-09-18 21:39:51,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=513888.6666666667, ans=0.0 2024-09-18 21:39:57,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513935.3333333333, ans=0.1 2024-09-18 21:40:15,177 INFO [train.py:1198] (1/2) Epoch 29, batch 1600, loss[loss=0.2157, simple_loss=0.2735, pruned_loss=0.05804, ctc_loss=0.1259, cr_loss=0.4148, over 34544.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2682, pruned_loss=0.06005, ctc_loss=0.1266, cr_loss=0.4027, over 6723965.68 frames. ], batch size: 99, lr: 4.13e-03, grad_scale: 32.0 2024-09-18 21:40:17,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.91 vs. limit=15.0 2024-09-18 21:40:50,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=514075.3333333333, ans=0.0 2024-09-18 21:41:06,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=514122.0, ans=0.125 2024-09-18 21:41:20,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=514122.0, ans=0.025 2024-09-18 21:41:35,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=514168.6666666667, ans=0.0 2024-09-18 21:41:39,747 INFO [train.py:1198] (1/2) Epoch 29, batch 1650, loss[loss=0.2178, simple_loss=0.2775, pruned_loss=0.05841, ctc_loss=0.1241, cr_loss=0.4124, over 34387.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.2678, pruned_loss=0.05981, ctc_loss=0.1261, cr_loss=0.4013, over 6717689.13 frames. 
], batch size: 103, lr: 4.13e-03, grad_scale: 32.0 2024-09-18 21:41:48,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=514215.3333333333, ans=0.04949747468305833 2024-09-18 21:41:51,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=514215.3333333333, ans=0.125 2024-09-18 21:42:05,253 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 21:42:06,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0 2024-09-18 21:42:11,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=514308.6666666667, ans=0.2 2024-09-18 21:42:19,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=12.0 2024-09-18 21:42:21,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=514308.6666666667, ans=0.0 2024-09-18 21:42:23,049 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.177e+02 2.496e+02 2.973e+02 3.598e+02 7.823e+02, threshold=5.945e+02, percent-clipped=5.0 2024-09-18 21:42:33,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=514355.3333333333, ans=0.125 2024-09-18 21:42:48,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.64 vs. limit=15.0 2024-09-18 21:42:58,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. limit=6.0 2024-09-18 21:43:02,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=514448.6666666667, ans=15.0 2024-09-18 21:43:03,892 INFO [train.py:1198] (1/2) Epoch 29, batch 1700, loss[loss=0.182, simple_loss=0.2365, pruned_loss=0.04683, ctc_loss=0.1023, cr_loss=0.3355, over 34316.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2676, pruned_loss=0.05958, ctc_loss=0.1257, cr_loss=0.4005, over 6743389.96 frames. ], batch size: 80, lr: 4.13e-03, grad_scale: 32.0 2024-09-18 21:44:04,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=15.0 2024-09-18 21:44:23,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=514635.3333333333, ans=0.0 2024-09-18 21:44:26,161 INFO [train.py:1198] (1/2) Epoch 29, batch 1750, loss[loss=0.186, simple_loss=0.2407, pruned_loss=0.04881, ctc_loss=0.1002, cr_loss=0.3405, over 34195.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2674, pruned_loss=0.05947, ctc_loss=0.1254, cr_loss=0.4003, over 6754134.06 frames. ], batch size: 78, lr: 4.13e-03, grad_scale: 32.0 2024-09-18 21:44:36,710 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.62 vs. 
limit=22.5 2024-09-18 21:44:56,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=514728.6666666667, ans=0.0 2024-09-18 21:45:09,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.442e+02 2.772e+02 3.448e+02 6.012e+02, threshold=5.544e+02, percent-clipped=1.0 2024-09-18 21:45:11,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=514775.3333333333, ans=0.0 2024-09-18 21:45:23,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=514822.0, ans=0.125 2024-09-18 21:45:26,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=514822.0, ans=0.125 2024-09-18 21:45:31,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.97 vs. limit=12.0 2024-09-18 21:45:39,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=514868.6666666667, ans=0.125 2024-09-18 21:45:47,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=514868.6666666667, ans=0.125 2024-09-18 21:45:50,598 INFO [train.py:1198] (1/2) Epoch 29, batch 1800, loss[loss=0.2077, simple_loss=0.2698, pruned_loss=0.05383, ctc_loss=0.1129, cr_loss=0.3843, over 34696.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2676, pruned_loss=0.05956, ctc_loss=0.1256, cr_loss=0.4005, over 6757576.04 frames. ], batch size: 97, lr: 4.13e-03, grad_scale: 32.0 2024-09-18 21:45:59,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.57 vs. limit=15.0 2024-09-18 21:46:04,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=514915.3333333333, ans=0.125 2024-09-18 21:46:13,548 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2024-09-18 21:46:14,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=514962.0, ans=0.0 2024-09-18 21:46:36,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. 
limit=6.0 2024-09-18 21:46:45,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=515055.3333333333, ans=0.0 2024-09-18 21:46:52,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=515055.3333333333, ans=0.125 2024-09-18 21:47:00,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=515102.0, ans=0.125 2024-09-18 21:47:08,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=515102.0, ans=0.0 2024-09-18 21:47:13,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=515148.6666666667, ans=0.125 2024-09-18 21:47:14,923 INFO [train.py:1198] (1/2) Epoch 29, batch 1850, loss[loss=0.2123, simple_loss=0.271, pruned_loss=0.05715, ctc_loss=0.1205, cr_loss=0.3805, over 34457.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2675, pruned_loss=0.0596, ctc_loss=0.1256, cr_loss=0.4006, over 6764897.72 frames. ], batch size: 100, lr: 4.13e-03, grad_scale: 32.0 2024-09-18 21:47:16,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=515148.6666666667, ans=0.0 2024-09-18 21:47:20,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=515148.6666666667, ans=0.0 2024-09-18 21:47:41,970 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.45 vs. limit=10.0 2024-09-18 21:47:42,423 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.12 vs. limit=15.0 2024-09-18 21:47:56,382 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.707e+02 3.324e+02 4.464e+02 9.073e+02, threshold=6.649e+02, percent-clipped=9.0 2024-09-18 21:48:17,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=515288.6666666667, ans=0.125 2024-09-18 21:48:19,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=515335.3333333333, ans=0.125 2024-09-18 21:48:22,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=515335.3333333333, ans=0.0 2024-09-18 21:48:37,087 INFO [train.py:1198] (1/2) Epoch 29, batch 1900, loss[loss=0.2254, simple_loss=0.2833, pruned_loss=0.06249, ctc_loss=0.1302, cr_loss=0.4132, over 34364.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2686, pruned_loss=0.05999, ctc_loss=0.1263, cr_loss=0.4023, over 6774701.33 frames. ], batch size: 103, lr: 4.13e-03, grad_scale: 32.0 2024-09-18 21:49:07,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=515428.6666666667, ans=0.0 2024-09-18 21:49:57,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515568.6666666667, ans=0.1 2024-09-18 21:50:03,384 INFO [train.py:1198] (1/2) Epoch 29, batch 1950, loss[loss=0.2048, simple_loss=0.2543, pruned_loss=0.0576, ctc_loss=0.123, cr_loss=0.3875, over 34722.00 frames. 
], tot_loss[loss=0.2154, simple_loss=0.2693, pruned_loss=0.06004, ctc_loss=0.1265, cr_loss=0.403, over 6792005.16 frames. ], batch size: 92, lr: 4.12e-03, grad_scale: 32.0 2024-09-18 21:50:20,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=515662.0, ans=0.0 2024-09-18 21:50:33,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=515662.0, ans=0.125 2024-09-18 21:50:45,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.418e+02 2.706e+02 3.071e+02 6.474e+02, threshold=5.412e+02, percent-clipped=0.0 2024-09-18 21:50:52,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=515755.3333333333, ans=0.125 2024-09-18 21:51:25,231 INFO [train.py:1198] (1/2) Epoch 29, batch 2000, loss[loss=0.1809, simple_loss=0.2327, pruned_loss=0.04765, ctc_loss=0.101, cr_loss=0.3392, over 34188.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2695, pruned_loss=0.06016, ctc_loss=0.1267, cr_loss=0.4028, over 6765773.40 frames. ], batch size: 78, lr: 4.12e-03, grad_scale: 32.0 2024-09-18 21:51:30,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=515848.6666666667, ans=0.125 2024-09-18 21:51:30,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=515848.6666666667, ans=0.2 2024-09-18 21:51:32,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=15.0 2024-09-18 21:51:35,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=515848.6666666667, ans=0.0 2024-09-18 21:51:40,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=515895.3333333333, ans=0.125 2024-09-18 21:52:13,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=515988.6666666667, ans=0.2 2024-09-18 21:52:16,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=515988.6666666667, ans=0.125 2024-09-18 21:52:19,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=515988.6666666667, ans=0.025 2024-09-18 21:52:23,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=515988.6666666667, ans=0.125 2024-09-18 21:52:49,908 INFO [train.py:1198] (1/2) Epoch 29, batch 2050, loss[loss=0.1885, simple_loss=0.2426, pruned_loss=0.04895, ctc_loss=0.108, cr_loss=0.3705, over 34476.00 frames. ], tot_loss[loss=0.2151, simple_loss=0.2687, pruned_loss=0.06005, ctc_loss=0.1263, cr_loss=0.4021, over 6756760.71 frames. 
], batch size: 82, lr: 4.12e-03, grad_scale: 32.0 2024-09-18 21:53:00,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516082.0, ans=0.1 2024-09-18 21:53:02,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=516082.0, ans=0.2 2024-09-18 21:53:14,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=516128.6666666667, ans=0.125 2024-09-18 21:53:19,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=516128.6666666667, ans=0.125 2024-09-18 21:53:27,936 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=22.5 2024-09-18 21:53:34,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.64 vs. limit=15.0 2024-09-18 21:53:35,501 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.565e+02 2.860e+02 3.360e+02 8.567e+02, threshold=5.720e+02, percent-clipped=7.0 2024-09-18 21:54:05,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516268.6666666667, ans=0.1 2024-09-18 21:54:13,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=516315.3333333333, ans=0.125 2024-09-18 21:54:14,819 INFO [train.py:1198] (1/2) Epoch 29, batch 2100, loss[loss=0.2264, simple_loss=0.2821, pruned_loss=0.06352, ctc_loss=0.1317, cr_loss=0.4347, over 34537.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2681, pruned_loss=0.05983, ctc_loss=0.1258, cr_loss=0.4017, over 6771220.19 frames. ], batch size: 94, lr: 4.12e-03, grad_scale: 32.0 2024-09-18 21:54:18,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=516315.3333333333, ans=0.0 2024-09-18 21:54:18,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=516315.3333333333, ans=0.125 2024-09-18 21:54:36,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=516362.0, ans=0.125 2024-09-18 21:54:44,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=516362.0, ans=0.025 2024-09-18 21:54:49,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=516408.6666666667, ans=0.125 2024-09-18 21:55:01,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=516408.6666666667, ans=0.0 2024-09-18 21:55:07,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=516455.3333333333, ans=0.125 2024-09-18 21:55:33,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=516502.0, ans=0.2 2024-09-18 21:55:36,760 INFO [train.py:1198] (1/2) Epoch 29, batch 2150, loss[loss=0.2157, simple_loss=0.2663, pruned_loss=0.06144, ctc_loss=0.1294, cr_loss=0.4096, over 34333.00 frames. 
], tot_loss[loss=0.2139, simple_loss=0.2676, pruned_loss=0.05956, ctc_loss=0.1253, cr_loss=0.4006, over 6789630.01 frames. ], batch size: 91, lr: 4.12e-03, grad_scale: 32.0 2024-09-18 21:55:40,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=516548.6666666667, ans=0.0 2024-09-18 21:55:57,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=516595.3333333333, ans=0.125 2024-09-18 21:56:11,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=516642.0, ans=0.2 2024-09-18 21:56:18,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=516642.0, ans=0.0 2024-09-18 21:56:18,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=516642.0, ans=0.0 2024-09-18 21:56:19,863 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.154e+02 2.488e+02 2.874e+02 3.896e+02 7.684e+02, threshold=5.748e+02, percent-clipped=6.0 2024-09-18 21:56:26,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=516688.6666666667, ans=0.2 2024-09-18 21:57:01,713 INFO [train.py:1198] (1/2) Epoch 29, batch 2200, loss[loss=0.2165, simple_loss=0.2737, pruned_loss=0.05884, ctc_loss=0.1266, cr_loss=0.4083, over 34433.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2673, pruned_loss=0.05939, ctc_loss=0.125, cr_loss=0.3999, over 6784926.84 frames. ], batch size: 100, lr: 4.12e-03, grad_scale: 32.0 2024-09-18 21:57:18,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=516828.6666666667, ans=0.0 2024-09-18 21:57:21,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=516828.6666666667, ans=0.125 2024-09-18 21:57:23,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=516828.6666666667, ans=0.125 2024-09-18 21:57:33,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.34 vs. limit=10.0 2024-09-18 21:57:34,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=516875.3333333333, ans=0.125 2024-09-18 21:57:48,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.93 vs. limit=15.0 2024-09-18 21:58:25,250 INFO [train.py:1198] (1/2) Epoch 29, batch 2250, loss[loss=0.2141, simple_loss=0.2705, pruned_loss=0.05825, ctc_loss=0.1242, cr_loss=0.411, over 34400.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2676, pruned_loss=0.05957, ctc_loss=0.1254, cr_loss=0.4007, over 6782551.30 frames. 
], batch size: 95, lr: 4.12e-03, grad_scale: 32.0 2024-09-18 21:58:52,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=517062.0, ans=0.025 2024-09-18 21:58:58,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=517108.6666666667, ans=10.0 2024-09-18 21:58:59,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2024-09-18 21:59:09,909 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.139e+02 2.508e+02 3.135e+02 4.179e+02 7.024e+02, threshold=6.270e+02, percent-clipped=1.0 2024-09-18 21:59:39,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=517202.0, ans=0.125 2024-09-18 21:59:47,900 INFO [train.py:1198] (1/2) Epoch 29, batch 2300, loss[loss=0.1858, simple_loss=0.2407, pruned_loss=0.04816, ctc_loss=0.1032, cr_loss=0.3486, over 34296.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2667, pruned_loss=0.05917, ctc_loss=0.1248, cr_loss=0.3992, over 6766971.28 frames. ], batch size: 83, lr: 4.12e-03, grad_scale: 16.0 2024-09-18 21:59:49,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=517248.6666666667, ans=0.07 2024-09-18 22:00:09,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=517295.3333333333, ans=0.05 2024-09-18 22:00:24,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=517342.0, ans=0.0 2024-09-18 22:00:31,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=517342.0, ans=0.125 2024-09-18 22:00:33,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=517342.0, ans=0.1 2024-09-18 22:00:51,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=517388.6666666667, ans=0.125 2024-09-18 22:01:13,931 INFO [train.py:1198] (1/2) Epoch 29, batch 2350, loss[loss=0.22, simple_loss=0.279, pruned_loss=0.05985, ctc_loss=0.1256, cr_loss=0.4044, over 34679.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2672, pruned_loss=0.05922, ctc_loss=0.1249, cr_loss=0.3996, over 6773592.82 frames. ], batch size: 97, lr: 4.12e-03, grad_scale: 16.0 2024-09-18 22:01:15,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.54 vs. limit=6.0 2024-09-18 22:01:35,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=517528.6666666667, ans=0.125 2024-09-18 22:01:36,214 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.58 vs. 
limit=22.5 2024-09-18 22:01:40,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=517528.6666666667, ans=0.1 2024-09-18 22:01:58,168 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.999e+02 2.455e+02 2.822e+02 3.428e+02 5.176e+02, threshold=5.644e+02, percent-clipped=0.0 2024-09-18 22:02:21,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=517668.6666666667, ans=0.0 2024-09-18 22:02:28,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=517668.6666666667, ans=0.09899494936611666 2024-09-18 22:02:35,936 INFO [train.py:1198] (1/2) Epoch 29, batch 2400, loss[loss=0.2014, simple_loss=0.2539, pruned_loss=0.05516, ctc_loss=0.1186, cr_loss=0.3723, over 34601.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2676, pruned_loss=0.05954, ctc_loss=0.1255, cr_loss=0.4008, over 6776902.16 frames. ], batch size: 89, lr: 4.12e-03, grad_scale: 32.0 2024-09-18 22:02:42,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=517715.3333333333, ans=0.125 2024-09-18 22:02:42,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=517715.3333333333, ans=0.125 2024-09-18 22:02:56,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=517762.0, ans=0.025 2024-09-18 22:03:09,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. limit=15.0 2024-09-18 22:03:29,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=517855.3333333333, ans=0.0 2024-09-18 22:03:34,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=517855.3333333333, ans=0.125 2024-09-18 22:03:46,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0 2024-09-18 22:03:54,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=517902.0, ans=0.125 2024-09-18 22:03:58,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=517948.6666666667, ans=0.2 2024-09-18 22:03:58,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=517948.6666666667, ans=0.1 2024-09-18 22:03:59,417 INFO [train.py:1198] (1/2) Epoch 29, batch 2450, loss[loss=0.2178, simple_loss=0.2735, pruned_loss=0.0602, ctc_loss=0.1264, cr_loss=0.4104, over 34399.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2687, pruned_loss=0.05992, ctc_loss=0.1263, cr_loss=0.4023, over 6751299.73 frames. 
], batch size: 95, lr: 4.12e-03, grad_scale: 32.0 2024-09-18 22:04:11,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=517948.6666666667, ans=0.0 2024-09-18 22:04:31,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=517995.3333333333, ans=0.125 2024-09-18 22:04:36,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.51 vs. limit=22.5 2024-09-18 22:04:42,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.25 vs. limit=15.0 2024-09-18 22:04:48,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.442e+02 2.820e+02 3.521e+02 6.092e+02, threshold=5.641e+02, percent-clipped=1.0 2024-09-18 22:04:53,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=518088.6666666667, ans=0.125 2024-09-18 22:05:19,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=518135.3333333333, ans=0.2 2024-09-18 22:05:24,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=518182.0, ans=0.04949747468305833 2024-09-18 22:05:26,203 INFO [train.py:1198] (1/2) Epoch 29, batch 2500, loss[loss=0.218, simple_loss=0.274, pruned_loss=0.06029, ctc_loss=0.125, cr_loss=0.4095, over 34455.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2685, pruned_loss=0.05986, ctc_loss=0.1262, cr_loss=0.4022, over 6763025.99 frames. ], batch size: 100, lr: 4.11e-03, grad_scale: 32.0 2024-09-18 22:05:31,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=518182.0, ans=0.125 2024-09-18 22:05:44,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=518228.6666666667, ans=0.125 2024-09-18 22:05:56,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.05 vs. limit=6.0 2024-09-18 22:05:58,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.88 vs. limit=15.0 2024-09-18 22:06:07,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=518275.3333333333, ans=0.2 2024-09-18 22:06:24,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=518322.0, ans=0.125 2024-09-18 22:06:26,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=518322.0, ans=0.0 2024-09-18 22:06:44,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=518368.6666666667, ans=0.2 2024-09-18 22:06:49,301 INFO [train.py:1198] (1/2) Epoch 29, batch 2550, loss[loss=0.1697, simple_loss=0.2247, pruned_loss=0.04191, ctc_loss=0.0931, cr_loss=0.3077, over 34176.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2683, pruned_loss=0.05981, ctc_loss=0.1259, cr_loss=0.4017, over 6765006.89 frames. 
], batch size: 78, lr: 4.11e-03, grad_scale: 32.0 2024-09-18 22:06:54,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=15.0 2024-09-18 22:06:56,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=518415.3333333333, ans=0.125 2024-09-18 22:07:10,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518462.0, ans=0.1 2024-09-18 22:07:24,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0 2024-09-18 22:07:33,422 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.013e+02 2.552e+02 3.016e+02 3.859e+02 7.609e+02, threshold=6.031e+02, percent-clipped=10.0 2024-09-18 22:07:59,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=518602.0, ans=0.2 2024-09-18 22:08:13,925 INFO [train.py:1198] (1/2) Epoch 29, batch 2600, loss[loss=0.222, simple_loss=0.2734, pruned_loss=0.06378, ctc_loss=0.1318, cr_loss=0.4163, over 34345.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2688, pruned_loss=0.06008, ctc_loss=0.1266, cr_loss=0.403, over 6760552.00 frames. ], batch size: 91, lr: 4.11e-03, grad_scale: 32.0 2024-09-18 22:08:32,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=518695.3333333333, ans=0.125 2024-09-18 22:08:37,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=518695.3333333333, ans=22.5 2024-09-18 22:08:43,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=518695.3333333333, ans=0.125 2024-09-18 22:09:04,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=518788.6666666667, ans=0.125 2024-09-18 22:09:27,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=518835.3333333333, ans=0.125 2024-09-18 22:09:37,343 INFO [train.py:1198] (1/2) Epoch 29, batch 2650, loss[loss=0.227, simple_loss=0.2812, pruned_loss=0.06397, ctc_loss=0.1364, cr_loss=0.4387, over 34208.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2688, pruned_loss=0.06011, ctc_loss=0.1266, cr_loss=0.4034, over 6769022.91 frames. ], batch size: 117, lr: 4.11e-03, grad_scale: 32.0 2024-09-18 22:09:41,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.13 vs. 
limit=6.0 2024-09-18 22:09:50,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=518882.0, ans=0.0 2024-09-18 22:10:07,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=518928.6666666667, ans=0.125 2024-09-18 22:10:09,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=518975.3333333333, ans=0.125 2024-09-18 22:10:21,976 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.124e+02 2.594e+02 3.072e+02 3.736e+02 6.319e+02, threshold=6.144e+02, percent-clipped=1.0 2024-09-18 22:10:51,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=519068.6666666667, ans=0.0 2024-09-18 22:10:59,718 INFO [train.py:1198] (1/2) Epoch 29, batch 2700, loss[loss=0.2189, simple_loss=0.277, pruned_loss=0.05949, ctc_loss=0.1278, cr_loss=0.4068, over 34628.00 frames. ], tot_loss[loss=0.2159, simple_loss=0.2694, pruned_loss=0.06038, ctc_loss=0.127, cr_loss=0.4044, over 6763703.68 frames. ], batch size: 102, lr: 4.11e-03, grad_scale: 32.0 2024-09-18 22:11:03,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.00 vs. limit=12.0 2024-09-18 22:11:23,766 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.49 vs. limit=22.5 2024-09-18 22:11:44,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=519208.6666666667, ans=0.0 2024-09-18 22:11:50,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=519255.3333333333, ans=0.125 2024-09-18 22:12:18,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.32 vs. limit=15.0 2024-09-18 22:12:22,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=15.0 2024-09-18 22:12:26,515 INFO [train.py:1198] (1/2) Epoch 29, batch 2750, loss[loss=0.2088, simple_loss=0.2608, pruned_loss=0.05806, ctc_loss=0.1265, cr_loss=0.3834, over 34634.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2682, pruned_loss=0.05984, ctc_loss=0.1259, cr_loss=0.4015, over 6761463.53 frames. ], batch size: 88, lr: 4.11e-03, grad_scale: 32.0 2024-09-18 22:12:49,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=519395.3333333333, ans=0.125 2024-09-18 22:13:06,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=519442.0, ans=0.125 2024-09-18 22:13:10,994 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.604e+02 3.030e+02 3.757e+02 6.406e+02, threshold=6.060e+02, percent-clipped=3.0 2024-09-18 22:13:28,132 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:13:46,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. 
limit=5.0 2024-09-18 22:13:49,676 INFO [train.py:1198] (1/2) Epoch 29, batch 2800, loss[loss=0.2371, simple_loss=0.2834, pruned_loss=0.07247, ctc_loss=0.1486, cr_loss=0.4048, over 23681.00 frames. ], tot_loss[loss=0.2152, simple_loss=0.2686, pruned_loss=0.06015, ctc_loss=0.1265, cr_loss=0.4026, over 6740542.69 frames. ], batch size: 244, lr: 4.11e-03, grad_scale: 32.0 2024-09-18 22:14:15,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=15.0 2024-09-18 22:14:41,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=519722.0, ans=0.2 2024-09-18 22:14:46,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=519722.0, ans=0.125 2024-09-18 22:14:46,824 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.05 vs. limit=15.0 2024-09-18 22:14:46,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.09 vs. limit=22.5 2024-09-18 22:15:02,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=519768.6666666667, ans=0.125 2024-09-18 22:15:12,381 INFO [train.py:1198] (1/2) Epoch 29, batch 2850, loss[loss=0.2208, simple_loss=0.2745, pruned_loss=0.06226, ctc_loss=0.1303, cr_loss=0.413, over 34507.00 frames. ], tot_loss[loss=0.2156, simple_loss=0.2691, pruned_loss=0.0603, ctc_loss=0.1268, cr_loss=0.4034, over 6724796.54 frames. ], batch size: 90, lr: 4.11e-03, grad_scale: 16.0 2024-09-18 22:15:14,472 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:15:22,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=519815.3333333333, ans=0.0 2024-09-18 22:15:24,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=519815.3333333333, ans=0.2 2024-09-18 22:15:24,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=9.33 vs. limit=15.0 2024-09-18 22:16:02,834 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.066e+02 2.370e+02 2.832e+02 3.766e+02 6.403e+02, threshold=5.664e+02, percent-clipped=5.0 2024-09-18 22:16:38,909 INFO [train.py:1198] (1/2) Epoch 29, batch 2900, loss[loss=0.2112, simple_loss=0.2683, pruned_loss=0.05736, ctc_loss=0.1183, cr_loss=0.3925, over 34544.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2699, pruned_loss=0.06053, ctc_loss=0.1272, cr_loss=0.4043, over 6755206.54 frames. ], batch size: 94, lr: 4.11e-03, grad_scale: 16.0 2024-09-18 22:16:42,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=520048.6666666667, ans=0.0 2024-09-18 22:16:45,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=520048.6666666667, ans=0.2 2024-09-18 22:16:53,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. 
limit=15.0 2024-09-18 22:16:57,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=520095.3333333333, ans=0.125 2024-09-18 22:17:06,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=520095.3333333333, ans=10.0 2024-09-18 22:17:30,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=520188.6666666667, ans=0.125 2024-09-18 22:17:30,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2024-09-18 22:18:00,934 INFO [train.py:1198] (1/2) Epoch 29, batch 2950, loss[loss=0.2075, simple_loss=0.2574, pruned_loss=0.05855, ctc_loss=0.1213, cr_loss=0.4027, over 34636.00 frames. ], tot_loss[loss=0.2144, simple_loss=0.2681, pruned_loss=0.05974, ctc_loss=0.1257, cr_loss=0.4008, over 6749291.29 frames. ], batch size: 88, lr: 4.11e-03, grad_scale: 16.0 2024-09-18 22:18:02,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=520282.0, ans=0.125 2024-09-18 22:18:07,757 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:18:39,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=520375.3333333333, ans=0.125 2024-09-18 22:18:44,205 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:18:47,067 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.120e+02 2.448e+02 2.989e+02 3.804e+02 1.123e+03, threshold=5.978e+02, percent-clipped=3.0 2024-09-18 22:18:55,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=520422.0, ans=0.025 2024-09-18 22:18:56,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.88 vs. limit=15.0 2024-09-18 22:19:04,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=11.00 vs. limit=12.0 2024-09-18 22:19:08,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=520468.6666666667, ans=0.125 2024-09-18 22:19:15,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=520468.6666666667, ans=0.1 2024-09-18 22:19:21,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.19 vs. limit=15.0 2024-09-18 22:19:25,717 INFO [train.py:1198] (1/2) Epoch 29, batch 3000, loss[loss=0.2094, simple_loss=0.2665, pruned_loss=0.05663, ctc_loss=0.1191, cr_loss=0.3822, over 34544.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2679, pruned_loss=0.05949, ctc_loss=0.1253, cr_loss=0.4001, over 6750868.64 frames. 
], batch size: 94, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 22:19:25,717 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 22:19:42,665 INFO [train.py:1230] (1/2) Epoch 29, validation: loss=0.1491, simple_loss=0.2439, pruned_loss=0.0231, ctc_loss=0.04038, cr_loss=1.974e-14, over 944034.00 frames. 2024-09-18 22:19:42,665 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 22:19:56,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=520515.3333333333, ans=0.0 2024-09-18 22:19:57,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=520562.0, ans=0.0 2024-09-18 22:20:09,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=520562.0, ans=0.125 2024-09-18 22:20:44,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2024-09-18 22:20:51,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=520702.0, ans=0.125 2024-09-18 22:21:04,161 INFO [train.py:1198] (1/2) Epoch 29, batch 3050, loss[loss=0.1977, simple_loss=0.2491, pruned_loss=0.05418, ctc_loss=0.1153, cr_loss=0.3703, over 34577.00 frames. ], tot_loss[loss=0.2149, simple_loss=0.2687, pruned_loss=0.05988, ctc_loss=0.1262, cr_loss=0.4022, over 6740825.18 frames. ], batch size: 89, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 22:21:20,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=520795.3333333333, ans=0.125 2024-09-18 22:21:30,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-18 22:21:50,109 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.164e+02 2.468e+02 2.740e+02 3.366e+02 7.695e+02, threshold=5.479e+02, percent-clipped=3.0 2024-09-18 22:21:54,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.07 vs. limit=12.0 2024-09-18 22:22:25,406 INFO [train.py:1198] (1/2) Epoch 29, batch 3100, loss[loss=0.2182, simple_loss=0.2801, pruned_loss=0.05776, ctc_loss=0.1221, cr_loss=0.4074, over 34254.00 frames. ], tot_loss[loss=0.2147, simple_loss=0.2685, pruned_loss=0.05983, ctc_loss=0.1262, cr_loss=0.402, over 6740715.77 frames. ], batch size: 117, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 22:22:34,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=22.5 2024-09-18 22:22:43,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=521028.6666666667, ans=0.0 2024-09-18 22:23:40,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=521168.6666666667, ans=0.125 2024-09-18 22:23:46,427 INFO [train.py:1198] (1/2) Epoch 29, batch 3150, loss[loss=0.2246, simple_loss=0.2822, pruned_loss=0.06241, ctc_loss=0.13, cr_loss=0.4052, over 33917.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2683, pruned_loss=0.05981, ctc_loss=0.1261, cr_loss=0.4015, over 6745746.76 frames. 
], batch size: 122, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 22:23:46,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=521215.3333333333, ans=0.1 2024-09-18 22:24:31,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.159e+02 2.490e+02 2.944e+02 3.881e+02 6.943e+02, threshold=5.889e+02, percent-clipped=4.0 2024-09-18 22:24:31,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=521308.6666666667, ans=0.0 2024-09-18 22:24:36,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=521355.3333333333, ans=0.0 2024-09-18 22:25:02,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=521402.0, ans=0.125 2024-09-18 22:25:10,482 INFO [train.py:1198] (1/2) Epoch 29, batch 3200, loss[loss=0.21, simple_loss=0.2608, pruned_loss=0.05942, ctc_loss=0.1245, cr_loss=0.385, over 34546.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.2678, pruned_loss=0.05959, ctc_loss=0.1256, cr_loss=0.4006, over 6758485.15 frames. ], batch size: 94, lr: 4.10e-03, grad_scale: 32.0 2024-09-18 22:25:29,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.56 vs. limit=15.0 2024-09-18 22:25:39,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.02 vs. limit=15.0 2024-09-18 22:25:43,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=521542.0, ans=0.0 2024-09-18 22:25:48,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=521542.0, ans=0.125 2024-09-18 22:25:49,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=521542.0, ans=0.0 2024-09-18 22:26:02,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=521588.6666666667, ans=0.025 2024-09-18 22:26:07,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521588.6666666667, ans=0.1 2024-09-18 22:26:11,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=521588.6666666667, ans=0.125 2024-09-18 22:26:20,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=521635.3333333333, ans=0.2 2024-09-18 22:26:29,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=521682.0, ans=0.125 2024-09-18 22:26:31,173 INFO [train.py:1198] (1/2) Epoch 29, batch 3250, loss[loss=0.2251, simple_loss=0.2754, pruned_loss=0.06557, ctc_loss=0.1347, cr_loss=0.4167, over 34686.00 frames. ], tot_loss[loss=0.2146, simple_loss=0.2685, pruned_loss=0.05978, ctc_loss=0.1259, cr_loss=0.4013, over 6767644.86 frames. 
], batch size: 98, lr: 4.10e-03, grad_scale: 32.0 2024-09-18 22:26:43,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=521682.0, ans=0.025 2024-09-18 22:27:02,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=521775.3333333333, ans=0.2 2024-09-18 22:27:13,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=521775.3333333333, ans=0.04949747468305833 2024-09-18 22:27:16,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.443e+02 2.826e+02 3.465e+02 6.870e+02, threshold=5.651e+02, percent-clipped=2.0 2024-09-18 22:27:39,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=521868.6666666667, ans=0.0 2024-09-18 22:27:44,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.90 vs. limit=15.0 2024-09-18 22:27:52,146 INFO [train.py:1198] (1/2) Epoch 29, batch 3300, loss[loss=0.2051, simple_loss=0.2656, pruned_loss=0.05283, ctc_loss=0.1181, cr_loss=0.3819, over 32970.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2674, pruned_loss=0.05936, ctc_loss=0.1253, cr_loss=0.3997, over 6766274.87 frames. ], batch size: 130, lr: 4.10e-03, grad_scale: 32.0 2024-09-18 22:27:54,188 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:27:54,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=521915.3333333333, ans=0.125 2024-09-18 22:27:59,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.39 vs. limit=15.0 2024-09-18 22:28:02,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=521915.3333333333, ans=0.125 2024-09-18 22:28:03,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=521915.3333333333, ans=0.125 2024-09-18 22:28:05,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=521915.3333333333, ans=0.09899494936611666 2024-09-18 22:28:09,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.65 vs. limit=15.0 2024-09-18 22:28:20,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=521962.0, ans=0.04949747468305833 2024-09-18 22:28:36,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=522008.6666666667, ans=0.025 2024-09-18 22:28:39,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.68 vs. 
limit=12.0 2024-09-18 22:29:00,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=522102.0, ans=0.0 2024-09-18 22:29:13,119 INFO [train.py:1198] (1/2) Epoch 29, batch 3350, loss[loss=0.2214, simple_loss=0.2747, pruned_loss=0.06198, ctc_loss=0.1333, cr_loss=0.4371, over 33896.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.2679, pruned_loss=0.05954, ctc_loss=0.1257, cr_loss=0.4005, over 6741859.41 frames. ], batch size: 122, lr: 4.10e-03, grad_scale: 32.0 2024-09-18 22:29:18,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=522148.6666666667, ans=0.125 2024-09-18 22:29:34,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=522195.3333333333, ans=0.1 2024-09-18 22:29:37,565 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:29:46,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.51 vs. limit=10.0 2024-09-18 22:30:00,777 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 2.386e+02 2.625e+02 3.272e+02 5.906e+02, threshold=5.249e+02, percent-clipped=1.0 2024-09-18 22:30:04,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=522288.6666666667, ans=0.125 2024-09-18 22:30:09,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522288.6666666667, ans=0.1 2024-09-18 22:30:12,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=522288.6666666667, ans=0.05 2024-09-18 22:30:15,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=522288.6666666667, ans=0.5 2024-09-18 22:30:23,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=522335.3333333333, ans=0.025 2024-09-18 22:30:36,232 INFO [train.py:1198] (1/2) Epoch 29, batch 3400, loss[loss=0.1899, simple_loss=0.2424, pruned_loss=0.05027, ctc_loss=0.111, cr_loss=0.3696, over 34128.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2682, pruned_loss=0.05975, ctc_loss=0.1261, cr_loss=0.4018, over 6732547.05 frames. ], batch size: 78, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 22:30:46,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=522382.0, ans=0.125 2024-09-18 22:30:52,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=522428.6666666667, ans=0.125 2024-09-18 22:31:10,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=522475.3333333333, ans=0.2 2024-09-18 22:31:37,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=522522.0, ans=0.125 2024-09-18 22:31:56,814 INFO [train.py:1198] (1/2) Epoch 29, batch 3450, loss[loss=0.2124, simple_loss=0.2758, pruned_loss=0.05485, ctc_loss=0.118, cr_loss=0.3935, over 32994.00 frames. 
], tot_loss[loss=0.2145, simple_loss=0.2683, pruned_loss=0.05972, ctc_loss=0.1261, cr_loss=0.402, over 6744894.82 frames. ], batch size: 130, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 22:32:06,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=522615.3333333333, ans=0.2 2024-09-18 22:32:28,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=522662.0, ans=10.0 2024-09-18 22:32:40,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.67 vs. limit=15.0 2024-09-18 22:32:49,469 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.461e+02 2.820e+02 3.464e+02 6.090e+02, threshold=5.639e+02, percent-clipped=2.0 2024-09-18 22:33:22,966 INFO [train.py:1198] (1/2) Epoch 29, batch 3500, loss[loss=0.1974, simple_loss=0.2502, pruned_loss=0.05373, ctc_loss=0.1114, cr_loss=0.3717, over 34471.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2676, pruned_loss=0.05946, ctc_loss=0.1254, cr_loss=0.4007, over 6746733.73 frames. ], batch size: 85, lr: 4.10e-03, grad_scale: 16.0 2024-09-18 22:33:34,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=522848.6666666667, ans=0.125 2024-09-18 22:33:50,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=522895.3333333333, ans=0.0 2024-09-18 22:33:55,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=522942.0, ans=0.2 2024-09-18 22:34:11,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=522988.6666666667, ans=0.0 2024-09-18 22:34:17,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=522988.6666666667, ans=0.125 2024-09-18 22:34:18,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.60 vs. limit=6.0 2024-09-18 22:34:26,883 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:34:35,174 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=22.5 2024-09-18 22:34:44,131 INFO [train.py:1198] (1/2) Epoch 29, batch 3550, loss[loss=0.2163, simple_loss=0.2693, pruned_loss=0.06072, ctc_loss=0.127, cr_loss=0.4088, over 34366.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2678, pruned_loss=0.05957, ctc_loss=0.1257, cr_loss=0.4012, over 6756688.52 frames. ], batch size: 103, lr: 4.09e-03, grad_scale: 16.0 2024-09-18 22:35:06,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.43 vs. 
limit=22.5 2024-09-18 22:35:10,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=523128.6666666667, ans=0.04949747468305833 2024-09-18 22:35:17,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=523175.3333333333, ans=0.0 2024-09-18 22:35:20,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=523175.3333333333, ans=0.125 2024-09-18 22:35:31,411 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.407e+02 2.945e+02 3.789e+02 5.824e+02, threshold=5.890e+02, percent-clipped=1.0 2024-09-18 22:35:36,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=523222.0, ans=0.2 2024-09-18 22:35:54,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=523268.6666666667, ans=0.0 2024-09-18 22:36:05,058 INFO [train.py:1198] (1/2) Epoch 29, batch 3600, loss[loss=0.2006, simple_loss=0.2592, pruned_loss=0.05268, ctc_loss=0.1111, cr_loss=0.3631, over 34483.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2682, pruned_loss=0.05973, ctc_loss=0.1261, cr_loss=0.4024, over 6767115.09 frames. ], batch size: 90, lr: 4.09e-03, grad_scale: 32.0 2024-09-18 22:36:40,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.71 vs. limit=22.5 2024-09-18 22:36:48,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=523408.6666666667, ans=0.125 2024-09-18 22:37:09,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=523502.0, ans=0.0 2024-09-18 22:37:14,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. limit=15.0 2024-09-18 22:37:25,123 INFO [train.py:1198] (1/2) Epoch 29, batch 3650, loss[loss=0.2264, simple_loss=0.2806, pruned_loss=0.06397, ctc_loss=0.1363, cr_loss=0.4268, over 34483.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2673, pruned_loss=0.05933, ctc_loss=0.1254, cr_loss=0.4011, over 6769850.76 frames. ], batch size: 110, lr: 4.09e-03, grad_scale: 32.0 2024-09-18 22:37:26,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2024-09-18 22:37:30,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.19 vs. 
limit=15.0 2024-09-18 22:37:31,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=523548.6666666667, ans=0.125 2024-09-18 22:37:39,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=523595.3333333333, ans=0.2 2024-09-18 22:38:06,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=523642.0, ans=0.2 2024-09-18 22:38:12,657 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 2.467e+02 3.128e+02 3.955e+02 7.977e+02, threshold=6.255e+02, percent-clipped=7.0 2024-09-18 22:38:19,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=523688.6666666667, ans=0.2 2024-09-18 22:38:19,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.49 vs. limit=6.0 2024-09-18 22:38:22,734 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.65 vs. limit=15.0 2024-09-18 22:38:29,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=523735.3333333333, ans=0.125 2024-09-18 22:38:46,968 INFO [train.py:1198] (1/2) Epoch 29, batch 3700, loss[loss=0.2116, simple_loss=0.2719, pruned_loss=0.05578, ctc_loss=0.1188, cr_loss=0.4001, over 34610.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2674, pruned_loss=0.05909, ctc_loss=0.125, cr_loss=0.4003, over 6784042.71 frames. ], batch size: 102, lr: 4.09e-03, grad_scale: 32.0 2024-09-18 22:38:52,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=523782.0, ans=0.125 2024-09-18 22:38:53,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=523782.0, ans=0.125 2024-09-18 22:39:06,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=523828.6666666667, ans=0.025 2024-09-18 22:39:24,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=523875.3333333333, ans=0.2 2024-09-18 22:39:42,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=523922.0, ans=0.125 2024-09-18 22:40:08,305 INFO [train.py:1198] (1/2) Epoch 29, batch 3750, loss[loss=0.2275, simple_loss=0.2791, pruned_loss=0.06564, ctc_loss=0.1355, cr_loss=0.4368, over 34394.00 frames. ], tot_loss[loss=0.2163, simple_loss=0.2704, pruned_loss=0.06022, ctc_loss=0.1271, cr_loss=0.4054, over 6785350.85 frames. 
], batch size: 113, lr: 4.09e-03, grad_scale: 32.0 2024-09-18 22:40:27,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=524062.0, ans=0.0 2024-09-18 22:40:55,207 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.157e+02 2.302e+02 2.460e+02 2.863e+02 5.488e+02, threshold=4.921e+02, percent-clipped=0.0 2024-09-18 22:41:26,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524202.0, ans=0.1 2024-09-18 22:41:29,485 INFO [train.py:1198] (1/2) Epoch 29, batch 3800, loss[loss=0.2415, simple_loss=0.2834, pruned_loss=0.07587, ctc_loss=0.1541, cr_loss=0.4272, over 30090.00 frames. ], tot_loss[loss=0.2193, simple_loss=0.2729, pruned_loss=0.06167, ctc_loss=0.1299, cr_loss=0.41, over 6674442.48 frames. ], batch size: 175, lr: 4.09e-03, grad_scale: 32.0 2024-09-18 22:41:45,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524295.3333333334, ans=0.1 2024-09-18 22:42:21,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=524388.6666666666, ans=0.1 2024-09-18 22:42:24,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=524388.6666666666, ans=0.04949747468305833 2024-09-18 22:42:31,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=524388.6666666666, ans=0.0 2024-09-18 22:42:47,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=524435.3333333334, ans=0.125 2024-09-18 22:42:50,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=524435.3333333334, ans=0.125 2024-09-18 22:42:53,350 INFO [train.py:1198] (1/2) Epoch 29, batch 3850, loss[loss=0.2353, simple_loss=0.2843, pruned_loss=0.06985, ctc_loss=0.15, cr_loss=0.4145, over 23782.00 frames. ], tot_loss[loss=0.2232, simple_loss=0.2754, pruned_loss=0.06378, ctc_loss=0.1342, cr_loss=0.4146, over 6250505.27 frames. ], batch size: 244, lr: 4.09e-03, grad_scale: 16.0 2024-09-18 22:43:01,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=524482.0, ans=0.125 2024-09-18 22:43:23,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=524528.6666666666, ans=0.0 2024-09-18 22:44:09,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=524608.0, ans=0.1 2024-09-18 22:44:27,134 INFO [train.py:1198] (1/2) Epoch 30, batch 0, loss[loss=0.2048, simple_loss=0.2553, pruned_loss=0.05715, ctc_loss=0.1213, cr_loss=0.3926, over 34495.00 frames. ], tot_loss[loss=0.2048, simple_loss=0.2553, pruned_loss=0.05715, ctc_loss=0.1213, cr_loss=0.3926, over 34495.00 frames. ], batch size: 85, lr: 4.02e-03, grad_scale: 32.0 2024-09-18 22:44:27,135 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-18 22:44:44,024 INFO [train.py:1230] (1/2) Epoch 30, validation: loss=0.1489, simple_loss=0.2447, pruned_loss=0.02255, ctc_loss=0.03969, cr_loss=2.074e-14, over 944034.00 frames. 
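The loss[...] and tot_loss[...] entries above follow a fixed textual format, so the training curve can be recovered directly from a raw log like this one. A minimal parsing sketch follows (a hypothetical helper, not part of icefall; the regex and the name tot_losses are assumptions inferred only from the entry format visible in this log):

    import re

    # Matches training entries of the form seen above, e.g.
    #   "Epoch 29, batch 2600, loss[...], tot_loss[loss=0.2152, ..."
    ENTRY = re.compile(
        r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+),.*?"
        r"tot_loss\[loss=(?P<loss>[0-9.]+)",
        re.S,  # entries in this capture wrap across physical lines
    )

    def tot_losses(log_text):
        # Yield (epoch, batch, tot_loss) for every training entry found.
        for m in ENTRY.finditer(log_text):
            yield int(m.group("epoch")), int(m.group("batch")), float(m.group("loss"))

Validation entries ("Epoch N, validation: loss=...") intentionally do not match, since they carry no batch index; the running-average tot_loss, not the single-batch loss, is the number worth plotting.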
2024-09-18 22:44:44,024 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-18 22:44:48,796 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.595e+02 2.784e+02 3.100e+02 8.280e+02, threshold=5.567e+02, percent-clipped=1.0 2024-09-18 22:45:08,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2024-09-18 22:45:18,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=524701.3333333334, ans=0.0 2024-09-18 22:46:00,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=524794.6666666666, ans=0.0 2024-09-18 22:46:00,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524794.6666666666, ans=0.1 2024-09-18 22:46:03,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=524794.6666666666, ans=0.125 2024-09-18 22:46:08,165 INFO [train.py:1198] (1/2) Epoch 30, batch 50, loss[loss=0.1876, simple_loss=0.2414, pruned_loss=0.0489, ctc_loss=0.1076, cr_loss=0.361, over 34478.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2687, pruned_loss=0.06034, ctc_loss=0.127, cr_loss=0.4042, over 1480654.97 frames. ], batch size: 82, lr: 4.02e-03, grad_scale: 32.0 2024-09-18 22:46:11,879 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 22:46:16,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=524841.3333333334, ans=0.125 2024-09-18 22:46:41,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=524934.6666666666, ans=0.0 2024-09-18 22:47:08,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=524981.3333333334, ans=0.025 2024-09-18 22:47:09,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.17 vs. limit=10.0 2024-09-18 22:47:11,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=524981.3333333334, ans=0.125 2024-09-18 22:47:15,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525028.0, ans=0.1 2024-09-18 22:47:32,982 INFO [train.py:1198] (1/2) Epoch 30, batch 100, loss[loss=0.1975, simple_loss=0.2521, pruned_loss=0.05293, ctc_loss=0.1156, cr_loss=0.3498, over 34583.00 frames. ], tot_loss[loss=0.2164, simple_loss=0.2701, pruned_loss=0.06049, ctc_loss=0.1273, cr_loss=0.4053, over 2629477.99 frames. 
], batch size: 89, lr: 4.02e-03, grad_scale: 32.0 2024-09-18 22:47:37,853 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 2.423e+02 2.864e+02 3.480e+02 7.436e+02, threshold=5.728e+02, percent-clipped=3.0 2024-09-18 22:47:41,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=525074.6666666666, ans=0.125 2024-09-18 22:47:45,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=15.0 2024-09-18 22:47:56,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=525121.3333333334, ans=0.125 2024-09-18 22:47:59,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=525121.3333333334, ans=0.2 2024-09-18 22:48:09,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=525168.0, ans=0.1 2024-09-18 22:48:15,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=525168.0, ans=0.1 2024-09-18 22:48:15,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=525168.0, ans=0.125 2024-09-18 22:48:23,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=525214.6666666666, ans=0.125 2024-09-18 22:48:23,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=525214.6666666666, ans=0.125 2024-09-18 22:48:56,447 INFO [train.py:1198] (1/2) Epoch 30, batch 150, loss[loss=0.1968, simple_loss=0.2491, pruned_loss=0.05322, ctc_loss=0.1132, cr_loss=0.3846, over 34482.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2687, pruned_loss=0.05976, ctc_loss=0.126, cr_loss=0.4024, over 3556560.20 frames. ], batch size: 82, lr: 4.02e-03, grad_scale: 32.0 2024-09-18 22:49:01,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=525308.0, ans=0.0 2024-09-18 22:49:43,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.16 vs. limit=15.0 2024-09-18 22:49:44,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=525448.0, ans=0.0 2024-09-18 22:49:56,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.41 vs. limit=10.0 2024-09-18 22:50:09,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=525494.6666666666, ans=0.125 2024-09-18 22:50:11,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=525494.6666666666, ans=0.025 2024-09-18 22:50:18,879 INFO [train.py:1198] (1/2) Epoch 30, batch 200, loss[loss=0.2408, simple_loss=0.2915, pruned_loss=0.07141, ctc_loss=0.1483, cr_loss=0.4414, over 31850.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2679, pruned_loss=0.05948, ctc_loss=0.1256, cr_loss=0.4024, over 4271309.67 frames. 
], batch size: 146, lr: 4.02e-03, grad_scale: 32.0 2024-09-18 22:50:23,805 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.925e+02 2.432e+02 2.999e+02 4.131e+02 7.285e+02, threshold=5.997e+02, percent-clipped=7.0 2024-09-18 22:50:34,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0 2024-09-18 22:50:38,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=525588.0, ans=0.2 2024-09-18 22:51:04,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=525634.6666666666, ans=0.2 2024-09-18 22:51:16,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0 2024-09-18 22:51:17,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.44 vs. limit=15.0 2024-09-18 22:51:24,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=525681.3333333334, ans=0.0 2024-09-18 22:51:29,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=525728.0, ans=0.125 2024-09-18 22:51:43,756 INFO [train.py:1198] (1/2) Epoch 30, batch 250, loss[loss=0.2261, simple_loss=0.281, pruned_loss=0.06378, ctc_loss=0.1332, cr_loss=0.4243, over 34281.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2676, pruned_loss=0.05938, ctc_loss=0.1253, cr_loss=0.4016, over 4833534.87 frames. ], batch size: 117, lr: 4.01e-03, grad_scale: 32.0 2024-09-18 22:51:57,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=525774.6666666666, ans=0.0 2024-09-18 22:52:18,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=525868.0, ans=0.125 2024-09-18 22:52:40,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.27 vs. limit=15.0 2024-09-18 22:52:46,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525914.6666666666, ans=0.1 2024-09-18 22:52:48,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=525914.6666666666, ans=0.125 2024-09-18 22:52:51,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=525961.3333333334, ans=0.025 2024-09-18 22:53:03,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=525961.3333333334, ans=0.0 2024-09-18 22:53:07,844 INFO [train.py:1198] (1/2) Epoch 30, batch 300, loss[loss=0.2352, simple_loss=0.2901, pruned_loss=0.06702, ctc_loss=0.1409, cr_loss=0.4534, over 34339.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2674, pruned_loss=0.05927, ctc_loss=0.1251, cr_loss=0.4012, over 5262750.15 frames. 
], batch size: 107, lr: 4.01e-03, grad_scale: 32.0 2024-09-18 22:53:12,906 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.136e+02 2.467e+02 3.024e+02 3.784e+02 9.240e+02, threshold=6.047e+02, percent-clipped=5.0 2024-09-18 22:53:19,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=526008.0, ans=0.125 2024-09-18 22:53:23,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.89 vs. limit=10.0 2024-09-18 22:53:34,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526054.6666666666, ans=0.1 2024-09-18 22:53:54,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526101.3333333334, ans=0.1 2024-09-18 22:53:55,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=526148.0, ans=0.125 2024-09-18 22:54:01,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=15.0 2024-09-18 22:54:27,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=526194.6666666666, ans=0.0 2024-09-18 22:54:30,342 INFO [train.py:1198] (1/2) Epoch 30, batch 350, loss[loss=0.1888, simple_loss=0.2467, pruned_loss=0.04858, ctc_loss=0.1013, cr_loss=0.3384, over 34298.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2677, pruned_loss=0.05945, ctc_loss=0.1253, cr_loss=0.4016, over 5597376.46 frames. ], batch size: 83, lr: 4.01e-03, grad_scale: 16.0 2024-09-18 22:55:03,202 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=15.0 2024-09-18 22:55:08,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=526334.6666666666, ans=0.0 2024-09-18 22:55:12,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=12.0 2024-09-18 22:55:20,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=526381.3333333334, ans=0.1 2024-09-18 22:55:24,275 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.15 vs. limit=10.0 2024-09-18 22:55:55,260 INFO [train.py:1198] (1/2) Epoch 30, batch 400, loss[loss=0.1976, simple_loss=0.2538, pruned_loss=0.0523, ctc_loss=0.111, cr_loss=0.3641, over 34444.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2676, pruned_loss=0.05929, ctc_loss=0.1251, cr_loss=0.401, over 5864207.23 frames. ], batch size: 95, lr: 4.01e-03, grad_scale: 32.0 2024-09-18 22:56:01,908 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.830e+02 2.432e+02 2.757e+02 3.427e+02 1.578e+03, threshold=5.515e+02, percent-clipped=1.0 2024-09-18 22:56:06,325 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. 
limit=6.0 2024-09-18 22:56:28,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=526568.0, ans=0.0 2024-09-18 22:56:37,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.76 vs. limit=15.0 2024-09-18 22:56:40,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=526568.0, ans=0.04949747468305833 2024-09-18 22:56:42,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=526568.0, ans=0.07 2024-09-18 22:56:44,474 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.73 vs. limit=22.5 2024-09-18 22:56:45,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=526614.6666666666, ans=0.05 2024-09-18 22:56:58,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=526614.6666666666, ans=0.125 2024-09-18 22:57:10,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=526661.3333333334, ans=0.0 2024-09-18 22:57:20,113 INFO [train.py:1198] (1/2) Epoch 30, batch 450, loss[loss=0.2067, simple_loss=0.2656, pruned_loss=0.05471, ctc_loss=0.1181, cr_loss=0.369, over 34715.00 frames. ], tot_loss[loss=0.2141, simple_loss=0.268, pruned_loss=0.05951, ctc_loss=0.1255, cr_loss=0.4019, over 6054311.75 frames. ], batch size: 97, lr: 4.01e-03, grad_scale: 32.0 2024-09-18 22:57:35,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=526754.6666666666, ans=0.125 2024-09-18 22:57:38,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=526754.6666666666, ans=0.0 2024-09-18 22:57:53,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=526801.3333333334, ans=0.025 2024-09-18 22:58:03,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=526801.3333333334, ans=0.1 2024-09-18 22:58:07,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=526801.3333333334, ans=0.125 2024-09-18 22:58:37,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.89 vs. limit=15.0 2024-09-18 22:58:45,059 INFO [train.py:1198] (1/2) Epoch 30, batch 500, loss[loss=0.2416, simple_loss=0.29, pruned_loss=0.07234, ctc_loss=0.1489, cr_loss=0.4667, over 34455.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.267, pruned_loss=0.0592, ctc_loss=0.1248, cr_loss=0.4, over 6221772.25 frames. 
], batch size: 110, lr: 4.01e-03, grad_scale: 32.0 2024-09-18 22:58:51,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.518e+02 2.869e+02 3.499e+02 6.864e+02, threshold=5.738e+02, percent-clipped=3.0 2024-09-18 22:59:00,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=526988.0, ans=0.5 2024-09-18 22:59:39,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=527081.3333333334, ans=0.125 2024-09-18 23:00:03,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=527128.0, ans=0.125 2024-09-18 23:00:09,414 INFO [train.py:1198] (1/2) Epoch 30, batch 550, loss[loss=0.2203, simple_loss=0.2819, pruned_loss=0.05799, ctc_loss=0.1301, cr_loss=0.4162, over 33932.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.267, pruned_loss=0.05925, ctc_loss=0.125, cr_loss=0.4003, over 6330203.62 frames. ], batch size: 122, lr: 4.01e-03, grad_scale: 32.0 2024-09-18 23:00:11,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=527174.6666666666, ans=0.2 2024-09-18 23:00:25,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2024-09-18 23:00:26,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=527221.3333333334, ans=0.2 2024-09-18 23:00:32,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=527221.3333333334, ans=0.125 2024-09-18 23:00:41,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.36 vs. limit=22.5 2024-09-18 23:00:42,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=527268.0, ans=0.0 2024-09-18 23:00:43,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2024-09-18 23:00:45,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=527268.0, ans=0.0 2024-09-18 23:00:47,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=527268.0, ans=0.0 2024-09-18 23:00:50,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=22.5 2024-09-18 23:00:51,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.13 vs. limit=22.5 2024-09-18 23:01:19,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527361.3333333334, ans=0.1 2024-09-18 23:01:27,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=527361.3333333334, ans=0.125 2024-09-18 23:01:32,119 INFO [train.py:1198] (1/2) Epoch 30, batch 600, loss[loss=0.2157, simple_loss=0.2732, pruned_loss=0.05857, ctc_loss=0.1237, cr_loss=0.4078, over 34229.00 frames. 
], tot_loss[loss=0.2131, simple_loss=0.267, pruned_loss=0.05913, ctc_loss=0.1248, cr_loss=0.4002, over 6431610.49 frames. ], batch size: 117, lr: 4.01e-03, grad_scale: 32.0 2024-09-18 23:01:35,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=527408.0, ans=0.125 2024-09-18 23:01:38,746 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.381e+02 2.738e+02 3.508e+02 6.397e+02, threshold=5.475e+02, percent-clipped=2.0 2024-09-18 23:01:53,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=527454.6666666666, ans=0.125 2024-09-18 23:02:13,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=527501.3333333334, ans=0.0 2024-09-18 23:02:22,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=527548.0, ans=0.125 2024-09-18 23:02:53,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=527594.6666666666, ans=0.05 2024-09-18 23:02:56,333 INFO [train.py:1198] (1/2) Epoch 30, batch 650, loss[loss=0.2187, simple_loss=0.2721, pruned_loss=0.06095, ctc_loss=0.1312, cr_loss=0.4275, over 34546.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2663, pruned_loss=0.05876, ctc_loss=0.1241, cr_loss=0.3982, over 6522962.68 frames. ], batch size: 94, lr: 4.01e-03, grad_scale: 32.0 2024-09-18 23:02:59,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=527641.3333333334, ans=0.1 2024-09-18 23:03:25,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2024-09-18 23:04:05,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=527828.0, ans=0.0 2024-09-18 23:04:07,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. limit=15.0 2024-09-18 23:04:20,159 INFO [train.py:1198] (1/2) Epoch 30, batch 700, loss[loss=0.2069, simple_loss=0.2495, pruned_loss=0.06152, ctc_loss=0.1242, cr_loss=0.4102, over 34591.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.2668, pruned_loss=0.05888, ctc_loss=0.1244, cr_loss=0.3991, over 6579057.48 frames. ], batch size: 89, lr: 4.01e-03, grad_scale: 32.0 2024-09-18 23:04:26,750 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.441e+02 2.983e+02 3.747e+02 7.179e+02, threshold=5.966e+02, percent-clipped=6.0 2024-09-18 23:04:27,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=527874.6666666666, ans=0.0 2024-09-18 23:04:36,152 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.17 vs. 
limit=22.5 2024-09-18 23:04:57,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=527968.0, ans=0.125 2024-09-18 23:05:11,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=528014.6666666666, ans=0.0 2024-09-18 23:05:37,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.65 vs. limit=15.0 2024-09-18 23:05:39,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=528061.3333333334, ans=0.125 2024-09-18 23:05:42,863 INFO [train.py:1198] (1/2) Epoch 30, batch 750, loss[loss=0.2154, simple_loss=0.2711, pruned_loss=0.0592, ctc_loss=0.1251, cr_loss=0.4084, over 34395.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2664, pruned_loss=0.05876, ctc_loss=0.1241, cr_loss=0.398, over 6622411.91 frames. ], batch size: 95, lr: 4.01e-03, grad_scale: 32.0 2024-09-18 23:06:06,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=528154.6666666666, ans=0.2 2024-09-18 23:06:12,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=528154.6666666666, ans=0.0 2024-09-18 23:06:26,921 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.06 vs. limit=15.0 2024-09-18 23:06:45,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=528248.0, ans=0.2 2024-09-18 23:07:03,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=528294.6666666666, ans=0.125 2024-09-18 23:07:08,023 INFO [train.py:1198] (1/2) Epoch 30, batch 800, loss[loss=0.1819, simple_loss=0.2396, pruned_loss=0.0454, ctc_loss=0.1003, cr_loss=0.332, over 34485.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2662, pruned_loss=0.05877, ctc_loss=0.1241, cr_loss=0.398, over 6659585.08 frames. ], batch size: 85, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:07:13,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=528341.3333333334, ans=0.125 2024-09-18 23:07:14,559 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.433e+02 2.838e+02 3.830e+02 5.565e+02, threshold=5.676e+02, percent-clipped=0.0 2024-09-18 23:08:04,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=528481.3333333334, ans=0.1 2024-09-18 23:08:23,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=528528.0, ans=0.05 2024-09-18 23:08:31,729 INFO [train.py:1198] (1/2) Epoch 30, batch 850, loss[loss=0.2254, simple_loss=0.2854, pruned_loss=0.06123, ctc_loss=0.1301, cr_loss=0.4217, over 34377.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2661, pruned_loss=0.0586, ctc_loss=0.124, cr_loss=0.3979, over 6692372.32 frames. 
], batch size: 103, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:08:49,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=528621.3333333334, ans=0.125 2024-09-18 23:09:01,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=528621.3333333334, ans=0.125 2024-09-18 23:09:08,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=528668.0, ans=0.0 2024-09-18 23:09:54,124 INFO [train.py:1198] (1/2) Epoch 30, batch 900, loss[loss=0.1962, simple_loss=0.2467, pruned_loss=0.05336, ctc_loss=0.1159, cr_loss=0.3976, over 34470.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2666, pruned_loss=0.05883, ctc_loss=0.1244, cr_loss=0.3987, over 6697748.38 frames. ], batch size: 85, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:10:00,624 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.086e+02 2.471e+02 2.837e+02 3.561e+02 5.723e+02, threshold=5.674e+02, percent-clipped=1.0 2024-09-18 23:10:06,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=528808.0, ans=0.1 2024-09-18 23:10:18,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=12.0 2024-09-18 23:10:21,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=528854.6666666666, ans=0.0 2024-09-18 23:10:54,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=528948.0, ans=0.125 2024-09-18 23:10:57,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=528948.0, ans=0.015 2024-09-18 23:11:02,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=528994.6666666666, ans=0.07 2024-09-18 23:11:20,393 INFO [train.py:1198] (1/2) Epoch 30, batch 950, loss[loss=0.1943, simple_loss=0.2499, pruned_loss=0.05104, ctc_loss=0.1099, cr_loss=0.366, over 34688.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.2669, pruned_loss=0.05897, ctc_loss=0.1245, cr_loss=0.3987, over 6700549.83 frames. 
], batch size: 87, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:11:23,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=529041.3333333334, ans=0.125 2024-09-18 23:11:38,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=529088.0, ans=0.125 2024-09-18 23:11:55,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=529134.6666666666, ans=0.125 2024-09-18 23:12:14,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=529181.3333333334, ans=0.125 2024-09-18 23:12:16,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=529181.3333333334, ans=0.1 2024-09-18 23:12:16,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=529181.3333333334, ans=0.125 2024-09-18 23:12:25,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=529228.0, ans=0.0 2024-09-18 23:12:42,637 INFO [train.py:1198] (1/2) Epoch 30, batch 1000, loss[loss=0.2084, simple_loss=0.2602, pruned_loss=0.05776, ctc_loss=0.1246, cr_loss=0.4049, over 34478.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2676, pruned_loss=0.05928, ctc_loss=0.125, cr_loss=0.4, over 6694363.77 frames. ], batch size: 90, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:12:43,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=529274.6666666666, ans=0.2 2024-09-18 23:12:49,216 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.070e+02 2.561e+02 3.161e+02 3.859e+02 5.485e+02, threshold=6.321e+02, percent-clipped=0.0 2024-09-18 23:13:19,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=529368.0, ans=0.125 2024-09-18 23:13:26,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.03 vs. limit=15.0 2024-09-18 23:13:40,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=529414.6666666666, ans=0.125 2024-09-18 23:13:42,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=529414.6666666666, ans=0.0 2024-09-18 23:13:52,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=529461.3333333334, ans=0.1 2024-09-18 23:13:55,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=529461.3333333334, ans=0.125 2024-09-18 23:14:04,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=529461.3333333334, ans=0.125 2024-09-18 23:14:07,204 INFO [train.py:1198] (1/2) Epoch 30, batch 1050, loss[loss=0.2145, simple_loss=0.275, pruned_loss=0.05667, ctc_loss=0.1243, cr_loss=0.3963, over 34584.00 frames. 
], tot_loss[loss=0.2131, simple_loss=0.267, pruned_loss=0.0591, ctc_loss=0.1246, cr_loss=0.3993, over 6702042.43 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:14:19,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=529508.0, ans=0.2 2024-09-18 23:14:29,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.08 vs. limit=15.0 2024-09-18 23:14:37,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=529554.6666666666, ans=0.125 2024-09-18 23:15:03,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=529648.0, ans=0.025 2024-09-18 23:15:24,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=529694.6666666666, ans=0.015 2024-09-18 23:15:31,032 INFO [train.py:1198] (1/2) Epoch 30, batch 1100, loss[loss=0.2068, simple_loss=0.2556, pruned_loss=0.05909, ctc_loss=0.1209, cr_loss=0.392, over 34368.00 frames. ], tot_loss[loss=0.213, simple_loss=0.267, pruned_loss=0.05906, ctc_loss=0.1246, cr_loss=0.3993, over 6714948.39 frames. ], batch size: 91, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:15:37,621 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.493e+02 2.806e+02 3.644e+02 5.850e+02, threshold=5.613e+02, percent-clipped=0.0 2024-09-18 23:15:52,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=529788.0, ans=0.125 2024-09-18 23:16:05,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.15 vs. limit=15.0 2024-09-18 23:16:06,701 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.63 vs. limit=10.0 2024-09-18 23:16:29,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=529881.3333333334, ans=0.125 2024-09-18 23:16:41,048 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:16:53,733 INFO [train.py:1198] (1/2) Epoch 30, batch 1150, loss[loss=0.2065, simple_loss=0.262, pruned_loss=0.05601, ctc_loss=0.1169, cr_loss=0.3903, over 34339.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2668, pruned_loss=0.05911, ctc_loss=0.1246, cr_loss=0.3987, over 6713958.17 frames. 
], batch size: 91, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:16:54,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=529974.6666666666, ans=0.1 2024-09-18 23:17:00,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=529974.6666666666, ans=0.0 2024-09-18 23:17:18,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=530021.3333333334, ans=0.125 2024-09-18 23:17:36,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=530068.0, ans=0.125 2024-09-18 23:17:39,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=530068.0, ans=0.125 2024-09-18 23:17:51,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=530114.6666666666, ans=0.125 2024-09-18 23:18:01,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=530161.3333333334, ans=0.125 2024-09-18 23:18:04,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.25 vs. limit=6.0 2024-09-18 23:18:19,298 INFO [train.py:1198] (1/2) Epoch 30, batch 1200, loss[loss=0.231, simple_loss=0.2859, pruned_loss=0.06573, ctc_loss=0.1375, cr_loss=0.4255, over 34572.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2674, pruned_loss=0.0592, ctc_loss=0.1249, cr_loss=0.3992, over 6706624.24 frames. ], batch size: 99, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:18:26,061 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.382e+02 2.708e+02 3.375e+02 5.193e+02, threshold=5.416e+02, percent-clipped=0.0 2024-09-18 23:18:34,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=530254.6666666666, ans=0.1 2024-09-18 23:18:46,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=530254.6666666666, ans=0.0 2024-09-18 23:19:05,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.56 vs. limit=15.0 2024-09-18 23:19:21,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=530348.0, ans=0.025 2024-09-18 23:19:25,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0 2024-09-18 23:19:38,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=530394.6666666666, ans=0.1 2024-09-18 23:19:44,113 INFO [train.py:1198] (1/2) Epoch 30, batch 1250, loss[loss=0.238, simple_loss=0.2903, pruned_loss=0.06994, ctc_loss=0.1401, cr_loss=0.4452, over 34325.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2678, pruned_loss=0.05933, ctc_loss=0.1251, cr_loss=0.4, over 6740218.49 frames. 
], batch size: 107, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:19:56,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=530441.3333333334, ans=0.125 2024-09-18 23:20:04,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=530488.0, ans=0.125 2024-09-18 23:21:06,622 INFO [train.py:1198] (1/2) Epoch 30, batch 1300, loss[loss=0.2259, simple_loss=0.2844, pruned_loss=0.06196, ctc_loss=0.1329, cr_loss=0.4209, over 33140.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2673, pruned_loss=0.05922, ctc_loss=0.125, cr_loss=0.3996, over 6743214.25 frames. ], batch size: 130, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:21:13,223 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.437e+02 2.895e+02 3.443e+02 5.948e+02, threshold=5.791e+02, percent-clipped=3.0 2024-09-18 23:21:26,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=530721.3333333334, ans=0.125 2024-09-18 23:21:35,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=530721.3333333334, ans=0.0 2024-09-18 23:21:56,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.94 vs. limit=22.5 2024-09-18 23:21:58,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=530814.6666666666, ans=0.1 2024-09-18 23:22:22,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=530861.3333333334, ans=0.0 2024-09-18 23:22:33,479 INFO [train.py:1198] (1/2) Epoch 30, batch 1350, loss[loss=0.2128, simple_loss=0.2632, pruned_loss=0.06024, ctc_loss=0.1281, cr_loss=0.405, over 34516.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2672, pruned_loss=0.05917, ctc_loss=0.1248, cr_loss=0.3996, over 6763289.91 frames. ], batch size: 94, lr: 4.00e-03, grad_scale: 32.0 2024-09-18 23:22:46,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=530908.0, ans=0.0 2024-09-18 23:23:10,690 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0 2024-09-18 23:23:14,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=531001.3333333334, ans=0.0 2024-09-18 23:23:16,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.49 vs. limit=15.0 2024-09-18 23:23:16,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.20 vs. limit=22.5 2024-09-18 23:23:56,202 INFO [train.py:1198] (1/2) Epoch 30, batch 1400, loss[loss=0.1773, simple_loss=0.2274, pruned_loss=0.04664, ctc_loss=0.1005, cr_loss=0.3437, over 34314.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2668, pruned_loss=0.05904, ctc_loss=0.1245, cr_loss=0.3995, over 6775744.51 frames. 
], batch size: 80, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:24:02,827 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.956e+02 2.522e+02 2.872e+02 3.547e+02 7.361e+02, threshold=5.744e+02, percent-clipped=2.0 2024-09-18 23:24:04,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=531141.3333333334, ans=0.0 2024-09-18 23:24:32,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=531234.6666666666, ans=0.05 2024-09-18 23:24:32,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=531234.6666666666, ans=0.125 2024-09-18 23:24:37,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=531234.6666666666, ans=0.025 2024-09-18 23:24:38,219 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.11 vs. limit=15.0 2024-09-18 23:24:54,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=531281.3333333334, ans=0.1 2024-09-18 23:25:12,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=531328.0, ans=0.025 2024-09-18 23:25:13,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=531328.0, ans=0.025 2024-09-18 23:25:17,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=531374.6666666666, ans=0.2 2024-09-18 23:25:20,724 INFO [train.py:1198] (1/2) Epoch 30, batch 1450, loss[loss=0.2189, simple_loss=0.2781, pruned_loss=0.05877, ctc_loss=0.1282, cr_loss=0.4144, over 34454.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2674, pruned_loss=0.05905, ctc_loss=0.1247, cr_loss=0.3999, over 6774170.07 frames. ], batch size: 110, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:25:38,393 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.82 vs. limit=22.5 2024-09-18 23:26:30,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=531561.3333333334, ans=0.125 2024-09-18 23:26:30,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.27 vs. 
limit=6.0 2024-09-18 23:26:31,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=531561.3333333334, ans=0.1 2024-09-18 23:26:34,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=531561.3333333334, ans=0.125 2024-09-18 23:26:38,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=531561.3333333334, ans=0.125 2024-09-18 23:26:41,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=531561.3333333334, ans=0.0 2024-09-18 23:26:41,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=531561.3333333334, ans=0.2 2024-09-18 23:26:44,516 INFO [train.py:1198] (1/2) Epoch 30, batch 1500, loss[loss=0.2324, simple_loss=0.286, pruned_loss=0.06689, ctc_loss=0.1383, cr_loss=0.4337, over 34459.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2679, pruned_loss=0.05923, ctc_loss=0.1252, cr_loss=0.401, over 6774698.05 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:26:51,150 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.443e+02 2.800e+02 3.252e+02 4.909e+02, threshold=5.601e+02, percent-clipped=0.0 2024-09-18 23:26:58,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=531608.0, ans=0.2 2024-09-18 23:27:11,563 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:27:13,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=15.0 2024-09-18 23:27:21,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=531701.3333333334, ans=0.0 2024-09-18 23:27:24,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=531701.3333333334, ans=0.0 2024-09-18 23:27:44,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=531748.0, ans=0.025 2024-09-18 23:27:49,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=531794.6666666666, ans=0.025 2024-09-18 23:28:04,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=531794.6666666666, ans=0.125 2024-09-18 23:28:07,663 INFO [train.py:1198] (1/2) Epoch 30, batch 1550, loss[loss=0.2349, simple_loss=0.2858, pruned_loss=0.06886, ctc_loss=0.1432, cr_loss=0.4413, over 34441.00 frames. ], tot_loss[loss=0.2139, simple_loss=0.2677, pruned_loss=0.05944, ctc_loss=0.1255, cr_loss=0.4015, over 6745239.47 frames. ], batch size: 105, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:28:17,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=531841.3333333334, ans=0.07 2024-09-18 23:28:30,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.08 vs. 
limit=10.0 2024-09-18 23:28:42,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=531934.6666666666, ans=0.125 2024-09-18 23:28:49,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=531934.6666666666, ans=0.2 2024-09-18 23:29:14,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=532028.0, ans=0.125 2024-09-18 23:29:14,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=532028.0, ans=0.0 2024-09-18 23:29:27,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=22.5 2024-09-18 23:29:32,810 INFO [train.py:1198] (1/2) Epoch 30, batch 1600, loss[loss=0.2266, simple_loss=0.2805, pruned_loss=0.06427, ctc_loss=0.1376, cr_loss=0.4188, over 34554.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2677, pruned_loss=0.05951, ctc_loss=0.1256, cr_loss=0.4014, over 6723961.02 frames. ], batch size: 99, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:29:40,814 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 2.489e+02 2.785e+02 3.652e+02 1.277e+03, threshold=5.570e+02, percent-clipped=7.0 2024-09-18 23:30:29,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=532214.6666666666, ans=0.025 2024-09-18 23:30:52,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=532261.3333333334, ans=0.125 2024-09-18 23:30:56,492 INFO [train.py:1198] (1/2) Epoch 30, batch 1650, loss[loss=0.2281, simple_loss=0.2776, pruned_loss=0.06645, ctc_loss=0.1392, cr_loss=0.4476, over 34381.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2672, pruned_loss=0.05915, ctc_loss=0.125, cr_loss=0.4, over 6716128.57 frames. ], batch size: 103, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:31:00,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.20 vs. limit=10.0 2024-09-18 23:31:19,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=532354.6666666666, ans=0.125 2024-09-18 23:31:23,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=532354.6666666666, ans=0.05 2024-09-18 23:31:51,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=532448.0, ans=10.0 2024-09-18 23:32:04,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=532494.6666666666, ans=0.2 2024-09-18 23:32:18,786 INFO [train.py:1198] (1/2) Epoch 30, batch 1700, loss[loss=0.1954, simple_loss=0.2455, pruned_loss=0.05377, ctc_loss=0.1142, cr_loss=0.3716, over 34290.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.267, pruned_loss=0.05896, ctc_loss=0.1246, cr_loss=0.3994, over 6740506.28 frames. 
], batch size: 80, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:32:26,975 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.438e+02 2.944e+02 3.686e+02 6.729e+02, threshold=5.888e+02, percent-clipped=2.0 2024-09-18 23:32:35,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=532588.0, ans=0.0 2024-09-18 23:32:40,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=532588.0, ans=0.025 2024-09-18 23:32:55,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=532634.6666666666, ans=0.125 2024-09-18 23:33:04,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=532634.6666666666, ans=0.0 2024-09-18 23:33:12,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=532681.3333333334, ans=0.0 2024-09-18 23:33:17,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=532681.3333333334, ans=0.0 2024-09-18 23:33:20,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=532681.3333333334, ans=0.95 2024-09-18 23:33:22,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=532681.3333333334, ans=0.1 2024-09-18 23:33:30,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2024-09-18 23:33:30,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=532728.0, ans=0.125 2024-09-18 23:33:30,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=532728.0, ans=0.0 2024-09-18 23:33:40,978 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:33:45,613 INFO [train.py:1198] (1/2) Epoch 30, batch 1750, loss[loss=0.19, simple_loss=0.2401, pruned_loss=0.05133, ctc_loss=0.1126, cr_loss=0.3666, over 34175.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.2666, pruned_loss=0.05883, ctc_loss=0.1244, cr_loss=0.3992, over 6749534.39 frames. ], batch size: 78, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:34:08,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=532821.3333333334, ans=0.125 2024-09-18 23:34:53,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=532961.3333333334, ans=0.0 2024-09-18 23:35:07,828 INFO [train.py:1198] (1/2) Epoch 30, batch 1800, loss[loss=0.2079, simple_loss=0.2677, pruned_loss=0.0545, ctc_loss=0.1183, cr_loss=0.3847, over 34683.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2666, pruned_loss=0.05889, ctc_loss=0.1246, cr_loss=0.3991, over 6752662.25 frames. 
], batch size: 97, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:35:09,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=533008.0, ans=0.0 2024-09-18 23:35:16,197 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.415e+02 2.867e+02 3.808e+02 6.137e+02, threshold=5.734e+02, percent-clipped=1.0 2024-09-18 23:35:23,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.47 vs. limit=10.0 2024-09-18 23:35:36,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=533054.6666666666, ans=0.125 2024-09-18 23:35:44,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=533101.3333333334, ans=0.125 2024-09-18 23:35:52,830 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:36:07,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0 2024-09-18 23:36:22,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=533194.6666666666, ans=0.125 2024-09-18 23:36:22,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=533194.6666666666, ans=0.2 2024-09-18 23:36:27,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=533194.6666666666, ans=0.025 2024-09-18 23:36:30,321 INFO [train.py:1198] (1/2) Epoch 30, batch 1850, loss[loss=0.2221, simple_loss=0.2813, pruned_loss=0.06063, ctc_loss=0.1278, cr_loss=0.403, over 34456.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.2668, pruned_loss=0.05897, ctc_loss=0.1247, cr_loss=0.4, over 6761124.64 frames. ], batch size: 100, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:36:41,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=533241.3333333334, ans=0.125 2024-09-18 23:36:56,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.23 vs. limit=15.0 2024-09-18 23:37:12,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=533334.6666666666, ans=0.0 2024-09-18 23:37:16,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=533334.6666666666, ans=0.0 2024-09-18 23:37:35,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2024-09-18 23:37:56,126 INFO [train.py:1198] (1/2) Epoch 30, batch 1900, loss[loss=0.2164, simple_loss=0.2763, pruned_loss=0.05751, ctc_loss=0.1253, cr_loss=0.4089, over 34402.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2677, pruned_loss=0.05918, ctc_loss=0.1251, cr_loss=0.4005, over 6771691.24 frames. 
], batch size: 103, lr: 3.99e-03, grad_scale: 32.0 2024-09-18 23:37:59,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=533474.6666666666, ans=0.125 2024-09-18 23:38:03,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.15 vs. limit=10.0 2024-09-18 23:38:04,428 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.128e+02 2.566e+02 3.309e+02 4.191e+02 7.071e+02, threshold=6.617e+02, percent-clipped=7.0 2024-09-18 23:38:09,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=533474.6666666666, ans=0.125 2024-09-18 23:38:26,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=533521.3333333334, ans=0.125 2024-09-18 23:38:49,098 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:38:53,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=533614.6666666666, ans=0.07 2024-09-18 23:39:09,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. limit=6.0 2024-09-18 23:39:18,060 INFO [train.py:1198] (1/2) Epoch 30, batch 1950, loss[loss=0.2076, simple_loss=0.2607, pruned_loss=0.05677, ctc_loss=0.123, cr_loss=0.407, over 34371.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2688, pruned_loss=0.05952, ctc_loss=0.1258, cr_loss=0.4025, over 6788455.51 frames. ], batch size: 91, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:39:21,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=533708.0, ans=0.125 2024-09-18 23:39:28,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=533708.0, ans=0.05 2024-09-18 23:39:29,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.85 vs. limit=15.0 2024-09-18 23:39:37,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.19 vs. limit=15.0 2024-09-18 23:39:46,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=533754.6666666666, ans=0.0 2024-09-18 23:39:51,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=533801.3333333334, ans=0.125 2024-09-18 23:40:01,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=533801.3333333334, ans=0.125 2024-09-18 23:40:04,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=533801.3333333334, ans=0.0 2024-09-18 23:40:12,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.08 vs. 
limit=15.0 2024-09-18 23:40:42,917 INFO [train.py:1198] (1/2) Epoch 30, batch 2000, loss[loss=0.1899, simple_loss=0.2398, pruned_loss=0.0517, ctc_loss=0.1109, cr_loss=0.3617, over 34146.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.269, pruned_loss=0.05965, ctc_loss=0.1261, cr_loss=0.4031, over 6764197.52 frames. ], batch size: 78, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:40:51,323 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.981e+02 2.392e+02 2.752e+02 3.588e+02 6.447e+02, threshold=5.504e+02, percent-clipped=0.0 2024-09-18 23:40:53,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=533941.3333333334, ans=0.0 2024-09-18 23:40:58,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=533941.3333333334, ans=0.2 2024-09-18 23:41:03,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=533988.0, ans=0.2 2024-09-18 23:41:10,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=533988.0, ans=0.025 2024-09-18 23:41:42,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2024-09-18 23:41:49,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=534128.0, ans=0.025 2024-09-18 23:41:51,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=534128.0, ans=0.2 2024-09-18 23:42:03,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534128.0, ans=0.1 2024-09-18 23:42:04,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=534128.0, ans=0.125 2024-09-18 23:42:07,621 INFO [train.py:1198] (1/2) Epoch 30, batch 2050, loss[loss=0.1888, simple_loss=0.2399, pruned_loss=0.05099, ctc_loss=0.1089, cr_loss=0.3474, over 34484.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2677, pruned_loss=0.05925, ctc_loss=0.1252, cr_loss=0.4013, over 6755791.13 frames. ], batch size: 82, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:42:24,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534221.3333333334, ans=0.1 2024-09-18 23:42:35,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=534221.3333333334, ans=0.125 2024-09-18 23:42:40,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=534268.0, ans=0.125 2024-09-18 23:42:43,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.92 vs. 
limit=22.5 2024-09-18 23:43:15,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=534361.3333333334, ans=0.0 2024-09-18 23:43:16,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=534361.3333333334, ans=0.2 2024-09-18 23:43:29,755 INFO [train.py:1198] (1/2) Epoch 30, batch 2100, loss[loss=0.2141, simple_loss=0.2682, pruned_loss=0.05944, ctc_loss=0.1243, cr_loss=0.406, over 34513.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2671, pruned_loss=0.05901, ctc_loss=0.1247, cr_loss=0.4002, over 6768392.27 frames. ], batch size: 94, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:43:30,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=534408.0, ans=0.09899494936611666 2024-09-18 23:43:35,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534408.0, ans=0.1 2024-09-18 23:43:35,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=534408.0, ans=0.125 2024-09-18 23:43:37,864 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.495e+02 2.919e+02 3.795e+02 6.480e+02, threshold=5.838e+02, percent-clipped=3.0 2024-09-18 23:43:44,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=534454.6666666666, ans=0.125 2024-09-18 23:44:09,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=534501.3333333334, ans=0.125 2024-09-18 23:44:12,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=534501.3333333334, ans=0.025 2024-09-18 23:44:29,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=534548.0, ans=0.125 2024-09-18 23:44:42,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=534594.6666666666, ans=0.125 2024-09-18 23:44:45,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.81 vs. limit=15.0 2024-09-18 23:44:54,507 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:44:55,792 INFO [train.py:1198] (1/2) Epoch 30, batch 2150, loss[loss=0.2132, simple_loss=0.2659, pruned_loss=0.05976, ctc_loss=0.1235, cr_loss=0.4093, over 34328.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2665, pruned_loss=0.05875, ctc_loss=0.1242, cr_loss=0.3989, over 6785352.70 frames. 
], batch size: 91, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:45:09,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=534641.3333333334, ans=0.2 2024-09-18 23:45:22,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=534688.0, ans=0.125 2024-09-18 23:46:00,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=534828.0, ans=0.0 2024-09-18 23:46:03,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=534828.0, ans=0.125 2024-09-18 23:46:07,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=534828.0, ans=0.125 2024-09-18 23:46:07,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.80 vs. limit=10.0 2024-09-18 23:46:18,301 INFO [train.py:1198] (1/2) Epoch 30, batch 2200, loss[loss=0.2237, simple_loss=0.281, pruned_loss=0.06206, ctc_loss=0.1299, cr_loss=0.4045, over 34441.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2666, pruned_loss=0.05883, ctc_loss=0.1244, cr_loss=0.3996, over 6780624.89 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:46:26,470 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.429e+02 2.863e+02 3.942e+02 7.444e+02, threshold=5.725e+02, percent-clipped=5.0 2024-09-18 23:46:30,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=534874.6666666666, ans=0.125 2024-09-18 23:46:34,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=534921.3333333334, ans=0.125 2024-09-18 23:46:51,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=534968.0, ans=0.09899494936611666 2024-09-18 23:46:57,977 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-18 23:47:02,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=534968.0, ans=0.125 2024-09-18 23:47:11,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0 2024-09-18 23:47:13,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.53 vs. limit=22.5 2024-09-18 23:47:39,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=535108.0, ans=0.125 2024-09-18 23:47:40,388 INFO [train.py:1198] (1/2) Epoch 30, batch 2250, loss[loss=0.2155, simple_loss=0.2702, pruned_loss=0.05972, ctc_loss=0.1256, cr_loss=0.4046, over 34435.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2666, pruned_loss=0.05875, ctc_loss=0.1243, cr_loss=0.3993, over 6777979.37 frames. ], batch size: 95, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:47:44,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. 
limit=6.0 2024-09-18 23:47:58,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=535154.6666666666, ans=0.125 2024-09-18 23:48:03,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=535154.6666666666, ans=0.0 2024-09-18 23:48:04,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.10 vs. limit=15.0 2024-09-18 23:48:48,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=535294.6666666666, ans=0.09899494936611666 2024-09-18 23:48:55,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=535294.6666666666, ans=0.125 2024-09-18 23:49:00,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=535294.6666666666, ans=0.125 2024-09-18 23:49:06,801 INFO [train.py:1198] (1/2) Epoch 30, batch 2300, loss[loss=0.1925, simple_loss=0.2442, pruned_loss=0.05214, ctc_loss=0.1096, cr_loss=0.3649, over 34267.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2657, pruned_loss=0.05856, ctc_loss=0.1239, cr_loss=0.398, over 6764548.04 frames. ], batch size: 83, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:49:12,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=535341.3333333334, ans=0.1 2024-09-18 23:49:15,035 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.565e+02 2.901e+02 3.518e+02 8.122e+02, threshold=5.801e+02, percent-clipped=1.0 2024-09-18 23:49:26,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=535388.0, ans=0.125 2024-09-18 23:49:28,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=535388.0, ans=0.0 2024-09-18 23:49:31,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=535388.0, ans=0.0 2024-09-18 23:49:39,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=535434.6666666666, ans=0.04949747468305833 2024-09-18 23:49:49,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=535434.6666666666, ans=0.0 2024-09-18 23:49:53,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=535434.6666666666, ans=0.125 2024-09-18 23:49:58,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.19 vs. limit=15.0 2024-09-18 23:50:14,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=535528.0, ans=0.0 2024-09-18 23:50:29,218 INFO [train.py:1198] (1/2) Epoch 30, batch 2350, loss[loss=0.2161, simple_loss=0.2738, pruned_loss=0.05825, ctc_loss=0.1247, cr_loss=0.4229, over 34701.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2657, pruned_loss=0.05865, ctc_loss=0.124, cr_loss=0.3989, over 6770283.11 frames. 
], batch size: 97, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:50:37,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=535574.6666666666, ans=0.125 2024-09-18 23:50:39,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=535574.6666666666, ans=0.2 2024-09-18 23:50:50,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=535621.3333333334, ans=0.125 2024-09-18 23:51:06,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.29 vs. limit=10.0 2024-09-18 23:51:52,154 INFO [train.py:1198] (1/2) Epoch 30, batch 2400, loss[loss=0.1899, simple_loss=0.2474, pruned_loss=0.04888, ctc_loss=0.1068, cr_loss=0.3334, over 34604.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2661, pruned_loss=0.05881, ctc_loss=0.1242, cr_loss=0.3993, over 6774917.46 frames. ], batch size: 89, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:52:02,334 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.489e+02 2.810e+02 3.560e+02 6.943e+02, threshold=5.619e+02, percent-clipped=1.0 2024-09-18 23:52:14,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=535854.6666666666, ans=0.2 2024-09-18 23:52:31,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=535901.3333333334, ans=0.0 2024-09-18 23:52:39,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=535901.3333333334, ans=0.125 2024-09-18 23:52:49,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535948.0, ans=0.1 2024-09-18 23:53:02,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=535994.6666666666, ans=0.5 2024-09-18 23:53:19,049 INFO [train.py:1198] (1/2) Epoch 30, batch 2450, loss[loss=0.2106, simple_loss=0.2665, pruned_loss=0.05753, ctc_loss=0.1204, cr_loss=0.3899, over 34402.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2672, pruned_loss=0.0592, ctc_loss=0.125, cr_loss=0.4007, over 6748787.76 frames. 
], batch size: 95, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:53:29,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=536041.3333333334, ans=0.05 2024-09-18 23:53:37,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=536088.0, ans=0.5 2024-09-18 23:53:39,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536088.0, ans=0.1 2024-09-18 23:53:54,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=536134.6666666666, ans=0.1 2024-09-18 23:54:02,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=536134.6666666666, ans=0.0 2024-09-18 23:54:03,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=536134.6666666666, ans=0.0 2024-09-18 23:54:25,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=536228.0, ans=0.125 2024-09-18 23:54:36,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=536228.0, ans=0.04949747468305833 2024-09-18 23:54:41,155 INFO [train.py:1198] (1/2) Epoch 30, batch 2500, loss[loss=0.2171, simple_loss=0.2779, pruned_loss=0.05813, ctc_loss=0.1231, cr_loss=0.3866, over 34456.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2672, pruned_loss=0.05925, ctc_loss=0.125, cr_loss=0.4008, over 6760394.60 frames. ], batch size: 100, lr: 3.98e-03, grad_scale: 32.0 2024-09-18 23:54:43,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=536274.6666666666, ans=0.0 2024-09-18 23:54:44,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=536274.6666666666, ans=0.2 2024-09-18 23:54:49,192 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.997e+02 2.443e+02 2.735e+02 3.219e+02 5.208e+02, threshold=5.470e+02, percent-clipped=0.0 2024-09-18 23:54:54,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.98 vs. limit=22.5 2024-09-18 23:54:57,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=536321.3333333334, ans=0.125 2024-09-18 23:55:07,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=536321.3333333334, ans=0.0 2024-09-18 23:55:09,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=536321.3333333334, ans=0.07 2024-09-18 23:55:16,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.66 vs. limit=15.0 2024-09-18 23:55:50,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.93 vs. 
limit=15.0 2024-09-18 23:55:58,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=536461.3333333334, ans=0.1 2024-09-18 23:55:59,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.89 vs. limit=22.5 2024-09-18 23:56:07,216 INFO [train.py:1198] (1/2) Epoch 30, batch 2550, loss[loss=0.1826, simple_loss=0.2341, pruned_loss=0.04821, ctc_loss=0.1035, cr_loss=0.3484, over 34158.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2671, pruned_loss=0.05915, ctc_loss=0.1248, cr_loss=0.4002, over 6764988.38 frames. ], batch size: 78, lr: 3.97e-03, grad_scale: 16.0 2024-09-18 23:56:17,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=536508.0, ans=0.125 2024-09-18 23:56:46,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=536601.3333333334, ans=0.125 2024-09-18 23:57:09,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=536648.0, ans=0.2 2024-09-18 23:57:14,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=536694.6666666666, ans=0.0 2024-09-18 23:57:19,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=536694.6666666666, ans=0.0 2024-09-18 23:57:29,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.70 vs. limit=10.0 2024-09-18 23:57:30,163 INFO [train.py:1198] (1/2) Epoch 30, batch 2600, loss[loss=0.1998, simple_loss=0.2574, pruned_loss=0.05183, ctc_loss=0.1171, cr_loss=0.3777, over 34354.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2677, pruned_loss=0.05935, ctc_loss=0.1252, cr_loss=0.4016, over 6760579.14 frames. ], batch size: 91, lr: 3.97e-03, grad_scale: 16.0 2024-09-18 23:57:39,837 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.484e+02 2.958e+02 4.070e+02 6.837e+02, threshold=5.916e+02, percent-clipped=7.0 2024-09-18 23:57:49,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=536788.0, ans=0.125 2024-09-18 23:57:53,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=536788.0, ans=0.09899494936611666 2024-09-18 23:58:29,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=536881.3333333334, ans=0.09899494936611666 2024-09-18 23:58:51,979 INFO [train.py:1198] (1/2) Epoch 30, batch 2650, loss[loss=0.2263, simple_loss=0.2836, pruned_loss=0.06321, ctc_loss=0.1312, cr_loss=0.4088, over 34186.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2677, pruned_loss=0.05928, ctc_loss=0.1252, cr_loss=0.4015, over 6768607.60 frames. 
], batch size: 117, lr: 3.97e-03, grad_scale: 16.0 2024-09-18 23:58:57,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=536974.6666666666, ans=0.2 2024-09-18 23:59:00,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=536974.6666666666, ans=0.025 2024-09-18 23:59:08,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537021.3333333334, ans=0.1 2024-09-18 23:59:11,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=537021.3333333334, ans=0.125 2024-09-18 23:59:39,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=15.0 2024-09-18 23:59:49,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=537114.6666666666, ans=0.125 2024-09-18 23:59:54,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=537114.6666666666, ans=0.125 2024-09-19 00:00:17,625 INFO [train.py:1198] (1/2) Epoch 30, batch 2700, loss[loss=0.2145, simple_loss=0.2731, pruned_loss=0.05744, ctc_loss=0.1247, cr_loss=0.401, over 34598.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2681, pruned_loss=0.05936, ctc_loss=0.1254, cr_loss=0.4018, over 6764143.50 frames. ], batch size: 102, lr: 3.97e-03, grad_scale: 16.0 2024-09-19 00:00:26,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=22.5 2024-09-19 00:00:27,478 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.512e+02 2.953e+02 3.669e+02 5.820e+02, threshold=5.905e+02, percent-clipped=0.0 2024-09-19 00:00:27,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=537208.0, ans=0.125 2024-09-19 00:01:14,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.80 vs. limit=15.0 2024-09-19 00:01:40,256 INFO [train.py:1198] (1/2) Epoch 30, batch 2750, loss[loss=0.2123, simple_loss=0.2637, pruned_loss=0.05982, ctc_loss=0.1274, cr_loss=0.3952, over 34643.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2671, pruned_loss=0.05899, ctc_loss=0.1247, cr_loss=0.4002, over 6760839.32 frames. ], batch size: 88, lr: 3.97e-03, grad_scale: 16.0 2024-09-19 00:01:43,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=537441.3333333334, ans=0.025 2024-09-19 00:01:44,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=537441.3333333334, ans=0.2 2024-09-19 00:03:02,945 INFO [train.py:1198] (1/2) Epoch 30, batch 2800, loss[loss=0.2482, simple_loss=0.2917, pruned_loss=0.0776, ctc_loss=0.1596, cr_loss=0.4371, over 23683.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2671, pruned_loss=0.0592, ctc_loss=0.1251, cr_loss=0.401, over 6739055.82 frames. 
], batch size: 244, lr: 3.97e-03, grad_scale: 32.0 2024-09-19 00:03:14,472 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.153e+02 2.758e+02 3.384e+02 4.028e+02 6.394e+02, threshold=6.768e+02, percent-clipped=2.0 2024-09-19 00:03:18,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=537674.6666666666, ans=0.0 2024-09-19 00:03:23,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=537721.3333333334, ans=0.125 2024-09-19 00:03:26,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=537721.3333333334, ans=0.125 2024-09-19 00:03:51,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537768.0, ans=0.1 2024-09-19 00:03:53,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=537768.0, ans=0.2 2024-09-19 00:04:11,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=537861.3333333334, ans=0.0 2024-09-19 00:04:29,180 INFO [train.py:1198] (1/2) Epoch 30, batch 2850, loss[loss=0.2122, simple_loss=0.2637, pruned_loss=0.0601, ctc_loss=0.1236, cr_loss=0.3971, over 34478.00 frames. ], tot_loss[loss=0.2138, simple_loss=0.2676, pruned_loss=0.05944, ctc_loss=0.1255, cr_loss=0.4016, over 6723575.28 frames. ], batch size: 90, lr: 3.97e-03, grad_scale: 32.0 2024-09-19 00:04:47,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=537954.6666666666, ans=0.1 2024-09-19 00:04:57,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=537954.6666666666, ans=0.125 2024-09-19 00:04:59,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=537954.6666666666, ans=0.0 2024-09-19 00:05:05,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=538001.3333333334, ans=0.125 2024-09-19 00:05:09,541 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.55 vs. limit=15.0 2024-09-19 00:05:11,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=538001.3333333334, ans=0.125 2024-09-19 00:05:23,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.27 vs. limit=15.0 2024-09-19 00:05:47,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=538094.6666666666, ans=0.0 2024-09-19 00:05:50,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=538141.3333333334, ans=0.125 2024-09-19 00:05:51,876 INFO [train.py:1198] (1/2) Epoch 30, batch 2900, loss[loss=0.2143, simple_loss=0.2658, pruned_loss=0.06059, ctc_loss=0.1269, cr_loss=0.4053, over 34514.00 frames. ], tot_loss[loss=0.2148, simple_loss=0.2689, pruned_loss=0.05967, ctc_loss=0.126, cr_loss=0.4037, over 6754525.10 frames. 
], batch size: 94, lr: 3.97e-03, grad_scale: 32.0 2024-09-19 00:06:01,801 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.504e+02 2.947e+02 3.867e+02 7.739e+02, threshold=5.894e+02, percent-clipped=1.0 2024-09-19 00:06:22,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=538188.0, ans=0.0 2024-09-19 00:06:30,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=538234.6666666666, ans=0.125 2024-09-19 00:06:36,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=538234.6666666666, ans=0.1 2024-09-19 00:06:43,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=538281.3333333334, ans=0.0 2024-09-19 00:06:56,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=538281.3333333334, ans=0.125 2024-09-19 00:06:58,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=22.5 2024-09-19 00:07:15,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=538328.0, ans=0.1 2024-09-19 00:07:18,076 INFO [train.py:1198] (1/2) Epoch 30, batch 2950, loss[loss=0.2106, simple_loss=0.2608, pruned_loss=0.05968, ctc_loss=0.1276, cr_loss=0.3865, over 34612.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2673, pruned_loss=0.05908, ctc_loss=0.1249, cr_loss=0.4011, over 6750076.34 frames. ], batch size: 88, lr: 3.97e-03, grad_scale: 32.0 2024-09-19 00:07:30,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=538374.6666666666, ans=0.1 2024-09-19 00:07:34,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=538421.3333333334, ans=0.125 2024-09-19 00:07:40,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.62 vs. limit=10.0 2024-09-19 00:08:01,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=538468.0, ans=0.125 2024-09-19 00:08:03,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.01 vs. limit=15.0 2024-09-19 00:08:40,992 INFO [train.py:1198] (1/2) Epoch 30, batch 3000, loss[loss=0.2012, simple_loss=0.2577, pruned_loss=0.0528, ctc_loss=0.1166, cr_loss=0.3952, over 34507.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2668, pruned_loss=0.05877, ctc_loss=0.1244, cr_loss=0.4001, over 6750870.40 frames. ], batch size: 94, lr: 3.97e-03, grad_scale: 32.0 2024-09-19 00:08:40,992 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 00:08:57,797 INFO [train.py:1230] (1/2) Epoch 30, validation: loss=0.1489, simple_loss=0.2437, pruned_loss=0.0231, ctc_loss=0.03994, cr_loss=2.069e-14, over 944034.00 frames. 
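The validation entry above is worth a gloss: training batches report cr_loss around 0.40, while the validation pass logs cr_loss=2.069e-14, i.e. numerically zero. That is what a consistency-regularization term would do if it compares the CTC posteriors of two augmented views of each utterance and augmentation is inactive at validation, so the two views coincide. A hedged sketch of such a term; the symmetric KL and the detach are assumptions, only the two-view structure is implied by the log:

    import torch
    import torch.nn.functional as F

    def consistency_loss(log_probs_a: torch.Tensor,
                         log_probs_b: torch.Tensor) -> torch.Tensor:
        """Symmetric KL between two (T, N, V) CTC log-posterior tensors,
        one per augmented view; each direction stops gradient to its target."""
        kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                         reduction="batchmean", log_target=True)
        kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                         reduction="batchmean", log_target=True)
        return 0.5 * (kl_ab + kl_ba)

    # With augmentation off, log_probs_a == log_probs_b and the result is
    # zero up to float noise, matching the ~2e-14 validation values above.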
2024-09-19 00:08:57,798 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 00:09:07,872 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.467e+02 3.035e+02 4.069e+02 7.187e+02, threshold=6.069e+02, percent-clipped=2.0 2024-09-19 00:09:11,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=538608.0, ans=0.125 2024-09-19 00:09:13,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=538654.6666666666, ans=0.1 2024-09-19 00:09:24,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=538654.6666666666, ans=10.0 2024-09-19 00:09:24,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=538654.6666666666, ans=0.0 2024-09-19 00:09:32,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.19 vs. limit=15.0 2024-09-19 00:09:52,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.47 vs. limit=15.0 2024-09-19 00:09:54,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.15 vs. limit=15.0 2024-09-19 00:09:56,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=538748.0, ans=0.0 2024-09-19 00:10:12,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=538794.6666666666, ans=0.125 2024-09-19 00:10:19,262 INFO [train.py:1198] (1/2) Epoch 30, batch 3050, loss[loss=0.2029, simple_loss=0.2543, pruned_loss=0.05596, ctc_loss=0.1189, cr_loss=0.3939, over 34581.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2676, pruned_loss=0.05912, ctc_loss=0.125, cr_loss=0.4017, over 6743391.24 frames. ], batch size: 89, lr: 3.97e-03, grad_scale: 32.0 2024-09-19 00:10:27,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=538841.3333333334, ans=0.05 2024-09-19 00:10:39,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=538888.0, ans=0.0 2024-09-19 00:10:58,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=538934.6666666666, ans=0.0 2024-09-19 00:11:06,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=538981.3333333334, ans=0.035 2024-09-19 00:11:26,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=539028.0, ans=0.125 2024-09-19 00:11:39,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=539028.0, ans=0.035 2024-09-19 00:11:43,865 INFO [train.py:1198] (1/2) Epoch 30, batch 3100, loss[loss=0.229, simple_loss=0.2885, pruned_loss=0.06301, ctc_loss=0.1308, cr_loss=0.4327, over 34256.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2675, pruned_loss=0.05919, ctc_loss=0.1251, cr_loss=0.4017, over 6742258.19 frames. 
], batch size: 117, lr: 3.96e-03, grad_scale: 32.0 2024-09-19 00:11:53,592 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.119e+02 2.437e+02 2.712e+02 3.408e+02 5.752e+02, threshold=5.423e+02, percent-clipped=0.0 2024-09-19 00:11:53,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=539074.6666666666, ans=0.025 2024-09-19 00:11:54,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.15 vs. limit=10.0 2024-09-19 00:11:57,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=539074.6666666666, ans=0.125 2024-09-19 00:12:14,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=539168.0, ans=0.0 2024-09-19 00:12:14,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=539168.0, ans=0.0 2024-09-19 00:12:17,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=539168.0, ans=0.09899494936611666 2024-09-19 00:12:19,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=539168.0, ans=0.125 2024-09-19 00:12:24,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539168.0, ans=0.1 2024-09-19 00:12:50,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=539261.3333333334, ans=0.125 2024-09-19 00:13:00,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0 2024-09-19 00:13:01,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=539261.3333333334, ans=0.1 2024-09-19 00:13:04,799 INFO [train.py:1198] (1/2) Epoch 30, batch 3150, loss[loss=0.2341, simple_loss=0.2898, pruned_loss=0.06671, ctc_loss=0.1386, cr_loss=0.4304, over 33816.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2676, pruned_loss=0.05914, ctc_loss=0.1251, cr_loss=0.4014, over 6748290.65 frames. ], batch size: 122, lr: 3.96e-03, grad_scale: 32.0 2024-09-19 00:13:16,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=539308.0, ans=0.125 2024-09-19 00:13:29,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539354.6666666666, ans=0.1 2024-09-19 00:14:09,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539494.6666666666, ans=0.1 2024-09-19 00:14:25,667 INFO [train.py:1198] (1/2) Epoch 30, batch 3200, loss[loss=0.2038, simple_loss=0.2616, pruned_loss=0.05399, ctc_loss=0.1159, cr_loss=0.3713, over 34550.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2669, pruned_loss=0.05875, ctc_loss=0.1244, cr_loss=0.3994, over 6760715.98 frames. 
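Reading the scaling.py:214 entries: each gives a schedule name, the global batch_count, and the current value ans. By batch_count ~5.4e5 every value logged here is constant (dropout p=0.1, balancer probs 0.125, skip rates 0.0), i.e. the schedules have long since reached their final segment. A sketch of a piecewise-linear scheduled float keyed on batch count; the breakpoints below are made up for illustration, and icefall's real ScheduledFloat may differ in detail:

    import bisect

    class ScheduledFloat:
        def __init__(self, *points: tuple[float, float]):
            self.xs = [x for x, _ in points]  # batch counts
            self.ys = [y for _, y in points]  # values at those counts

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))  # hypothetical breakpoints
    print(dropout_p.value(536461.33))  # -> 0.1, matching the logged ans=0.1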
], batch size: 94, lr: 3.96e-03, grad_scale: 32.0 2024-09-19 00:14:30,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=539541.3333333334, ans=0.2 2024-09-19 00:14:35,345 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.089e+02 2.490e+02 2.970e+02 3.552e+02 6.221e+02, threshold=5.940e+02, percent-clipped=4.0 2024-09-19 00:14:43,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=539588.0, ans=0.2 2024-09-19 00:15:06,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=539634.6666666666, ans=0.125 2024-09-19 00:15:47,032 INFO [train.py:1198] (1/2) Epoch 30, batch 3250, loss[loss=0.225, simple_loss=0.2851, pruned_loss=0.06143, ctc_loss=0.1287, cr_loss=0.406, over 34656.00 frames. ], tot_loss[loss=0.2131, simple_loss=0.2674, pruned_loss=0.05893, ctc_loss=0.1247, cr_loss=0.4, over 6769490.57 frames. ], batch size: 98, lr: 3.96e-03, grad_scale: 32.0 2024-09-19 00:16:09,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=539821.3333333334, ans=0.0 2024-09-19 00:16:13,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=539821.3333333334, ans=0.125 2024-09-19 00:16:28,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=539868.0, ans=0.125 2024-09-19 00:16:40,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=539914.6666666666, ans=0.2 2024-09-19 00:16:48,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=539914.6666666666, ans=0.125 2024-09-19 00:16:56,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2024-09-19 00:17:08,570 INFO [train.py:1198] (1/2) Epoch 30, batch 3300, loss[loss=0.2167, simple_loss=0.2764, pruned_loss=0.05798, ctc_loss=0.1255, cr_loss=0.3952, over 33044.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2662, pruned_loss=0.05853, ctc_loss=0.1239, cr_loss=0.3984, over 6768464.71 frames. 
], batch size: 130, lr: 3.96e-03, grad_scale: 32.0 2024-09-19 00:17:10,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=540008.0, ans=0.0 2024-09-19 00:17:12,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=540008.0, ans=0.125 2024-09-19 00:17:18,464 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.447e+02 3.069e+02 3.538e+02 5.210e+02, threshold=6.139e+02, percent-clipped=0.0 2024-09-19 00:17:23,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=540054.6666666666, ans=0.025 2024-09-19 00:17:28,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=540054.6666666666, ans=0.125 2024-09-19 00:17:31,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=540054.6666666666, ans=0.125 2024-09-19 00:17:31,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=540054.6666666666, ans=0.125 2024-09-19 00:17:52,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=540101.3333333334, ans=0.125 2024-09-19 00:17:57,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=540148.0, ans=0.2 2024-09-19 00:18:31,140 INFO [train.py:1198] (1/2) Epoch 30, batch 3350, loss[loss=0.2194, simple_loss=0.2768, pruned_loss=0.0598, ctc_loss=0.1286, cr_loss=0.4155, over 33903.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.267, pruned_loss=0.0589, ctc_loss=0.1246, cr_loss=0.3995, over 6742854.02 frames. ], batch size: 122, lr: 3.96e-03, grad_scale: 16.0 2024-09-19 00:18:51,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.55 vs. limit=15.0 2024-09-19 00:19:28,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=540381.3333333334, ans=0.1 2024-09-19 00:19:36,340 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:19:52,326 INFO [train.py:1198] (1/2) Epoch 30, batch 3400, loss[loss=0.1838, simple_loss=0.2371, pruned_loss=0.04788, ctc_loss=0.1044, cr_loss=0.3488, over 34142.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2669, pruned_loss=0.05902, ctc_loss=0.1248, cr_loss=0.4004, over 6732993.22 frames. ], batch size: 78, lr: 3.96e-03, grad_scale: 16.0 2024-09-19 00:20:01,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.44 vs. limit=22.5 2024-09-19 00:20:03,638 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.449e+02 2.841e+02 3.585e+02 6.280e+02, threshold=5.682e+02, percent-clipped=1.0 2024-09-19 00:20:13,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.36 vs. 
limit=22.5 2024-09-19 00:20:31,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=540568.0, ans=0.0 2024-09-19 00:20:37,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=540568.0, ans=0.5 2024-09-19 00:20:39,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=540614.6666666666, ans=0.125 2024-09-19 00:20:53,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=540614.6666666666, ans=0.0 2024-09-19 00:21:12,328 INFO [train.py:1198] (1/2) Epoch 30, batch 3450, loss[loss=0.2253, simple_loss=0.2762, pruned_loss=0.06477, ctc_loss=0.1354, cr_loss=0.4457, over 33096.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2673, pruned_loss=0.0592, ctc_loss=0.125, cr_loss=0.4009, over 6745227.48 frames. ], batch size: 130, lr: 3.96e-03, grad_scale: 16.0 2024-09-19 00:21:20,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=540708.0, ans=0.0 2024-09-19 00:21:42,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=540754.6666666666, ans=0.5 2024-09-19 00:21:52,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=540801.3333333334, ans=0.0 2024-09-19 00:21:58,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=540801.3333333334, ans=0.125 2024-09-19 00:22:03,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=540848.0, ans=0.0 2024-09-19 00:22:34,960 INFO [train.py:1198] (1/2) Epoch 30, batch 3500, loss[loss=0.19, simple_loss=0.2444, pruned_loss=0.0499, ctc_loss=0.1072, cr_loss=0.3569, over 34452.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2671, pruned_loss=0.05917, ctc_loss=0.125, cr_loss=0.4001, over 6747695.90 frames. ], batch size: 85, lr: 3.96e-03, grad_scale: 16.0 2024-09-19 00:22:46,256 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.504e+02 2.872e+02 3.488e+02 6.528e+02, threshold=5.743e+02, percent-clipped=1.0 2024-09-19 00:23:16,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.87 vs. 
limit=15.0 2024-09-19 00:23:23,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=541081.3333333334, ans=0.125 2024-09-19 00:23:25,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=541081.3333333334, ans=0.2 2024-09-19 00:23:28,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=541081.3333333334, ans=0.0 2024-09-19 00:23:37,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=541128.0, ans=0.125 2024-09-19 00:23:52,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=541128.0, ans=0.1 2024-09-19 00:23:54,969 INFO [train.py:1198] (1/2) Epoch 30, batch 3550, loss[loss=0.2152, simple_loss=0.273, pruned_loss=0.05859, ctc_loss=0.1213, cr_loss=0.3987, over 34359.00 frames. ], tot_loss[loss=0.2133, simple_loss=0.2673, pruned_loss=0.05918, ctc_loss=0.1249, cr_loss=0.4004, over 6756371.79 frames. ], batch size: 103, lr: 3.96e-03, grad_scale: 16.0 2024-09-19 00:24:00,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=541174.6666666666, ans=0.5 2024-09-19 00:24:38,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=541268.0, ans=0.0 2024-09-19 00:24:41,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=541314.6666666666, ans=15.0 2024-09-19 00:24:57,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=541314.6666666666, ans=0.2 2024-09-19 00:24:59,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=541314.6666666666, ans=0.125 2024-09-19 00:25:09,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=541361.3333333334, ans=0.0 2024-09-19 00:25:12,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=541361.3333333334, ans=0.0 2024-09-19 00:25:18,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=541361.3333333334, ans=0.0 2024-09-19 00:25:21,709 INFO [train.py:1198] (1/2) Epoch 30, batch 3600, loss[loss=0.2019, simple_loss=0.254, pruned_loss=0.05534, ctc_loss=0.1179, cr_loss=0.3893, over 34481.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2676, pruned_loss=0.05919, ctc_loss=0.1251, cr_loss=0.4008, over 6765799.68 frames. 
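On the per-batch entries themselves: loss[... over N frames] is the current batch, while tot_loss[... over M frames] is a frame-weighted running average, i.e. sum(loss_i * frames_i) / sum(frames_i). The M column builds up early in an epoch (1.48e6 frames by epoch 31 batch 50 below, 2.63e6 by batch 100) and then plateaus near 6.7e6-6.8e6, which is what a per-step decay of about 0.995 would produce with ~34e3-frame batches (34000 / 0.005 = 6.8e6). The decay is inferred from those numbers, not read from train.py:

    class RunningLoss:
        def __init__(self, decay: float = 0.995):  # inferred, see note above
            self.decay = decay
            self.weighted_sum = 0.0
            self.frames = 0.0

        def update(self, loss: float, num_frames: float) -> float:
            self.weighted_sum = self.decay * self.weighted_sum + loss * num_frames
            self.frames = self.decay * self.frames + num_frames
            return self.weighted_sum / self.frames  # the tot_loss value logged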
], batch size: 90, lr: 3.96e-03, grad_scale: 32.0 2024-09-19 00:25:29,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=541408.0, ans=0.0 2024-09-19 00:25:32,873 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.121e+02 2.507e+02 2.844e+02 3.523e+02 6.739e+02, threshold=5.689e+02, percent-clipped=2.0 2024-09-19 00:25:39,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=541454.6666666666, ans=0.125 2024-09-19 00:25:48,811 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2024-09-19 00:26:17,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=541548.0, ans=0.125 2024-09-19 00:26:21,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=541548.0, ans=0.025 2024-09-19 00:26:40,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=541594.6666666666, ans=0.0 2024-09-19 00:26:43,625 INFO [train.py:1198] (1/2) Epoch 30, batch 3650, loss[loss=0.234, simple_loss=0.2881, pruned_loss=0.06707, ctc_loss=0.141, cr_loss=0.4374, over 34498.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.267, pruned_loss=0.05889, ctc_loss=0.1246, cr_loss=0.3991, over 6768681.02 frames. ], batch size: 110, lr: 3.96e-03, grad_scale: 16.0 2024-09-19 00:27:04,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=541688.0, ans=0.125 2024-09-19 00:27:09,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=541688.0, ans=0.0 2024-09-19 00:27:30,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.00 vs. limit=15.0 2024-09-19 00:27:52,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541828.0, ans=0.1 2024-09-19 00:28:03,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2024-09-19 00:28:03,573 INFO [train.py:1198] (1/2) Epoch 30, batch 3700, loss[loss=0.217, simple_loss=0.2748, pruned_loss=0.0594, ctc_loss=0.1245, cr_loss=0.3884, over 34628.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.267, pruned_loss=0.05877, ctc_loss=0.1244, cr_loss=0.3989, over 6783388.22 frames. 
], batch size: 102, lr: 3.95e-03, grad_scale: 16.0 2024-09-19 00:28:08,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=541874.6666666666, ans=0.0 2024-09-19 00:28:13,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=541874.6666666666, ans=0.0 2024-09-19 00:28:16,400 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.985e+02 2.518e+02 2.956e+02 3.890e+02 8.491e+02, threshold=5.912e+02, percent-clipped=4.0 2024-09-19 00:28:16,659 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:28:34,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=541968.0, ans=0.1 2024-09-19 00:28:37,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=541968.0, ans=0.0 2024-09-19 00:29:05,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=542014.6666666666, ans=0.0 2024-09-19 00:29:24,844 INFO [train.py:1198] (1/2) Epoch 30, batch 3750, loss[loss=0.2234, simple_loss=0.2767, pruned_loss=0.06348, ctc_loss=0.1319, cr_loss=0.4209, over 34318.00 frames. ], tot_loss[loss=0.2154, simple_loss=0.2699, pruned_loss=0.05973, ctc_loss=0.1262, cr_loss=0.4032, over 6785214.12 frames. ], batch size: 113, lr: 3.95e-03, grad_scale: 16.0 2024-09-19 00:30:03,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.75 vs. limit=22.5 2024-09-19 00:30:40,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=542294.6666666666, ans=0.95 2024-09-19 00:30:46,480 INFO [train.py:1198] (1/2) Epoch 30, batch 3800, loss[loss=0.2468, simple_loss=0.2938, pruned_loss=0.07503, ctc_loss=0.1565, cr_loss=0.4609, over 30077.00 frames. ], tot_loss[loss=0.2184, simple_loss=0.2725, pruned_loss=0.06109, ctc_loss=0.1289, cr_loss=0.4086, over 6676586.00 frames. ], batch size: 175, lr: 3.95e-03, grad_scale: 16.0 2024-09-19 00:30:53,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=542341.3333333334, ans=0.0 2024-09-19 00:30:59,834 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.180e+02 2.368e+02 2.552e+02 2.773e+02 4.033e+02, threshold=5.104e+02, percent-clipped=0.0 2024-09-19 00:31:05,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=542388.0, ans=0.125 2024-09-19 00:31:41,334 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:32:04,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=542528.0, ans=0.0 2024-09-19 00:32:06,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=542528.0, ans=0.2 2024-09-19 00:32:10,562 INFO [train.py:1198] (1/2) Epoch 30, batch 3850, loss[loss=0.2247, simple_loss=0.276, pruned_loss=0.06499, ctc_loss=0.1379, cr_loss=0.3976, over 24579.00 frames. 
], tot_loss[loss=0.2218, simple_loss=0.2746, pruned_loss=0.06301, ctc_loss=0.1329, cr_loss=0.4124, over 6252646.36 frames. ], batch size: 245, lr: 3.95e-03, grad_scale: 16.0 2024-09-19 00:32:23,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=22.5 2024-09-19 00:32:44,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=542668.0, ans=0.0 2024-09-19 00:32:45,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.46 vs. limit=22.5 2024-09-19 00:33:39,452 INFO [train.py:1198] (1/2) Epoch 31, batch 0, loss[loss=0.199, simple_loss=0.2542, pruned_loss=0.05279, ctc_loss=0.1143, cr_loss=0.3818, over 34483.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2542, pruned_loss=0.05279, ctc_loss=0.1143, cr_loss=0.3818, over 34483.00 frames. ], batch size: 85, lr: 3.89e-03, grad_scale: 32.0 2024-09-19 00:33:39,453 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 00:33:56,288 INFO [train.py:1230] (1/2) Epoch 31, validation: loss=0.1492, simple_loss=0.2449, pruned_loss=0.02277, ctc_loss=0.0398, cr_loss=2.015e-14, over 944034.00 frames. 2024-09-19 00:33:56,289 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 00:33:58,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=542700.6666666666, ans=0.2 2024-09-19 00:34:31,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=542794.0, ans=0.07 2024-09-19 00:34:41,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=542794.0, ans=0.125 2024-09-19 00:34:49,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 2.673e+02 2.845e+02 3.278e+02 7.131e+02, threshold=5.691e+02, percent-clipped=5.0 2024-09-19 00:35:16,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=542887.3333333334, ans=0.2 2024-09-19 00:35:16,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=542887.3333333334, ans=0.0 2024-09-19 00:35:21,299 INFO [train.py:1198] (1/2) Epoch 31, batch 50, loss[loss=0.1857, simple_loss=0.2419, pruned_loss=0.04755, ctc_loss=0.1016, cr_loss=0.3514, over 34473.00 frames. ], tot_loss[loss=0.2157, simple_loss=0.2694, pruned_loss=0.06021, ctc_loss=0.1266, cr_loss=0.4055, over 1480634.50 frames. 
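The lr column decays on two time scales: a slow drift within epoch 30 (3.97e-03 -> 3.96e-03 -> 3.95e-03) and a clear step at the epoch-31 boundary (down to 3.89e-03, then 3.88e-03). That shape is consistent with an Eden-style schedule with separate batch and epoch factors; the reconstruction below is from memory, with the time constants left as required arguments rather than values read off this run, so treat it as a sketch:

    def eden_lr(base_lr: float, batch: int, epoch: int,
                lr_batches: float, lr_epochs: float) -> float:
        # Each factor decays as an inverse fourth root once past its
        # time constant (assumed form of the schedule).
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor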
], batch size: 82, lr: 3.89e-03, grad_scale: 32.0 2024-09-19 00:35:23,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=542934.0, ans=0.025 2024-09-19 00:35:31,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=542934.0, ans=0.0 2024-09-19 00:35:55,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=543027.3333333334, ans=0.0 2024-09-19 00:35:56,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=543027.3333333334, ans=0.0 2024-09-19 00:36:29,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=543120.6666666666, ans=0.125 2024-09-19 00:36:40,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=543120.6666666666, ans=0.04949747468305833 2024-09-19 00:36:42,187 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 00:36:44,917 INFO [train.py:1198] (1/2) Epoch 31, batch 100, loss[loss=0.2068, simple_loss=0.259, pruned_loss=0.05755, ctc_loss=0.1196, cr_loss=0.3907, over 34581.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.2705, pruned_loss=0.06009, ctc_loss=0.1268, cr_loss=0.4065, over 2629537.10 frames. ], batch size: 89, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:36:45,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=543167.3333333334, ans=0.125 2024-09-19 00:37:08,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=543214.0, ans=0.125 2024-09-19 00:37:21,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=543260.6666666666, ans=0.125 2024-09-19 00:37:30,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.47 vs. limit=15.0 2024-09-19 00:37:35,801 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.100e+02 2.573e+02 2.856e+02 3.612e+02 5.834e+02, threshold=5.713e+02, percent-clipped=2.0 2024-09-19 00:37:40,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=543307.3333333334, ans=0.2 2024-09-19 00:38:06,777 INFO [train.py:1198] (1/2) Epoch 31, batch 150, loss[loss=0.1797, simple_loss=0.2375, pruned_loss=0.04426, ctc_loss=0.09829, cr_loss=0.3427, over 34472.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2679, pruned_loss=0.05876, ctc_loss=0.1245, cr_loss=0.4011, over 3557563.92 frames. ], batch size: 82, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:38:18,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. 
limit=15.0 2024-09-19 00:38:27,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=543447.3333333334, ans=0.2 2024-09-19 00:38:27,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=543447.3333333334, ans=0.125 2024-09-19 00:38:28,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.46 vs. limit=6.0 2024-09-19 00:38:30,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=15.0 2024-09-19 00:38:36,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=543447.3333333334, ans=0.1 2024-09-19 00:38:38,555 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2024-09-19 00:39:15,866 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=12.0 2024-09-19 00:39:31,283 INFO [train.py:1198] (1/2) Epoch 31, batch 200, loss[loss=0.2306, simple_loss=0.2832, pruned_loss=0.06658, ctc_loss=0.1393, cr_loss=0.4223, over 31674.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2672, pruned_loss=0.05882, ctc_loss=0.1244, cr_loss=0.3999, over 4272356.64 frames. ], batch size: 145, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:39:34,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=543634.0, ans=0.125 2024-09-19 00:39:37,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2024-09-19 00:39:39,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=543634.0, ans=0.0 2024-09-19 00:40:07,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=543727.3333333334, ans=0.0 2024-09-19 00:40:11,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=543727.3333333334, ans=0.125 2024-09-19 00:40:16,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2024-09-19 00:40:22,044 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.71 vs. limit=15.0 2024-09-19 00:40:23,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.525e+02 2.994e+02 4.153e+02 8.415e+02, threshold=5.988e+02, percent-clipped=7.0 2024-09-19 00:40:34,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=15.0 2024-09-19 00:40:40,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.44 vs. 
limit=15.0 2024-09-19 00:40:42,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=543820.6666666666, ans=0.025 2024-09-19 00:40:55,742 INFO [train.py:1198] (1/2) Epoch 31, batch 250, loss[loss=0.2256, simple_loss=0.2806, pruned_loss=0.06347, ctc_loss=0.1341, cr_loss=0.423, over 34197.00 frames. ], tot_loss[loss=0.2135, simple_loss=0.2677, pruned_loss=0.05915, ctc_loss=0.1248, cr_loss=0.4011, over 4834727.45 frames. ], batch size: 117, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:41:27,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=543960.6666666666, ans=0.0 2024-09-19 00:41:32,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=543960.6666666666, ans=0.1 2024-09-19 00:41:34,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=543960.6666666666, ans=0.1 2024-09-19 00:41:37,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=543960.6666666666, ans=0.025 2024-09-19 00:41:37,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=543960.6666666666, ans=0.0 2024-09-19 00:42:11,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=544054.0, ans=0.125 2024-09-19 00:42:14,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-09-19 00:42:16,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.56 vs. limit=6.0 2024-09-19 00:42:20,749 INFO [train.py:1198] (1/2) Epoch 31, batch 300, loss[loss=0.2325, simple_loss=0.2862, pruned_loss=0.06734, ctc_loss=0.1355, cr_loss=0.4262, over 34319.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2667, pruned_loss=0.05869, ctc_loss=0.124, cr_loss=0.3993, over 5264289.86 frames. ], batch size: 107, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:42:21,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=544100.6666666666, ans=0.0 2024-09-19 00:42:29,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=544100.6666666666, ans=0.1 2024-09-19 00:42:33,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=544100.6666666666, ans=0.125 2024-09-19 00:42:50,211 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.85 vs. 
limit=15.0 2024-09-19 00:43:12,159 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.651e+02 3.157e+02 3.877e+02 6.911e+02, threshold=6.314e+02, percent-clipped=5.0 2024-09-19 00:43:19,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=544240.6666666666, ans=0.2 2024-09-19 00:43:30,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=544287.3333333334, ans=0.035 2024-09-19 00:43:34,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0 2024-09-19 00:43:43,538 INFO [train.py:1198] (1/2) Epoch 31, batch 350, loss[loss=0.1873, simple_loss=0.2407, pruned_loss=0.04933, ctc_loss=0.1073, cr_loss=0.3473, over 34287.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.2671, pruned_loss=0.05881, ctc_loss=0.1242, cr_loss=0.3994, over 5598430.82 frames. ], batch size: 83, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:43:57,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=544334.0, ans=0.2 2024-09-19 00:44:12,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=544380.6666666666, ans=0.04949747468305833 2024-09-19 00:44:20,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=544427.3333333334, ans=0.0 2024-09-19 00:44:26,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=544427.3333333334, ans=0.025 2024-09-19 00:44:28,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=544427.3333333334, ans=0.125 2024-09-19 00:44:58,328 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.03 vs. limit=10.0 2024-09-19 00:45:04,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=544520.6666666666, ans=0.1 2024-09-19 00:45:07,149 INFO [train.py:1198] (1/2) Epoch 31, batch 400, loss[loss=0.2009, simple_loss=0.2569, pruned_loss=0.05341, ctc_loss=0.1146, cr_loss=0.3781, over 34428.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2666, pruned_loss=0.05865, ctc_loss=0.124, cr_loss=0.3994, over 5864603.96 frames. 
], batch size: 95, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:45:22,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=544614.0, ans=0.1 2024-09-19 00:45:27,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=544614.0, ans=0.0 2024-09-19 00:45:37,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=544614.0, ans=0.125 2024-09-19 00:45:42,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=544660.6666666666, ans=0.0 2024-09-19 00:45:52,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=544660.6666666666, ans=0.125 2024-09-19 00:45:57,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.82 vs. limit=22.5 2024-09-19 00:46:00,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.41 vs. limit=10.0 2024-09-19 00:46:01,144 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.948e+02 2.372e+02 2.700e+02 3.390e+02 5.181e+02, threshold=5.400e+02, percent-clipped=0.0 2024-09-19 00:46:14,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=544754.0, ans=0.1 2024-09-19 00:46:16,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=544754.0, ans=0.125 2024-09-19 00:46:16,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=544754.0, ans=0.125 2024-09-19 00:46:32,644 INFO [train.py:1198] (1/2) Epoch 31, batch 450, loss[loss=0.2298, simple_loss=0.2816, pruned_loss=0.06661, ctc_loss=0.137, cr_loss=0.4335, over 34699.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2669, pruned_loss=0.05872, ctc_loss=0.124, cr_loss=0.3994, over 6055437.33 frames. 
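Finally, the grad_scale column behaves like PyTorch AMP dynamic loss scaling: the scale doubles after a long run of overflow-free fp16 steps (16.0 -> 32.0 at epoch 30 batch 2800 above) and is halved when a step overflows (back to 16.0 by batch 3350). A minimal AMP training step with torch.cuda.amp; model, optimizer and loss_fn are generic placeholders, not this recipe's code:

    import torch

    scaler = torch.cuda.amp.GradScaler()  # growth_factor=2.0, backoff_factor=0.5 by default

    def train_step(model, optimizer, loss_fn, batch):
        optimizer.zero_grad(set_to_none=True)
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()  # backward on the scaled loss
        scaler.step(optimizer)         # internally skipped if any grad overflowed
        scaler.update()                # grows the scale after enough good steps,
                                       # halves it on overflow
        return loss.detach()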
], batch size: 97, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:46:33,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=544800.6666666666, ans=0.125 2024-09-19 00:46:36,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=544800.6666666666, ans=0.125 2024-09-19 00:46:38,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=544800.6666666666, ans=15.0 2024-09-19 00:46:51,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=544847.3333333334, ans=0.2 2024-09-19 00:47:02,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=544847.3333333334, ans=0.025 2024-09-19 00:47:10,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=544894.0, ans=0.0 2024-09-19 00:47:11,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=544894.0, ans=0.125 2024-09-19 00:47:12,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=544894.0, ans=0.125 2024-09-19 00:47:13,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.36 vs. limit=10.0 2024-09-19 00:47:38,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=544940.6666666666, ans=0.07 2024-09-19 00:47:57,514 INFO [train.py:1198] (1/2) Epoch 31, batch 500, loss[loss=0.2417, simple_loss=0.2926, pruned_loss=0.07151, ctc_loss=0.148, cr_loss=0.4539, over 34464.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2662, pruned_loss=0.05853, ctc_loss=0.1236, cr_loss=0.3987, over 6220253.74 frames. ], batch size: 110, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:48:05,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2024-09-19 00:48:26,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=545080.6666666666, ans=0.04949747468305833 2024-09-19 00:48:32,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=545127.3333333334, ans=0.125 2024-09-19 00:48:41,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=545127.3333333334, ans=0.025 2024-09-19 00:48:49,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.479e+02 2.920e+02 4.019e+02 7.754e+02, threshold=5.839e+02, percent-clipped=8.0 2024-09-19 00:49:20,612 INFO [train.py:1198] (1/2) Epoch 31, batch 550, loss[loss=0.2285, simple_loss=0.2837, pruned_loss=0.06437, ctc_loss=0.135, cr_loss=0.439, over 33710.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2664, pruned_loss=0.05858, ctc_loss=0.1238, cr_loss=0.3986, over 6327609.58 frames. 
], batch size: 122, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:50:23,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=545407.3333333334, ans=0.0 2024-09-19 00:50:42,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=545454.0, ans=0.0 2024-09-19 00:50:46,007 INFO [train.py:1198] (1/2) Epoch 31, batch 600, loss[loss=0.2167, simple_loss=0.2721, pruned_loss=0.06018, ctc_loss=0.1249, cr_loss=0.3972, over 34231.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.267, pruned_loss=0.05889, ctc_loss=0.1244, cr_loss=0.3998, over 6429994.55 frames. ], batch size: 117, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:50:52,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=545500.6666666666, ans=0.07 2024-09-19 00:51:09,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=545547.3333333334, ans=0.125 2024-09-19 00:51:17,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=545594.0, ans=0.025 2024-09-19 00:51:38,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.485e+02 2.891e+02 3.838e+02 7.622e+02, threshold=5.781e+02, percent-clipped=7.0 2024-09-19 00:51:39,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.85 vs. limit=6.0 2024-09-19 00:51:42,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=545640.6666666666, ans=0.07 2024-09-19 00:51:47,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.98 vs. limit=15.0 2024-09-19 00:51:58,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=545687.3333333334, ans=0.125 2024-09-19 00:52:09,847 INFO [train.py:1198] (1/2) Epoch 31, batch 650, loss[loss=0.2099, simple_loss=0.2659, pruned_loss=0.05658, ctc_loss=0.123, cr_loss=0.4016, over 34535.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2664, pruned_loss=0.05845, ctc_loss=0.1237, cr_loss=0.3985, over 6521666.66 frames. ], batch size: 94, lr: 3.88e-03, grad_scale: 32.0 2024-09-19 00:52:11,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=545734.0, ans=0.125 2024-09-19 00:52:52,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=545827.3333333334, ans=0.125 2024-09-19 00:52:55,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=545827.3333333334, ans=0.0 2024-09-19 00:52:59,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=545874.0, ans=0.2 2024-09-19 00:53:27,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=545920.6666666666, ans=0.0 2024-09-19 00:53:31,871 INFO [train.py:1198] (1/2) Epoch 31, batch 700, loss[loss=0.2038, simple_loss=0.2576, pruned_loss=0.05549, ctc_loss=0.1186, cr_loss=0.3846, over 34579.00 frames. 
], tot_loss[loss=0.212, simple_loss=0.2665, pruned_loss=0.05838, ctc_loss=0.1236, cr_loss=0.3986, over 6578141.12 frames. ], batch size: 89, lr: 3.87e-03, grad_scale: 16.0 2024-09-19 00:53:32,550 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.15 vs. limit=15.0 2024-09-19 00:54:19,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=546060.6666666666, ans=0.0 2024-09-19 00:54:27,241 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.072e+02 2.415e+02 2.682e+02 3.600e+02 5.147e+02, threshold=5.364e+02, percent-clipped=0.0 2024-09-19 00:54:28,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.30 vs. limit=10.0 2024-09-19 00:54:32,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=546107.3333333334, ans=0.0 2024-09-19 00:54:52,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=546154.0, ans=0.0 2024-09-19 00:54:54,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=546154.0, ans=0.125 2024-09-19 00:54:56,871 INFO [train.py:1198] (1/2) Epoch 31, batch 750, loss[loss=0.2122, simple_loss=0.2668, pruned_loss=0.05852, ctc_loss=0.1246, cr_loss=0.3907, over 34423.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2662, pruned_loss=0.05829, ctc_loss=0.1235, cr_loss=0.398, over 6623347.19 frames. ], batch size: 95, lr: 3.87e-03, grad_scale: 16.0 2024-09-19 00:55:20,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.67 vs. limit=22.5 2024-09-19 00:56:21,042 INFO [train.py:1198] (1/2) Epoch 31, batch 800, loss[loss=0.182, simple_loss=0.2368, pruned_loss=0.04637, ctc_loss=0.1013, cr_loss=0.3556, over 34464.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2665, pruned_loss=0.05844, ctc_loss=0.1237, cr_loss=0.3992, over 6657797.70 frames. ], batch size: 85, lr: 3.87e-03, grad_scale: 32.0 2024-09-19 00:56:49,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=546480.6666666666, ans=0.2 2024-09-19 00:57:00,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=546527.3333333334, ans=0.125 2024-09-19 00:57:01,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. 
limit=12.0 2024-09-19 00:57:13,682 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.996e+02 2.456e+02 3.015e+02 3.730e+02 6.700e+02, threshold=6.031e+02, percent-clipped=4.0 2024-09-19 00:57:28,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=546620.6666666666, ans=0.2 2024-09-19 00:57:33,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=546620.6666666666, ans=0.125 2024-09-19 00:57:42,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=546620.6666666666, ans=0.125 2024-09-19 00:57:45,148 INFO [train.py:1198] (1/2) Epoch 31, batch 850, loss[loss=0.2182, simple_loss=0.2765, pruned_loss=0.05895, ctc_loss=0.1273, cr_loss=0.4167, over 34407.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2662, pruned_loss=0.05825, ctc_loss=0.1233, cr_loss=0.3985, over 6690385.84 frames. ], batch size: 103, lr: 3.87e-03, grad_scale: 32.0 2024-09-19 00:57:52,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=546667.3333333334, ans=0.05 2024-09-19 00:57:53,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.91 vs. limit=15.0 2024-09-19 00:58:23,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=546760.6666666666, ans=0.125 2024-09-19 00:58:41,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=546807.3333333334, ans=0.125 2024-09-19 00:59:00,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.44 vs. limit=15.0 2024-09-19 00:59:09,449 INFO [train.py:1198] (1/2) Epoch 31, batch 900, loss[loss=0.1905, simple_loss=0.2479, pruned_loss=0.04913, ctc_loss=0.1059, cr_loss=0.3415, over 34504.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2663, pruned_loss=0.05843, ctc_loss=0.1236, cr_loss=0.3986, over 6698400.69 frames. ], batch size: 85, lr: 3.87e-03, grad_scale: 16.0 2024-09-19 00:59:16,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. 
limit=15.0 2024-09-19 00:59:41,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=546994.0, ans=0.0 2024-09-19 00:59:46,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=546994.0, ans=0.0 2024-09-19 01:00:04,357 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.090e+02 2.538e+02 2.934e+02 3.609e+02 5.947e+02, threshold=5.867e+02, percent-clipped=0.0 2024-09-19 01:00:11,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=547040.6666666666, ans=0.0 2024-09-19 01:00:27,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=547087.3333333334, ans=0.125 2024-09-19 01:00:32,084 INFO [train.py:1198] (1/2) Epoch 31, batch 950, loss[loss=0.2115, simple_loss=0.2639, pruned_loss=0.05876, ctc_loss=0.1239, cr_loss=0.4189, over 34694.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2662, pruned_loss=0.05834, ctc_loss=0.1235, cr_loss=0.3982, over 6702363.73 frames. ], batch size: 87, lr: 3.87e-03, grad_scale: 16.0 2024-09-19 01:00:39,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=547134.0, ans=0.125 2024-09-19 01:00:48,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0 2024-09-19 01:01:08,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=547227.3333333334, ans=0.125 2024-09-19 01:01:30,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=547274.0, ans=0.1 2024-09-19 01:01:31,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.05 vs. limit=15.0 2024-09-19 01:01:40,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=547320.6666666666, ans=0.125 2024-09-19 01:01:40,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=547320.6666666666, ans=0.125 2024-09-19 01:01:50,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=547320.6666666666, ans=0.2 2024-09-19 01:01:56,541 INFO [train.py:1198] (1/2) Epoch 31, batch 1000, loss[loss=0.201, simple_loss=0.2544, pruned_loss=0.0545, ctc_loss=0.1155, cr_loss=0.3863, over 34473.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2667, pruned_loss=0.0585, ctc_loss=0.1238, cr_loss=0.3991, over 6695382.47 frames. ], batch size: 90, lr: 3.87e-03, grad_scale: 16.0 2024-09-19 01:02:16,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=547414.0, ans=0.0 2024-09-19 01:02:18,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=547414.0, ans=0.125 2024-09-19 01:02:29,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.28 vs. 
limit=15.0 2024-09-19 01:02:31,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=547460.6666666666, ans=0.125 2024-09-19 01:02:43,633 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:02:52,951 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 2.713e+02 3.396e+02 4.361e+02 6.708e+02, threshold=6.791e+02, percent-clipped=7.0 2024-09-19 01:03:09,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=547554.0, ans=0.125 2024-09-19 01:03:21,057 INFO [train.py:1198] (1/2) Epoch 31, batch 1050, loss[loss=0.2238, simple_loss=0.278, pruned_loss=0.06324, ctc_loss=0.1322, cr_loss=0.4162, over 34556.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2662, pruned_loss=0.0585, ctc_loss=0.1239, cr_loss=0.3989, over 6704623.17 frames. ], batch size: 99, lr: 3.87e-03, grad_scale: 16.0 2024-09-19 01:03:49,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=547647.3333333334, ans=0.125 2024-09-19 01:04:07,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=547694.0, ans=0.0 2024-09-19 01:04:14,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=547740.6666666666, ans=0.5 2024-09-19 01:04:16,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=547740.6666666666, ans=0.125 2024-09-19 01:04:17,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=547740.6666666666, ans=0.035 2024-09-19 01:04:43,688 INFO [train.py:1198] (1/2) Epoch 31, batch 1100, loss[loss=0.203, simple_loss=0.2562, pruned_loss=0.05548, ctc_loss=0.1148, cr_loss=0.3952, over 34721.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2662, pruned_loss=0.05837, ctc_loss=0.1237, cr_loss=0.3981, over 6717512.27 frames. ], batch size: 92, lr: 3.87e-03, grad_scale: 16.0 2024-09-19 01:04:44,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547834.0, ans=0.1 2024-09-19 01:04:46,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.21 vs. 
limit=15.0 2024-09-19 01:04:52,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=547834.0, ans=0.125 2024-09-19 01:05:03,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=547880.6666666666, ans=0.0 2024-09-19 01:05:07,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=547880.6666666666, ans=0.2 2024-09-19 01:05:22,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=547927.3333333334, ans=0.125 2024-09-19 01:05:40,228 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.440e+02 2.885e+02 3.694e+02 5.558e+02, threshold=5.770e+02, percent-clipped=0.0 2024-09-19 01:05:42,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=547974.0, ans=0.125 2024-09-19 01:05:54,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=548020.6666666666, ans=15.0 2024-09-19 01:05:55,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=548020.6666666666, ans=0.125 2024-09-19 01:06:08,231 INFO [train.py:1198] (1/2) Epoch 31, batch 1150, loss[loss=0.2028, simple_loss=0.2591, pruned_loss=0.05456, ctc_loss=0.1158, cr_loss=0.3591, over 34357.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2665, pruned_loss=0.05853, ctc_loss=0.1241, cr_loss=0.3982, over 6715032.17 frames. ], batch size: 91, lr: 3.87e-03, grad_scale: 16.0 2024-09-19 01:06:10,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=548067.3333333334, ans=0.025 2024-09-19 01:07:00,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=548207.3333333334, ans=0.125 2024-09-19 01:07:26,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=548254.0, ans=0.125 2024-09-19 01:07:32,933 INFO [train.py:1198] (1/2) Epoch 31, batch 1200, loss[loss=0.2214, simple_loss=0.2784, pruned_loss=0.06098, ctc_loss=0.1325, cr_loss=0.3983, over 34545.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.2669, pruned_loss=0.05861, ctc_loss=0.1243, cr_loss=0.3987, over 6707109.73 frames. 
], batch size: 99, lr: 3.87e-03, grad_scale: 32.0 2024-09-19 01:07:33,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=548300.6666666666, ans=0.125 2024-09-19 01:07:46,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=548300.6666666666, ans=0.0 2024-09-19 01:07:52,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=548347.3333333334, ans=0.125 2024-09-19 01:08:05,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=548394.0, ans=0.125 2024-09-19 01:08:14,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=548394.0, ans=0.0 2024-09-19 01:08:27,104 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.107e+02 2.448e+02 2.848e+02 3.863e+02 6.755e+02, threshold=5.697e+02, percent-clipped=6.0 2024-09-19 01:08:52,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=548487.3333333334, ans=0.1 2024-09-19 01:08:57,367 INFO [train.py:1198] (1/2) Epoch 31, batch 1250, loss[loss=0.2394, simple_loss=0.2919, pruned_loss=0.06986, ctc_loss=0.1443, cr_loss=0.4573, over 34339.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2675, pruned_loss=0.05883, ctc_loss=0.1246, cr_loss=0.4, over 6741604.33 frames. ], batch size: 107, lr: 3.87e-03, grad_scale: 32.0 2024-09-19 01:09:09,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=548534.0, ans=0.2 2024-09-19 01:09:34,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.21 vs. limit=10.0 2024-09-19 01:10:21,472 INFO [train.py:1198] (1/2) Epoch 31, batch 1300, loss[loss=0.208, simple_loss=0.2699, pruned_loss=0.05406, ctc_loss=0.1141, cr_loss=0.3764, over 33101.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2668, pruned_loss=0.05856, ctc_loss=0.124, cr_loss=0.399, over 6745157.50 frames. 
], batch size: 130, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:10:31,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=548767.3333333334, ans=0.0 2024-09-19 01:10:33,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=548767.3333333334, ans=0.125 2024-09-19 01:10:56,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=548860.6666666666, ans=0.95 2024-09-19 01:11:08,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=548860.6666666666, ans=0.0 2024-09-19 01:11:16,384 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.390e+02 2.864e+02 3.545e+02 6.385e+02, threshold=5.728e+02, percent-clipped=2.0 2024-09-19 01:11:16,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=548907.3333333334, ans=0.0 2024-09-19 01:11:25,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=22.5 2024-09-19 01:11:44,300 INFO [train.py:1198] (1/2) Epoch 31, batch 1350, loss[loss=0.2163, simple_loss=0.2702, pruned_loss=0.06024, ctc_loss=0.1278, cr_loss=0.4099, over 34538.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2666, pruned_loss=0.05849, ctc_loss=0.1238, cr_loss=0.399, over 6766535.91 frames. ], batch size: 94, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:11:46,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=549000.6666666666, ans=0.0 2024-09-19 01:11:52,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=549000.6666666666, ans=0.0 2024-09-19 01:11:53,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.75 vs. limit=22.5 2024-09-19 01:11:55,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=549000.6666666666, ans=0.025 2024-09-19 01:11:57,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=549000.6666666666, ans=0.2 2024-09-19 01:12:00,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=549047.3333333334, ans=0.125 2024-09-19 01:12:05,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=549047.3333333334, ans=0.0 2024-09-19 01:12:09,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.73 vs. limit=15.0 2024-09-19 01:12:10,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=549047.3333333334, ans=0.2 2024-09-19 01:12:18,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=549094.0, ans=0.125 2024-09-19 01:12:27,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.42 vs. 
limit=10.0 2024-09-19 01:12:36,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=549140.6666666666, ans=0.0 2024-09-19 01:12:40,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=549140.6666666666, ans=0.05 2024-09-19 01:12:44,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=549140.6666666666, ans=0.125 2024-09-19 01:12:56,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=549187.3333333334, ans=0.1 2024-09-19 01:13:08,333 INFO [train.py:1198] (1/2) Epoch 31, batch 1400, loss[loss=0.1683, simple_loss=0.2221, pruned_loss=0.04135, ctc_loss=0.09204, cr_loss=0.3344, over 34280.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2664, pruned_loss=0.05829, ctc_loss=0.1234, cr_loss=0.3982, over 6778576.86 frames. ], batch size: 80, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:13:16,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=549234.0, ans=0.125 2024-09-19 01:13:58,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=549374.0, ans=0.1 2024-09-19 01:14:04,355 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.539e+02 3.134e+02 3.832e+02 5.576e+02, threshold=6.268e+02, percent-clipped=0.0 2024-09-19 01:14:15,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=549420.6666666666, ans=0.1 2024-09-19 01:14:29,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=549420.6666666666, ans=0.0 2024-09-19 01:14:32,404 INFO [train.py:1198] (1/2) Epoch 31, batch 1450, loss[loss=0.2341, simple_loss=0.2884, pruned_loss=0.06755, ctc_loss=0.1387, cr_loss=0.4259, over 34474.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2668, pruned_loss=0.05845, ctc_loss=0.1237, cr_loss=0.3983, over 6774728.02 frames. ], batch size: 110, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:14:35,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=549467.3333333334, ans=0.125 2024-09-19 01:14:44,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=549467.3333333334, ans=0.2 2024-09-19 01:14:46,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0 2024-09-19 01:15:11,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=549560.6666666666, ans=15.0 2024-09-19 01:15:23,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=549607.3333333334, ans=0.025 2024-09-19 01:15:28,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.68 vs. 
limit=15.0 2024-09-19 01:15:44,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=549654.0, ans=0.125 2024-09-19 01:15:54,721 INFO [train.py:1198] (1/2) Epoch 31, batch 1500, loss[loss=0.2076, simple_loss=0.2659, pruned_loss=0.0553, ctc_loss=0.1158, cr_loss=0.388, over 34466.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2674, pruned_loss=0.05855, ctc_loss=0.124, cr_loss=0.3993, over 6773710.36 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:16:49,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=549840.6666666666, ans=0.125 2024-09-19 01:16:52,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 2.475e+02 2.841e+02 3.696e+02 5.562e+02, threshold=5.683e+02, percent-clipped=0.0 2024-09-19 01:17:02,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=549887.3333333334, ans=0.035 2024-09-19 01:17:20,080 INFO [train.py:1198] (1/2) Epoch 31, batch 1550, loss[loss=0.2171, simple_loss=0.274, pruned_loss=0.05959, ctc_loss=0.1264, cr_loss=0.395, over 34442.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2674, pruned_loss=0.05878, ctc_loss=0.1244, cr_loss=0.4001, over 6745719.57 frames. ], batch size: 105, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:17:25,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=549934.0, ans=0.0 2024-09-19 01:17:25,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=549934.0, ans=0.0 2024-09-19 01:18:00,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=550027.3333333334, ans=0.2 2024-09-19 01:18:07,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.61 vs. limit=15.0 2024-09-19 01:18:44,121 INFO [train.py:1198] (1/2) Epoch 31, batch 1600, loss[loss=0.1982, simple_loss=0.2608, pruned_loss=0.04925, ctc_loss=0.1094, cr_loss=0.379, over 34558.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2671, pruned_loss=0.05867, ctc_loss=0.1242, cr_loss=0.3991, over 6723932.28 frames. ], batch size: 99, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:18:51,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=550167.3333333334, ans=0.2 2024-09-19 01:19:27,354 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:19:38,293 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.432e+02 2.748e+02 3.243e+02 6.022e+02, threshold=5.497e+02, percent-clipped=2.0 2024-09-19 01:20:04,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=550400.6666666666, ans=0.125 2024-09-19 01:20:06,242 INFO [train.py:1198] (1/2) Epoch 31, batch 1650, loss[loss=0.2296, simple_loss=0.2857, pruned_loss=0.06438, ctc_loss=0.1387, cr_loss=0.4243, over 34379.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.267, pruned_loss=0.05855, ctc_loss=0.1242, cr_loss=0.3996, over 6716922.01 frames. 
], batch size: 103, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:20:33,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=550447.3333333334, ans=0.125 2024-09-19 01:20:45,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=10.03 vs. limit=10.0 2024-09-19 01:21:32,675 INFO [train.py:1198] (1/2) Epoch 31, batch 1700, loss[loss=0.1832, simple_loss=0.237, pruned_loss=0.04751, ctc_loss=0.1018, cr_loss=0.3499, over 34296.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2669, pruned_loss=0.05847, ctc_loss=0.124, cr_loss=0.3993, over 6741516.28 frames. ], batch size: 80, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:21:39,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=550634.0, ans=0.0 2024-09-19 01:21:55,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2024-09-19 01:21:56,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=550680.6666666666, ans=0.2 2024-09-19 01:22:09,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.66 vs. limit=15.0 2024-09-19 01:22:15,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=6.59 vs. limit=12.0 2024-09-19 01:22:26,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=550774.0, ans=0.2 2024-09-19 01:22:27,346 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.447e+02 2.865e+02 3.475e+02 8.081e+02, threshold=5.730e+02, percent-clipped=5.0 2024-09-19 01:22:27,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=550774.0, ans=0.125 2024-09-19 01:22:29,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=550774.0, ans=0.2 2024-09-19 01:22:55,470 INFO [train.py:1198] (1/2) Epoch 31, batch 1750, loss[loss=0.1849, simple_loss=0.2373, pruned_loss=0.04854, ctc_loss=0.1051, cr_loss=0.3617, over 34120.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2666, pruned_loss=0.05832, ctc_loss=0.1237, cr_loss=0.399, over 6751324.52 frames. ], batch size: 78, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:22:57,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=550867.3333333334, ans=0.1 2024-09-19 01:22:57,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=550867.3333333334, ans=0.0 2024-09-19 01:23:20,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.44 vs. 
limit=15.0 2024-09-19 01:23:41,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=550960.6666666666, ans=0.1 2024-09-19 01:23:43,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=551007.3333333334, ans=0.125 2024-09-19 01:23:53,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2024-09-19 01:24:19,527 INFO [train.py:1198] (1/2) Epoch 31, batch 1800, loss[loss=0.2268, simple_loss=0.2835, pruned_loss=0.06305, ctc_loss=0.1339, cr_loss=0.4284, over 34681.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2669, pruned_loss=0.05843, ctc_loss=0.124, cr_loss=0.3999, over 6753609.46 frames. ], batch size: 97, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:24:21,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551100.6666666666, ans=0.1 2024-09-19 01:24:34,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=551147.3333333334, ans=0.125 2024-09-19 01:24:52,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=551194.0, ans=0.95 2024-09-19 01:24:53,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.29 vs. limit=15.0 2024-09-19 01:25:15,790 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.984e+02 2.493e+02 3.233e+02 3.789e+02 6.576e+02, threshold=6.465e+02, percent-clipped=6.0 2024-09-19 01:25:17,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=551240.6666666666, ans=0.125 2024-09-19 01:25:26,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=551287.3333333334, ans=0.07 2024-09-19 01:25:44,017 INFO [train.py:1198] (1/2) Epoch 31, batch 1850, loss[loss=0.2164, simple_loss=0.2743, pruned_loss=0.05892, ctc_loss=0.1244, cr_loss=0.3957, over 34454.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2665, pruned_loss=0.05831, ctc_loss=0.1237, cr_loss=0.3993, over 6762296.82 frames. ], batch size: 100, lr: 3.86e-03, grad_scale: 32.0 2024-09-19 01:25:46,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=551334.0, ans=0.125 2024-09-19 01:25:47,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=551334.0, ans=0.0 2024-09-19 01:25:50,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=551334.0, ans=0.2 2024-09-19 01:26:10,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=551380.6666666666, ans=0.0 2024-09-19 01:26:48,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=551520.6666666666, ans=0.0 2024-09-19 01:26:52,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.85 vs. 
limit=15.0 2024-09-19 01:26:56,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=551520.6666666666, ans=0.0 2024-09-19 01:27:06,240 INFO [train.py:1198] (1/2) Epoch 31, batch 1900, loss[loss=0.2233, simple_loss=0.2804, pruned_loss=0.06179, ctc_loss=0.13, cr_loss=0.4163, over 34366.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2671, pruned_loss=0.05841, ctc_loss=0.1239, cr_loss=0.3999, over 6772567.95 frames. ], batch size: 103, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:27:08,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=551567.3333333334, ans=0.125 2024-09-19 01:27:18,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=551567.3333333334, ans=0.1 2024-09-19 01:27:53,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=551660.6666666666, ans=0.125 2024-09-19 01:28:03,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.499e+02 3.114e+02 3.922e+02 6.324e+02, threshold=6.228e+02, percent-clipped=0.0 2024-09-19 01:28:13,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=551754.0, ans=0.125 2024-09-19 01:28:13,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=551754.0, ans=0.0 2024-09-19 01:28:31,293 INFO [train.py:1198] (1/2) Epoch 31, batch 1950, loss[loss=0.2045, simple_loss=0.2602, pruned_loss=0.05478, ctc_loss=0.119, cr_loss=0.387, over 34380.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2683, pruned_loss=0.05876, ctc_loss=0.1246, cr_loss=0.4016, over 6789227.00 frames. ], batch size: 91, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:28:33,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=551800.6666666666, ans=0.125 2024-09-19 01:28:40,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=551800.6666666666, ans=0.025 2024-09-19 01:28:47,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.92 vs. limit=15.0 2024-09-19 01:29:21,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=551940.6666666666, ans=0.2 2024-09-19 01:29:26,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=551940.6666666666, ans=0.125 2024-09-19 01:29:46,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=551987.3333333334, ans=0.125 2024-09-19 01:29:55,643 INFO [train.py:1198] (1/2) Epoch 31, batch 2000, loss[loss=0.1805, simple_loss=0.2322, pruned_loss=0.04673, ctc_loss=0.1046, cr_loss=0.3589, over 34119.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2683, pruned_loss=0.0589, ctc_loss=0.1249, cr_loss=0.4018, over 6765520.02 frames. ], batch size: 78, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:30:01,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.32 vs. 
limit=15.0 2024-09-19 01:30:12,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=552080.6666666666, ans=0.0 2024-09-19 01:30:14,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=552080.6666666666, ans=0.1 2024-09-19 01:30:22,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=552080.6666666666, ans=10.0 2024-09-19 01:30:30,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552127.3333333334, ans=0.1 2024-09-19 01:30:46,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=552174.0, ans=0.025 2024-09-19 01:30:49,840 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.873e+02 2.496e+02 2.962e+02 3.899e+02 8.360e+02, threshold=5.924e+02, percent-clipped=4.0 2024-09-19 01:31:06,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552220.6666666666, ans=0.1 2024-09-19 01:31:11,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=552220.6666666666, ans=0.0 2024-09-19 01:31:18,180 INFO [train.py:1198] (1/2) Epoch 31, batch 2050, loss[loss=0.1869, simple_loss=0.2404, pruned_loss=0.04912, ctc_loss=0.1057, cr_loss=0.3488, over 34448.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2668, pruned_loss=0.05839, ctc_loss=0.1238, cr_loss=0.3988, over 6756189.26 frames. ], batch size: 82, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:31:48,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=552314.0, ans=0.125 2024-09-19 01:31:52,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=552360.6666666666, ans=0.125 2024-09-19 01:31:53,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=552360.6666666666, ans=0.125 2024-09-19 01:31:55,680 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:32:09,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=552407.3333333334, ans=12.0 2024-09-19 01:32:10,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.95 vs. limit=15.0 2024-09-19 01:32:24,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.50 vs. limit=15.0 2024-09-19 01:32:36,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.91 vs. limit=12.0 2024-09-19 01:32:40,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=552454.0, ans=0.0 2024-09-19 01:32:44,832 INFO [train.py:1198] (1/2) Epoch 31, batch 2100, loss[loss=0.2136, simple_loss=0.2679, pruned_loss=0.05957, ctc_loss=0.1225, cr_loss=0.3899, over 34556.00 frames. 
], tot_loss[loss=0.2117, simple_loss=0.2663, pruned_loss=0.05821, ctc_loss=0.1234, cr_loss=0.3984, over 6770788.80 frames. ], batch size: 94, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:33:00,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=552547.3333333334, ans=0.0 2024-09-19 01:33:00,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=552547.3333333334, ans=0.0 2024-09-19 01:33:08,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552547.3333333334, ans=0.1 2024-09-19 01:33:13,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=552547.3333333334, ans=0.0 2024-09-19 01:33:26,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. limit=10.0 2024-09-19 01:33:27,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=552594.0, ans=0.125 2024-09-19 01:33:39,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.992e+02 2.432e+02 2.941e+02 3.576e+02 8.338e+02, threshold=5.881e+02, percent-clipped=5.0 2024-09-19 01:33:49,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552687.3333333334, ans=0.1 2024-09-19 01:34:06,959 INFO [train.py:1198] (1/2) Epoch 31, batch 2150, loss[loss=0.2099, simple_loss=0.2609, pruned_loss=0.05913, ctc_loss=0.1251, cr_loss=0.3911, over 34356.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.2658, pruned_loss=0.05793, ctc_loss=0.1228, cr_loss=0.3974, over 6789903.24 frames. ], batch size: 91, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:34:07,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=552734.0, ans=10.0 2024-09-19 01:34:08,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552734.0, ans=0.1 2024-09-19 01:34:14,035 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:34:17,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=552734.0, ans=0.2 2024-09-19 01:34:27,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.54 vs. 
limit=22.5 2024-09-19 01:34:42,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=552827.3333333334, ans=12.0 2024-09-19 01:34:43,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=552827.3333333334, ans=0.125 2024-09-19 01:34:58,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=552874.0, ans=0.125 2024-09-19 01:35:16,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=552920.6666666666, ans=0.125 2024-09-19 01:35:23,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=552920.6666666666, ans=15.0 2024-09-19 01:35:29,465 INFO [train.py:1198] (1/2) Epoch 31, batch 2200, loss[loss=0.2271, simple_loss=0.2844, pruned_loss=0.06313, ctc_loss=0.1322, cr_loss=0.4279, over 34464.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.2659, pruned_loss=0.05796, ctc_loss=0.1229, cr_loss=0.3976, over 6784720.30 frames. ], batch size: 100, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:36:27,608 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.007e+02 2.545e+02 3.068e+02 3.895e+02 1.200e+03, threshold=6.135e+02, percent-clipped=2.0 2024-09-19 01:36:45,166 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.10 vs. limit=15.0 2024-09-19 01:36:46,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=553154.0, ans=0.0 2024-09-19 01:36:51,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.27 vs. limit=22.5 2024-09-19 01:36:55,864 INFO [train.py:1198] (1/2) Epoch 31, batch 2250, loss[loss=0.2063, simple_loss=0.262, pruned_loss=0.05548, ctc_loss=0.1194, cr_loss=0.3948, over 34409.00 frames. ], tot_loss[loss=0.211, simple_loss=0.2658, pruned_loss=0.05789, ctc_loss=0.1228, cr_loss=0.3973, over 6781858.72 frames. ], batch size: 95, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:37:09,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=553200.6666666666, ans=0.125 2024-09-19 01:37:14,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.60 vs. limit=15.0 2024-09-19 01:37:55,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=553340.6666666666, ans=0.125 2024-09-19 01:38:11,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=553387.3333333334, ans=22.5 2024-09-19 01:38:11,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=553387.3333333334, ans=0.125 2024-09-19 01:38:18,221 INFO [train.py:1198] (1/2) Epoch 31, batch 2300, loss[loss=0.1834, simple_loss=0.2373, pruned_loss=0.04765, ctc_loss=0.1023, cr_loss=0.3447, over 34278.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2646, pruned_loss=0.05751, ctc_loss=0.122, cr_loss=0.3956, over 6767546.37 frames. 
], batch size: 83, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:38:43,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=553480.6666666666, ans=0.125 2024-09-19 01:38:46,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=553480.6666666666, ans=0.125 2024-09-19 01:38:46,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=553480.6666666666, ans=0.125 2024-09-19 01:39:12,774 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.486e+02 2.937e+02 3.564e+02 5.834e+02, threshold=5.875e+02, percent-clipped=0.0 2024-09-19 01:39:23,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=553620.6666666666, ans=0.125 2024-09-19 01:39:43,038 INFO [train.py:1198] (1/2) Epoch 31, batch 2350, loss[loss=0.2227, simple_loss=0.2771, pruned_loss=0.06235, ctc_loss=0.1315, cr_loss=0.434, over 34698.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.265, pruned_loss=0.05767, ctc_loss=0.1224, cr_loss=0.3967, over 6773452.67 frames. ], batch size: 97, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:40:19,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=553760.6666666666, ans=0.125 2024-09-19 01:40:23,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=553760.6666666666, ans=22.5 2024-09-19 01:40:50,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=553854.0, ans=0.2 2024-09-19 01:40:55,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=553854.0, ans=0.125 2024-09-19 01:41:00,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553854.0, ans=0.1 2024-09-19 01:41:05,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553900.6666666666, ans=0.1 2024-09-19 01:41:07,102 INFO [train.py:1198] (1/2) Epoch 31, batch 2400, loss[loss=0.205, simple_loss=0.2595, pruned_loss=0.05582, ctc_loss=0.1167, cr_loss=0.3878, over 34605.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2658, pruned_loss=0.058, ctc_loss=0.1231, cr_loss=0.3978, over 6777657.50 frames. ], batch size: 89, lr: 3.85e-03, grad_scale: 32.0 2024-09-19 01:41:13,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=553900.6666666666, ans=0.1 2024-09-19 01:41:27,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.00 vs. 
limit=22.5 2024-09-19 01:41:42,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=553994.0, ans=0.125 2024-09-19 01:41:42,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553994.0, ans=0.1 2024-09-19 01:41:45,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=553994.0, ans=0.1 2024-09-19 01:42:03,477 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.151e+02 2.423e+02 2.995e+02 3.682e+02 6.097e+02, threshold=5.990e+02, percent-clipped=1.0 2024-09-19 01:42:13,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=554087.3333333334, ans=0.125 2024-09-19 01:42:14,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.61 vs. limit=12.0 2024-09-19 01:42:29,946 INFO [train.py:1198] (1/2) Epoch 31, batch 2450, loss[loss=0.2247, simple_loss=0.2794, pruned_loss=0.06328, ctc_loss=0.1342, cr_loss=0.4161, over 34429.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.267, pruned_loss=0.05851, ctc_loss=0.124, cr_loss=0.3996, over 6747989.18 frames. ], batch size: 95, lr: 3.85e-03, grad_scale: 16.0 2024-09-19 01:42:35,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=554134.0, ans=0.125 2024-09-19 01:42:40,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.05 vs. limit=12.0 2024-09-19 01:42:50,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.12 vs. limit=15.0 2024-09-19 01:43:09,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=554227.3333333334, ans=0.125 2024-09-19 01:43:14,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=554227.3333333334, ans=0.025 2024-09-19 01:43:19,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5 2024-09-19 01:43:46,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=554320.6666666666, ans=0.2 2024-09-19 01:43:48,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=554320.6666666666, ans=0.125 2024-09-19 01:43:51,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=554320.6666666666, ans=0.2 2024-09-19 01:43:56,054 INFO [train.py:1198] (1/2) Epoch 31, batch 2500, loss[loss=0.2223, simple_loss=0.2839, pruned_loss=0.05917, ctc_loss=0.1274, cr_loss=0.4224, over 34448.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.2672, pruned_loss=0.0585, ctc_loss=0.1239, cr_loss=0.3995, over 6760105.31 frames. 
], batch size: 100, lr: 3.85e-03, grad_scale: 16.0 2024-09-19 01:44:28,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=554460.6666666666, ans=0.07 2024-09-19 01:44:47,735 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:44:52,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.387e+02 2.732e+02 3.464e+02 5.043e+02, threshold=5.465e+02, percent-clipped=0.0 2024-09-19 01:44:58,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2024-09-19 01:45:06,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=554554.0, ans=0.125 2024-09-19 01:45:19,005 INFO [train.py:1198] (1/2) Epoch 31, batch 2550, loss[loss=0.1893, simple_loss=0.242, pruned_loss=0.05047, ctc_loss=0.109, cr_loss=0.3474, over 34214.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2671, pruned_loss=0.05845, ctc_loss=0.1238, cr_loss=0.3997, over 6764839.17 frames. ], batch size: 78, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 01:45:22,041 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.78 vs. limit=22.5 2024-09-19 01:45:27,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=554600.6666666666, ans=0.0 2024-09-19 01:45:36,790 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.53 vs. limit=15.0 2024-09-19 01:45:37,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=554647.3333333334, ans=0.2 2024-09-19 01:46:08,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=554740.6666666666, ans=0.125 2024-09-19 01:46:14,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.65 vs. limit=12.0 2024-09-19 01:46:15,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=554740.6666666666, ans=0.1 2024-09-19 01:46:34,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=22.5 2024-09-19 01:46:41,798 INFO [train.py:1198] (1/2) Epoch 31, batch 2600, loss[loss=0.2145, simple_loss=0.268, pruned_loss=0.06008, ctc_loss=0.1246, cr_loss=0.3955, over 34353.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2674, pruned_loss=0.05869, ctc_loss=0.1244, cr_loss=0.4009, over 6760661.13 frames. 
], batch size: 91, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 01:46:42,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=554834.0, ans=0.0 2024-09-19 01:47:01,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=554880.6666666666, ans=0.0 2024-09-19 01:47:34,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=554974.0, ans=0.125 2024-09-19 01:47:39,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=554974.0, ans=0.125 2024-09-19 01:47:41,062 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 2.530e+02 3.020e+02 3.933e+02 6.402e+02, threshold=6.041e+02, percent-clipped=5.0 2024-09-19 01:47:41,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=554974.0, ans=0.125 2024-09-19 01:48:07,177 INFO [train.py:1198] (1/2) Epoch 31, batch 2650, loss[loss=0.2229, simple_loss=0.2834, pruned_loss=0.06005, ctc_loss=0.1297, cr_loss=0.4106, over 34259.00 frames. ], tot_loss[loss=0.2131, simple_loss=0.2677, pruned_loss=0.05873, ctc_loss=0.1245, cr_loss=0.4011, over 6768538.66 frames. ], batch size: 117, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 01:48:22,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=555114.0, ans=0.09899494936611666 2024-09-19 01:48:25,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=555114.0, ans=0.125 2024-09-19 01:48:27,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=555114.0, ans=0.125 2024-09-19 01:48:37,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=555114.0, ans=0.0 2024-09-19 01:48:58,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2024-09-19 01:49:06,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=555207.3333333334, ans=0.125 2024-09-19 01:49:15,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.20 vs. limit=15.0 2024-09-19 01:49:23,143 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:49:29,303 INFO [train.py:1198] (1/2) Epoch 31, batch 2700, loss[loss=0.2288, simple_loss=0.2842, pruned_loss=0.06431, ctc_loss=0.1368, cr_loss=0.4357, over 34612.00 frames. ], tot_loss[loss=0.2132, simple_loss=0.2679, pruned_loss=0.0588, ctc_loss=0.1246, cr_loss=0.4013, over 6763262.34 frames. ], batch size: 102, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 01:49:37,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.62 vs. 
limit=15.0 2024-09-19 01:49:38,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=555300.6666666666, ans=0.025 2024-09-19 01:49:43,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.53 vs. limit=15.0 2024-09-19 01:49:46,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=555347.3333333334, ans=0.0 2024-09-19 01:50:26,003 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 2.558e+02 3.008e+02 3.895e+02 7.648e+02, threshold=6.017e+02, percent-clipped=1.0 2024-09-19 01:50:41,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=555487.3333333334, ans=0.0 2024-09-19 01:50:46,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=555487.3333333334, ans=0.125 2024-09-19 01:50:52,426 INFO [train.py:1198] (1/2) Epoch 31, batch 2750, loss[loss=0.2014, simple_loss=0.2548, pruned_loss=0.05522, ctc_loss=0.1146, cr_loss=0.3659, over 34633.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.267, pruned_loss=0.05861, ctc_loss=0.1242, cr_loss=0.4002, over 6761616.16 frames. ], batch size: 88, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 01:50:58,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=555534.0, ans=0.125 2024-09-19 01:50:59,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=555534.0, ans=0.1 2024-09-19 01:51:08,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=555534.0, ans=0.125 2024-09-19 01:51:36,256 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.96 vs. limit=15.0 2024-09-19 01:51:42,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=555627.3333333334, ans=0.2 2024-09-19 01:51:52,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555674.0, ans=0.1 2024-09-19 01:52:12,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=555720.6666666666, ans=0.125 2024-09-19 01:52:18,668 INFO [train.py:1198] (1/2) Epoch 31, batch 2800, loss[loss=0.2518, simple_loss=0.294, pruned_loss=0.07891, ctc_loss=0.1674, cr_loss=0.4581, over 23564.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.2671, pruned_loss=0.05871, ctc_loss=0.1243, cr_loss=0.3999, over 6739387.15 frames. 
], batch size: 244, lr: 3.84e-03, grad_scale: 32.0 2024-09-19 01:52:30,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=555767.3333333334, ans=0.125 2024-09-19 01:52:32,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=555767.3333333334, ans=0.125 2024-09-19 01:52:40,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=555814.0, ans=0.0 2024-09-19 01:52:45,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=555814.0, ans=0.125 2024-09-19 01:52:53,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=555860.6666666666, ans=0.05 2024-09-19 01:52:59,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=15.0 2024-09-19 01:53:13,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555907.3333333334, ans=0.1 2024-09-19 01:53:14,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.465e+02 2.999e+02 3.533e+02 6.089e+02, threshold=5.997e+02, percent-clipped=1.0 2024-09-19 01:53:40,648 INFO [train.py:1198] (1/2) Epoch 31, batch 2850, loss[loss=0.2084, simple_loss=0.2607, pruned_loss=0.05836, ctc_loss=0.1201, cr_loss=0.3862, over 34476.00 frames. ], tot_loss[loss=0.2128, simple_loss=0.2673, pruned_loss=0.05875, ctc_loss=0.1244, cr_loss=0.4006, over 6724426.90 frames. ], batch size: 90, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 01:53:56,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.66 vs. limit=6.0 2024-09-19 01:54:16,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2024-09-19 01:54:51,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=556187.3333333334, ans=0.125 2024-09-19 01:55:00,895 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 01:55:00,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=556187.3333333334, ans=0.0 2024-09-19 01:55:03,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556187.3333333334, ans=0.1 2024-09-19 01:55:06,997 INFO [train.py:1198] (1/2) Epoch 31, batch 2900, loss[loss=0.2143, simple_loss=0.2679, pruned_loss=0.05998, ctc_loss=0.1238, cr_loss=0.4015, over 34549.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2683, pruned_loss=0.05897, ctc_loss=0.1248, cr_loss=0.402, over 6755411.46 frames. ], batch size: 94, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 01:55:23,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.28 vs. 
limit=15.0 2024-09-19 01:55:35,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=556280.6666666666, ans=0.0 2024-09-19 01:55:39,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=556327.3333333334, ans=0.125 2024-09-19 01:55:58,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=556374.0, ans=0.125 2024-09-19 01:56:05,039 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.547e+02 3.145e+02 3.969e+02 6.428e+02, threshold=6.290e+02, percent-clipped=1.0 2024-09-19 01:56:05,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=556374.0, ans=0.07 2024-09-19 01:56:08,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=556374.0, ans=0.125 2024-09-19 01:56:13,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=556420.6666666666, ans=0.0 2024-09-19 01:56:28,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=556467.3333333334, ans=0.125 2024-09-19 01:56:29,837 INFO [train.py:1198] (1/2) Epoch 31, batch 2950, loss[loss=0.2066, simple_loss=0.2583, pruned_loss=0.05738, ctc_loss=0.1213, cr_loss=0.3981, over 34628.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2668, pruned_loss=0.05836, ctc_loss=0.1237, cr_loss=0.3992, over 6751391.15 frames. ], batch size: 88, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 01:56:30,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=556467.3333333334, ans=0.0 2024-09-19 01:56:40,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.31 vs. limit=15.0 2024-09-19 01:57:08,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=556560.6666666666, ans=0.2 2024-09-19 01:57:24,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=556607.3333333334, ans=0.125 2024-09-19 01:57:43,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=556654.0, ans=0.2 2024-09-19 01:57:48,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2024-09-19 01:57:52,683 INFO [train.py:1198] (1/2) Epoch 31, batch 3000, loss[loss=0.2052, simple_loss=0.2609, pruned_loss=0.05573, ctc_loss=0.1161, cr_loss=0.3735, over 34549.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2665, pruned_loss=0.05836, ctc_loss=0.1237, cr_loss=0.3989, over 6751782.80 frames. 
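On how the five numbers inside each loss[...] / tot_loss[...] record relate: across the records above, the total is reproduced to the printed precision by a fixed weighted sum of the components (for batch 3000: 0.5*0.2665 + 0.05836 + 0.1*0.1237 + 0.02*0.3989 ≈ 0.212). A minimal sketch, with the weights inferred from that arithmetic rather than read out of the training script:

```python
# Weighted combination that reproduces the printed tot_loss values above.
# The weights are fitted to the logged numbers, not taken from train.py,
# so treat them as an editor's reconstruction.
SIMPLE_SCALE = 0.5   # weight on the simple (linear) transducer loss
CTC_SCALE = 0.1      # weight on the auxiliary CTC loss
CR_SCALE = 0.02      # weight on the consistency-regularization (cr) loss

def total_loss(simple_loss, pruned_loss, ctc_loss, cr_loss):
    return (SIMPLE_SCALE * simple_loss
            + pruned_loss
            + CTC_SCALE * ctc_loss
            + CR_SCALE * cr_loss)

# Epoch 31, batch 3000 from the log: loss=0.212
assert abs(total_loss(0.2665, 0.05836, 0.1237, 0.3989) - 0.212) < 5e-4
```

Note also that in the validation record below the cr_loss term is numerically zero (2.093e-14), so the consistency term effectively contributes only during training.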
], batch size: 94, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 01:57:52,684 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 01:57:58,267 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.5009, 3.0568, 2.7015, 2.9714], device='cuda:1') 2024-09-19 01:58:09,592 INFO [train.py:1230] (1/2) Epoch 31, validation: loss=0.1488, simple_loss=0.2435, pruned_loss=0.02301, ctc_loss=0.03986, cr_loss=2.093e-14, over 944034.00 frames. 2024-09-19 01:58:09,592 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 01:58:15,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=556700.6666666666, ans=0.125 2024-09-19 01:58:33,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=556747.3333333334, ans=0.0 2024-09-19 01:58:46,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=556794.0, ans=0.025 2024-09-19 01:58:46,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=556794.0, ans=0.125 2024-09-19 01:58:52,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=556794.0, ans=0.0 2024-09-19 01:58:56,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=556794.0, ans=0.125 2024-09-19 01:58:56,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.80 vs. limit=22.5 2024-09-19 01:58:59,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=556794.0, ans=0.1 2024-09-19 01:59:07,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=556840.6666666666, ans=0.2 2024-09-19 01:59:10,440 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.379e+02 2.703e+02 3.212e+02 5.032e+02, threshold=5.406e+02, percent-clipped=0.0 2024-09-19 01:59:34,939 INFO [train.py:1198] (1/2) Epoch 31, batch 3050, loss[loss=0.2078, simple_loss=0.2599, pruned_loss=0.05753, ctc_loss=0.1232, cr_loss=0.4009, over 34617.00 frames. ], tot_loss[loss=0.2127, simple_loss=0.2673, pruned_loss=0.05861, ctc_loss=0.1241, cr_loss=0.4, over 6742863.14 frames. ], batch size: 89, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 02:00:06,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=557027.3333333334, ans=0.025 2024-09-19 02:00:32,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.58 vs. 
limit=12.0 2024-09-19 02:00:34,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=557074.0, ans=0.0 2024-09-19 02:00:54,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=557167.3333333334, ans=0.2 2024-09-19 02:00:55,608 INFO [train.py:1198] (1/2) Epoch 31, batch 3100, loss[loss=0.2253, simple_loss=0.28, pruned_loss=0.06334, ctc_loss=0.1338, cr_loss=0.4271, over 34200.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2669, pruned_loss=0.0584, ctc_loss=0.1237, cr_loss=0.3989, over 6742838.05 frames. ], batch size: 117, lr: 3.84e-03, grad_scale: 16.0 2024-09-19 02:01:00,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=557167.3333333334, ans=0.2 2024-09-19 02:01:04,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=557167.3333333334, ans=0.125 2024-09-19 02:01:17,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2024-09-19 02:01:41,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=557260.6666666666, ans=0.125 2024-09-19 02:01:52,487 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.483e+02 2.742e+02 3.532e+02 1.378e+03, threshold=5.484e+02, percent-clipped=5.0 2024-09-19 02:02:12,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=557354.0, ans=0.0 2024-09-19 02:02:13,996 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:02:16,903 INFO [train.py:1198] (1/2) Epoch 31, batch 3150, loss[loss=0.2233, simple_loss=0.2782, pruned_loss=0.0627, ctc_loss=0.1321, cr_loss=0.4133, over 33757.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2667, pruned_loss=0.05832, ctc_loss=0.1237, cr_loss=0.3985, over 6748785.43 frames. ], batch size: 122, lr: 3.83e-03, grad_scale: 16.0 2024-09-19 02:02:18,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=557400.6666666666, ans=0.125 2024-09-19 02:02:25,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=557400.6666666666, ans=0.125 2024-09-19 02:02:35,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=557447.3333333334, ans=0.0 2024-09-19 02:02:59,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=15.0 2024-09-19 02:03:09,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.19 vs. limit=15.0 2024-09-19 02:03:28,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.55 vs. limit=15.0 2024-09-19 02:03:37,470 INFO [train.py:1198] (1/2) Epoch 31, batch 3200, loss[loss=0.2125, simple_loss=0.2696, pruned_loss=0.05792, ctc_loss=0.1214, cr_loss=0.3828, over 34514.00 frames. 
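On the recurring optim.py warnings: each prints five grad-norm summary values plus a threshold, and in every warning in this stretch the threshold equals Clipping_scale (2.0) times the middle value (here 2.0 * 2.742e+02 = 5.484e+02). A rough sketch of such a median-based clipping rule follows; how the real optimizer accumulates its norm history is not visible in the log, so the buffer handling below is assumed:

```python
import torch

CLIPPING_SCALE = 2.0  # the Clipping_scale printed in the warnings above

def clip_by_median(grad: torch.Tensor, recent_norms: torch.Tensor):
    # Five summary points of the recent grad-norm distribution, matching the
    # "grad-norm quartiles" line (min, 25%, median, 75%, max is assumed).
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = CLIPPING_SCALE * q[2]   # 2.0 x median, as observed in the log
    norm = grad.norm()
    if norm > threshold:                # such steps count toward percent-clipped
        grad = grad * (threshold / norm)
    return grad, q, threshold
```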
], tot_loss[loss=0.2119, simple_loss=0.2665, pruned_loss=0.05832, ctc_loss=0.1236, cr_loss=0.3985, over 6762773.38 frames. ], batch size: 94, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:03:55,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=557680.6666666666, ans=0.0 2024-09-19 02:04:19,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=557727.3333333334, ans=0.0 2024-09-19 02:04:35,965 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.155e+02 2.522e+02 2.949e+02 3.843e+02 5.129e+02, threshold=5.897e+02, percent-clipped=0.0 2024-09-19 02:04:52,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=557820.6666666666, ans=10.0 2024-09-19 02:05:02,040 INFO [train.py:1198] (1/2) Epoch 31, batch 3250, loss[loss=0.217, simple_loss=0.2714, pruned_loss=0.0604, ctc_loss=0.1274, cr_loss=0.4097, over 34659.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2669, pruned_loss=0.05849, ctc_loss=0.1238, cr_loss=0.399, over 6771724.65 frames. ], batch size: 98, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:05:05,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=557867.3333333334, ans=0.125 2024-09-19 02:05:29,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=557914.0, ans=10.0 2024-09-19 02:05:29,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=557914.0, ans=0.1 2024-09-19 02:05:34,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=557960.6666666666, ans=0.07 2024-09-19 02:05:36,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=557960.6666666666, ans=0.125 2024-09-19 02:05:36,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=557960.6666666666, ans=6.0 2024-09-19 02:05:39,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=557960.6666666666, ans=0.125 2024-09-19 02:05:45,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=557960.6666666666, ans=10.0 2024-09-19 02:06:09,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=558054.0, ans=0.025 2024-09-19 02:06:09,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=558054.0, ans=0.2 2024-09-19 02:06:12,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=558054.0, ans=0.125 2024-09-19 02:06:15,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=12.0 2024-09-19 02:06:22,246 INFO [train.py:1198] (1/2) Epoch 31, batch 3300, loss[loss=0.213, simple_loss=0.2691, pruned_loss=0.05834, ctc_loss=0.1242, cr_loss=0.3855, over 33214.00 frames. 
], tot_loss[loss=0.2114, simple_loss=0.2659, pruned_loss=0.0582, ctc_loss=0.1231, cr_loss=0.3974, over 6770142.16 frames. ], batch size: 130, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:06:22,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=558100.6666666666, ans=0.2 2024-09-19 02:06:25,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=558100.6666666666, ans=0.125 2024-09-19 02:06:27,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=558100.6666666666, ans=0.125 2024-09-19 02:06:28,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=16.27 vs. limit=22.5 2024-09-19 02:06:41,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=558147.3333333334, ans=0.025 2024-09-19 02:07:02,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=558194.0, ans=0.125 2024-09-19 02:07:02,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.93 vs. limit=6.0 2024-09-19 02:07:04,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.80 vs. limit=15.0 2024-09-19 02:07:19,708 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.417e+02 2.717e+02 3.266e+02 7.091e+02, threshold=5.434e+02, percent-clipped=2.0 2024-09-19 02:07:44,008 INFO [train.py:1198] (1/2) Epoch 31, batch 3350, loss[loss=0.2364, simple_loss=0.2856, pruned_loss=0.07041, ctc_loss=0.1421, cr_loss=0.4475, over 33864.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2668, pruned_loss=0.05865, ctc_loss=0.1239, cr_loss=0.3989, over 6745267.44 frames. ], batch size: 122, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:07:49,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=558334.0, ans=0.125 2024-09-19 02:08:21,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=558427.3333333334, ans=0.125 2024-09-19 02:08:29,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.02 vs. limit=15.0 2024-09-19 02:09:04,682 INFO [train.py:1198] (1/2) Epoch 31, batch 3400, loss[loss=0.1876, simple_loss=0.2426, pruned_loss=0.04842, ctc_loss=0.1059, cr_loss=0.3634, over 34170.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2661, pruned_loss=0.05838, ctc_loss=0.1234, cr_loss=0.3978, over 6733977.30 frames. ], batch size: 78, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:09:25,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=558614.0, ans=0.125 2024-09-19 02:09:29,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.29 vs. 
limit=15.0 2024-09-19 02:10:03,789 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.198e+02 2.514e+02 2.942e+02 3.623e+02 5.688e+02, threshold=5.884e+02, percent-clipped=4.0 2024-09-19 02:10:18,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=558754.0, ans=0.125 2024-09-19 02:10:26,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=558800.6666666666, ans=0.125 2024-09-19 02:10:27,665 INFO [train.py:1198] (1/2) Epoch 31, batch 3450, loss[loss=0.2116, simple_loss=0.2699, pruned_loss=0.05711, ctc_loss=0.1195, cr_loss=0.3816, over 32935.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2665, pruned_loss=0.05843, ctc_loss=0.1234, cr_loss=0.3977, over 6745986.25 frames. ], batch size: 130, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:10:39,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=558800.6666666666, ans=0.125 2024-09-19 02:11:16,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.75 vs. limit=12.0 2024-09-19 02:11:21,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=558940.6666666666, ans=0.0 2024-09-19 02:11:36,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=558987.3333333334, ans=0.0 2024-09-19 02:11:42,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.32 vs. limit=10.0 2024-09-19 02:11:47,949 INFO [train.py:1198] (1/2) Epoch 31, batch 3500, loss[loss=0.1888, simple_loss=0.2462, pruned_loss=0.04832, ctc_loss=0.104, cr_loss=0.3486, over 34477.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2658, pruned_loss=0.05812, ctc_loss=0.1228, cr_loss=0.3967, over 6749475.06 frames. ], batch size: 85, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:11:53,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559034.0, ans=0.1 2024-09-19 02:12:02,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.13 vs. limit=22.5 2024-09-19 02:12:05,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.18 vs. 
limit=15.0 2024-09-19 02:12:30,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=559127.3333333334, ans=0.125 2024-09-19 02:12:37,084 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:12:41,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559174.0, ans=0.1 2024-09-19 02:12:44,529 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.462e+02 2.805e+02 3.517e+02 6.083e+02, threshold=5.609e+02, percent-clipped=1.0 2024-09-19 02:12:46,521 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:12:52,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=559220.6666666666, ans=0.05 2024-09-19 02:12:54,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=559220.6666666666, ans=0.125 2024-09-19 02:13:04,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.51 vs. limit=15.0 2024-09-19 02:13:05,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=559220.6666666666, ans=0.125 2024-09-19 02:13:08,294 INFO [train.py:1198] (1/2) Epoch 31, batch 3550, loss[loss=0.2093, simple_loss=0.2696, pruned_loss=0.05556, ctc_loss=0.1145, cr_loss=0.377, over 34378.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2663, pruned_loss=0.05833, ctc_loss=0.1232, cr_loss=0.3977, over 6759067.53 frames. ], batch size: 103, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:13:31,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=559314.0, ans=0.125 2024-09-19 02:13:48,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.75 vs. limit=15.0 2024-09-19 02:13:57,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=559407.3333333334, ans=0.2 2024-09-19 02:14:02,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=559407.3333333334, ans=0.0 2024-09-19 02:14:09,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=559407.3333333334, ans=0.0 2024-09-19 02:14:13,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. limit=22.5 2024-09-19 02:14:30,264 INFO [train.py:1198] (1/2) Epoch 31, batch 3600, loss[loss=0.2054, simple_loss=0.2639, pruned_loss=0.0544, ctc_loss=0.1145, cr_loss=0.3804, over 34465.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.2668, pruned_loss=0.05847, ctc_loss=0.1235, cr_loss=0.3987, over 6768545.71 frames. 
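The ScheduledFloat lines throughout this log report hyperparameters (skip rates, balancer probs, dropout) whose current value `ans` is a function of `batch_count`. A minimal sketch of such a batch-count-driven schedule, assuming piecewise-linear interpolation; the real schedules live in scaling.py, and the breakpoints below are invented examples:

```python
from bisect import bisect_right

# Batch-count-driven schedule like the logged "ScheduledFloat:
# name=..., batch_count=..., ans=..." values. Breakpoints are invented.
class PiecewiseLinear:
    def __init__(self, *points):        # points: (batch_count, value) pairs
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate that decays from 0.2 to 0.0 over the first 4000 batches:
conv_skip_rate = PiecewiseLinear((0.0, 0.2), (4000.0, 0.0))
assert conv_skip_rate(555114.0) == 0.0  # matches the long-run ans=0.0 lines
```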
], batch size: 90, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:14:47,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=559547.3333333334, ans=0.0 2024-09-19 02:14:56,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=559547.3333333334, ans=0.125 2024-09-19 02:15:20,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=559640.6666666666, ans=0.07 2024-09-19 02:15:26,201 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.018e+02 2.520e+02 2.929e+02 3.890e+02 6.339e+02, threshold=5.857e+02, percent-clipped=4.0 2024-09-19 02:15:29,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.63 vs. limit=15.0 2024-09-19 02:15:30,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=15.0 2024-09-19 02:15:36,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=559687.3333333334, ans=0.09899494936611666 2024-09-19 02:15:50,353 INFO [train.py:1198] (1/2) Epoch 31, batch 3650, loss[loss=0.2261, simple_loss=0.2828, pruned_loss=0.06278, ctc_loss=0.1329, cr_loss=0.4307, over 34420.00 frames. ], tot_loss[loss=0.2115, simple_loss=0.2662, pruned_loss=0.05818, ctc_loss=0.123, cr_loss=0.3982, over 6770695.11 frames. ], batch size: 110, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:16:13,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2024-09-19 02:16:24,717 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.73 vs. limit=22.5 2024-09-19 02:16:38,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=559874.0, ans=0.125 2024-09-19 02:17:08,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. limit=6.0 2024-09-19 02:17:10,824 INFO [train.py:1198] (1/2) Epoch 31, batch 3700, loss[loss=0.2163, simple_loss=0.2774, pruned_loss=0.05771, ctc_loss=0.1229, cr_loss=0.3827, over 34615.00 frames. ], tot_loss[loss=0.2115, simple_loss=0.2664, pruned_loss=0.05809, ctc_loss=0.123, cr_loss=0.3979, over 6785512.31 frames. ], batch size: 102, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:17:19,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=559967.3333333334, ans=0.2 2024-09-19 02:17:45,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=560014.0, ans=0.0 2024-09-19 02:17:53,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.28 vs. 
limit=15.0 2024-09-19 02:18:10,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=560107.3333333334, ans=15.0 2024-09-19 02:18:14,683 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.402e+02 2.671e+02 3.390e+02 5.595e+02, threshold=5.342e+02, percent-clipped=0.0 2024-09-19 02:18:38,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.55 vs. limit=22.5 2024-09-19 02:18:38,985 INFO [train.py:1198] (1/2) Epoch 31, batch 3750, loss[loss=0.2174, simple_loss=0.2751, pruned_loss=0.05916, ctc_loss=0.1262, cr_loss=0.4054, over 34363.00 frames. ], tot_loss[loss=0.2143, simple_loss=0.2691, pruned_loss=0.05917, ctc_loss=0.1251, cr_loss=0.4029, over 6787019.54 frames. ], batch size: 113, lr: 3.83e-03, grad_scale: 32.0 2024-09-19 02:18:44,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.38 vs. limit=22.5 2024-09-19 02:18:57,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0 2024-09-19 02:19:08,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=560247.3333333334, ans=0.125 2024-09-19 02:19:17,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=560294.0, ans=0.0 2024-09-19 02:19:31,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=560340.6666666666, ans=0.125 2024-09-19 02:19:36,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=560340.6666666666, ans=0.0 2024-09-19 02:20:00,479 INFO [train.py:1198] (1/2) Epoch 31, batch 3800, loss[loss=0.2452, simple_loss=0.2885, pruned_loss=0.07667, ctc_loss=0.1549, cr_loss=0.4374, over 30039.00 frames. ], tot_loss[loss=0.2174, simple_loss=0.2717, pruned_loss=0.06059, ctc_loss=0.1277, cr_loss=0.4078, over 6675091.39 frames. ], batch size: 175, lr: 3.82e-03, grad_scale: 32.0 2024-09-19 02:20:19,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=560480.6666666666, ans=0.125 2024-09-19 02:20:34,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.53 vs. limit=15.0 2024-09-19 02:20:47,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.24 vs. limit=15.0 2024-09-19 02:20:47,533 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=22.5 2024-09-19 02:20:57,182 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.59 vs. 
limit=15.0 2024-09-19 02:20:59,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 2.349e+02 2.531e+02 2.768e+02 1.055e+03, threshold=5.062e+02, percent-clipped=1.0 2024-09-19 02:21:06,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=560620.6666666666, ans=0.2 2024-09-19 02:21:13,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.35 vs. limit=10.0 2024-09-19 02:21:19,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=560620.6666666666, ans=0.125 2024-09-19 02:21:21,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=560620.6666666666, ans=0.2 2024-09-19 02:21:24,116 INFO [train.py:1198] (1/2) Epoch 31, batch 3850, loss[loss=0.2172, simple_loss=0.2694, pruned_loss=0.06116, ctc_loss=0.1352, cr_loss=0.3905, over 23882.00 frames. ], tot_loss[loss=0.2212, simple_loss=0.2742, pruned_loss=0.06269, ctc_loss=0.132, cr_loss=0.4129, over 6252829.28 frames. ], batch size: 244, lr: 3.82e-03, grad_scale: 32.0 2024-09-19 02:21:38,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=560667.3333333334, ans=0.125 2024-09-19 02:21:44,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=560714.0, ans=0.125 2024-09-19 02:21:46,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=560714.0, ans=0.025 2024-09-19 02:22:56,525 INFO [train.py:1198] (1/2) Epoch 32, batch 0, loss[loss=0.1946, simple_loss=0.2509, pruned_loss=0.05109, ctc_loss=0.1097, cr_loss=0.3555, over 34503.00 frames. ], tot_loss[loss=0.1946, simple_loss=0.2509, pruned_loss=0.05109, ctc_loss=0.1097, cr_loss=0.3555, over 34503.00 frames. ], batch size: 85, lr: 3.76e-03, grad_scale: 32.0 2024-09-19 02:22:56,525 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 02:23:13,377 INFO [train.py:1230] (1/2) Epoch 32, validation: loss=0.1488, simple_loss=0.2442, pruned_loss=0.02266, ctc_loss=0.04003, cr_loss=2.119e-14, over 944034.00 frames. 2024-09-19 02:23:13,377 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 02:23:23,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=560788.6666666666, ans=0.1 2024-09-19 02:23:37,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=560835.3333333334, ans=0.0 2024-09-19 02:23:42,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=560835.3333333334, ans=0.125 2024-09-19 02:23:42,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=560835.3333333334, ans=0.0 2024-09-19 02:23:49,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.63 vs. limit=15.0 2024-09-19 02:23:51,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.26 vs. 
limit=15.0 2024-09-19 02:23:57,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=560882.0, ans=0.125 2024-09-19 02:23:59,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=560882.0, ans=0.2 2024-09-19 02:24:19,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=560975.3333333334, ans=0.0 2024-09-19 02:24:37,401 INFO [train.py:1198] (1/2) Epoch 32, batch 50, loss[loss=0.1794, simple_loss=0.237, pruned_loss=0.04466, ctc_loss=0.09732, cr_loss=0.3243, over 34472.00 frames. ], tot_loss[loss=0.2136, simple_loss=0.2674, pruned_loss=0.05932, ctc_loss=0.1255, cr_loss=0.4023, over 1481804.99 frames. ], batch size: 82, lr: 3.76e-03, grad_scale: 32.0 2024-09-19 02:24:52,050 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.573e+02 2.834e+02 3.247e+02 6.644e+02, threshold=5.668e+02, percent-clipped=3.0 2024-09-19 02:25:19,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=561115.3333333334, ans=0.1 2024-09-19 02:25:47,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=561208.6666666666, ans=0.125 2024-09-19 02:25:51,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0 2024-09-19 02:25:59,789 INFO [train.py:1198] (1/2) Epoch 32, batch 100, loss[loss=0.193, simple_loss=0.2501, pruned_loss=0.04949, ctc_loss=0.1099, cr_loss=0.3722, over 34586.00 frames. ], tot_loss[loss=0.2153, simple_loss=0.2696, pruned_loss=0.05976, ctc_loss=0.1264, cr_loss=0.4053, over 2630563.43 frames. ], batch size: 89, lr: 3.76e-03, grad_scale: 16.0 2024-09-19 02:26:00,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=12.0 2024-09-19 02:26:25,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.72 vs. limit=12.0 2024-09-19 02:26:44,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=561348.6666666666, ans=0.125 2024-09-19 02:26:59,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561395.3333333334, ans=0.1 2024-09-19 02:27:20,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=561442.0, ans=0.0 2024-09-19 02:27:22,253 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.80 vs. limit=15.0 2024-09-19 02:27:22,996 INFO [train.py:1198] (1/2) Epoch 32, batch 150, loss[loss=0.1711, simple_loss=0.2296, pruned_loss=0.04068, ctc_loss=0.0894, cr_loss=0.3311, over 34488.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.267, pruned_loss=0.05832, ctc_loss=0.1235, cr_loss=0.3994, over 3558221.01 frames. 
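A note on the "over N frames" counts in the tot_loss records: within epoch 32 they grow quickly at first (about 1.48M frames at batch 50, 2.63M at batch 100, 3.56M at batch 150) and then saturate near ~6.8M, which is consistent with an exponentially decayed running sum rather than a plain cumulative total. A sketch under that assumption; the decay constant is inferred from the saturation point, not taken from the training script:

```python
# Running statistics behind "tot_loss[... over N frames.]", assuming a
# decayed sum with time constant ~200 batches at ~34k frames per batch,
# which matches both the early growth and the ~6.8M-frame plateau.
DECAY = 1.0 - 1.0 / 200.0

class RunningLoss:
    def __init__(self):
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float):
        self.loss_sum = self.loss_sum * DECAY + batch_loss_sum
        self.frames = self.frames * DECAY + batch_frames

    @property
    def tot_loss(self) -> float:        # the per-frame loss that is logged
        return self.loss_sum / max(self.frames, 1.0)
```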
], batch size: 82, lr: 3.76e-03, grad_scale: 16.0 2024-09-19 02:27:41,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.427e+02 2.831e+02 3.496e+02 5.847e+02, threshold=5.661e+02, percent-clipped=2.0 2024-09-19 02:28:12,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=561628.6666666666, ans=10.0 2024-09-19 02:28:46,745 INFO [train.py:1198] (1/2) Epoch 32, batch 200, loss[loss=0.2346, simple_loss=0.2845, pruned_loss=0.06943, ctc_loss=0.1418, cr_loss=0.4377, over 31950.00 frames. ], tot_loss[loss=0.2109, simple_loss=0.2658, pruned_loss=0.05786, ctc_loss=0.1226, cr_loss=0.3974, over 4271371.84 frames. ], batch size: 145, lr: 3.76e-03, grad_scale: 16.0 2024-09-19 02:30:12,096 INFO [train.py:1198] (1/2) Epoch 32, batch 250, loss[loss=0.2222, simple_loss=0.2786, pruned_loss=0.06156, ctc_loss=0.1299, cr_loss=0.4147, over 34258.00 frames. ], tot_loss[loss=0.2113, simple_loss=0.2663, pruned_loss=0.05788, ctc_loss=0.1228, cr_loss=0.3985, over 4834289.84 frames. ], batch size: 117, lr: 3.76e-03, grad_scale: 16.0 2024-09-19 02:30:15,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=561955.3333333334, ans=0.05 2024-09-19 02:30:27,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=562002.0, ans=0.125 2024-09-19 02:30:29,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.49 vs. limit=6.0 2024-09-19 02:30:30,231 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.975e+02 2.671e+02 3.139e+02 4.034e+02 8.877e+02, threshold=6.277e+02, percent-clipped=6.0 2024-09-19 02:30:37,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=562002.0, ans=0.0 2024-09-19 02:30:49,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.06 vs. limit=15.0 2024-09-19 02:30:53,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=562048.6666666666, ans=0.07 2024-09-19 02:31:05,364 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:31:05,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=562095.3333333334, ans=0.125 2024-09-19 02:31:20,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=562142.0, ans=0.125 2024-09-19 02:31:22,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.52 vs. 
limit=15.0 2024-09-19 02:31:23,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=562142.0, ans=0.0 2024-09-19 02:31:28,983 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 02:31:36,748 INFO [train.py:1198] (1/2) Epoch 32, batch 300, loss[loss=0.232, simple_loss=0.2864, pruned_loss=0.06657, ctc_loss=0.1365, cr_loss=0.4265, over 34354.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2654, pruned_loss=0.05756, ctc_loss=0.1222, cr_loss=0.3971, over 5263121.80 frames. ], batch size: 107, lr: 3.76e-03, grad_scale: 8.0 2024-09-19 02:31:48,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=562188.6666666666, ans=0.125 2024-09-19 02:32:06,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=562235.3333333334, ans=0.125 2024-09-19 02:32:08,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=562282.0, ans=0.0 2024-09-19 02:32:08,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=562282.0, ans=0.0 2024-09-19 02:32:30,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.20 vs. limit=8.0 2024-09-19 02:32:59,078 INFO [train.py:1198] (1/2) Epoch 32, batch 350, loss[loss=0.1771, simple_loss=0.2342, pruned_loss=0.04357, ctc_loss=0.09818, cr_loss=0.3296, over 34263.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.2658, pruned_loss=0.05768, ctc_loss=0.1224, cr_loss=0.3979, over 5598418.85 frames. ], batch size: 83, lr: 3.76e-03, grad_scale: 8.0 2024-09-19 02:33:05,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=562422.0, ans=0.0 2024-09-19 02:33:15,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=562468.6666666666, ans=0.125 2024-09-19 02:33:17,022 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.109e+02 2.472e+02 2.826e+02 3.382e+02 5.424e+02, threshold=5.652e+02, percent-clipped=0.0 2024-09-19 02:33:24,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=562468.6666666666, ans=0.125 2024-09-19 02:33:33,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=562515.3333333334, ans=0.125 2024-09-19 02:33:42,840 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2024-09-19 02:33:42,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.52 vs. 
limit=15.0 2024-09-19 02:33:48,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=562562.0, ans=0.125 2024-09-19 02:33:52,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=562562.0, ans=0.05 2024-09-19 02:33:56,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=562562.0, ans=0.0 2024-09-19 02:33:56,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=562562.0, ans=0.0 2024-09-19 02:34:12,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=562608.6666666666, ans=0.1 2024-09-19 02:34:23,186 INFO [train.py:1198] (1/2) Epoch 32, batch 400, loss[loss=0.2183, simple_loss=0.2732, pruned_loss=0.06087, ctc_loss=0.1271, cr_loss=0.4068, over 34429.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.2655, pruned_loss=0.05762, ctc_loss=0.1223, cr_loss=0.3971, over 5865209.58 frames. ], batch size: 95, lr: 3.76e-03, grad_scale: 8.0 2024-09-19 02:34:31,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=562655.3333333334, ans=0.0 2024-09-19 02:34:43,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=562702.0, ans=0.125 2024-09-19 02:35:26,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=562795.3333333334, ans=0.125 2024-09-19 02:35:47,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.76 vs. limit=15.0 2024-09-19 02:35:48,001 INFO [train.py:1198] (1/2) Epoch 32, batch 450, loss[loss=0.2206, simple_loss=0.2781, pruned_loss=0.06098, ctc_loss=0.1273, cr_loss=0.3927, over 34699.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.266, pruned_loss=0.05794, ctc_loss=0.1229, cr_loss=0.3982, over 6054678.30 frames. ], batch size: 97, lr: 3.75e-03, grad_scale: 8.0 2024-09-19 02:35:59,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=562888.6666666666, ans=0.125 2024-09-19 02:35:59,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=562888.6666666666, ans=0.125 2024-09-19 02:36:02,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=562888.6666666666, ans=15.0 2024-09-19 02:36:07,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.476e+02 2.853e+02 3.465e+02 1.143e+03, threshold=5.707e+02, percent-clipped=2.0 2024-09-19 02:36:31,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=562982.0, ans=0.2 2024-09-19 02:37:10,805 INFO [train.py:1198] (1/2) Epoch 32, batch 500, loss[loss=0.2414, simple_loss=0.289, pruned_loss=0.07279, ctc_loss=0.1481, cr_loss=0.464, over 34430.00 frames. ], tot_loss[loss=0.2109, simple_loss=0.2656, pruned_loss=0.05786, ctc_loss=0.1227, cr_loss=0.3979, over 6221157.45 frames. 
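The Whitening lines in these records compare a per-module "metric" against a "limit". A natural whiteness measure in that spirit is the ratio of the mean squared eigenvalue to the squared mean eigenvalue of the channel covariance, which is 1.0 when the covariance is a multiple of the identity and grows as the spectrum spreads out. The sketch below illustrates the idea; it is not necessarily the exact formula in scaling.py:

```python
import torch

# Whiteness measure in the spirit of "Whitening: ... metric=M vs. limit=L":
# >= 1 always, == 1 only for a perfectly white (identity-like) covariance.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one whitened group
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]                 # channel covariance
    eigs = torch.linalg.eigvalsh(cov)              # real eigenvalues
    return (eigs ** 2).mean() / eigs.mean() ** 2   # >= 1, == 1 iff white

# White noise scores near 1; correlated or rank-deficient features score high.
assert whitening_metric(torch.randn(50000, 64)).item() < 1.1
```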
], batch size: 110, lr: 3.75e-03, grad_scale: 8.0 2024-09-19 02:37:12,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=563122.0, ans=0.125 2024-09-19 02:37:34,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=563168.6666666666, ans=0.0 2024-09-19 02:37:44,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=563215.3333333334, ans=0.0 2024-09-19 02:38:35,533 INFO [train.py:1198] (1/2) Epoch 32, batch 550, loss[loss=0.2229, simple_loss=0.2787, pruned_loss=0.06244, ctc_loss=0.1308, cr_loss=0.4015, over 33872.00 frames. ], tot_loss[loss=0.2109, simple_loss=0.2657, pruned_loss=0.05786, ctc_loss=0.1226, cr_loss=0.3975, over 6330126.16 frames. ], batch size: 122, lr: 3.75e-03, grad_scale: 8.0 2024-09-19 02:38:37,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=563355.3333333334, ans=0.0 2024-09-19 02:38:57,168 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.991e+02 2.396e+02 2.749e+02 3.464e+02 5.221e+02, threshold=5.497e+02, percent-clipped=0.0 2024-09-19 02:39:10,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=563448.6666666666, ans=0.125 2024-09-19 02:39:14,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=563448.6666666666, ans=0.125 2024-09-19 02:39:35,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=563495.3333333334, ans=0.0 2024-09-19 02:39:50,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=563542.0, ans=0.0 2024-09-19 02:40:00,288 INFO [train.py:1198] (1/2) Epoch 32, batch 600, loss[loss=0.223, simple_loss=0.2815, pruned_loss=0.06097, ctc_loss=0.1298, cr_loss=0.4168, over 34232.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.2656, pruned_loss=0.05778, ctc_loss=0.1225, cr_loss=0.3971, over 6432455.23 frames. 
], batch size: 117, lr: 3.75e-03, grad_scale: 8.0 2024-09-19 02:40:13,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=563588.6666666666, ans=0.125 2024-09-19 02:40:15,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=563635.3333333334, ans=0.025 2024-09-19 02:40:25,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=563635.3333333334, ans=0.2 2024-09-19 02:40:51,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=563728.6666666666, ans=0.0 2024-09-19 02:40:55,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=563728.6666666666, ans=0.0 2024-09-19 02:41:13,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=563775.3333333334, ans=0.2 2024-09-19 02:41:16,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=563775.3333333334, ans=0.125 2024-09-19 02:41:22,734 INFO [train.py:1198] (1/2) Epoch 32, batch 650, loss[loss=0.2147, simple_loss=0.2702, pruned_loss=0.05937, ctc_loss=0.1231, cr_loss=0.3975, over 34536.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2649, pruned_loss=0.05733, ctc_loss=0.1218, cr_loss=0.3956, over 6524336.25 frames. ], batch size: 94, lr: 3.75e-03, grad_scale: 8.0 2024-09-19 02:41:29,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=563822.0, ans=0.2 2024-09-19 02:41:42,483 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.479e+02 2.866e+02 3.934e+02 8.476e+02, threshold=5.732e+02, percent-clipped=6.0 2024-09-19 02:41:46,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=563868.6666666666, ans=0.0 2024-09-19 02:42:06,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=563915.3333333334, ans=0.125 2024-09-19 02:42:11,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=563915.3333333334, ans=0.0 2024-09-19 02:42:26,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=563962.0, ans=0.125 2024-09-19 02:42:49,298 INFO [train.py:1198] (1/2) Epoch 32, batch 700, loss[loss=0.2045, simple_loss=0.2551, pruned_loss=0.05699, ctc_loss=0.1217, cr_loss=0.3883, over 34609.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.2654, pruned_loss=0.05759, ctc_loss=0.1223, cr_loss=0.3964, over 6580023.64 frames. 
], batch size: 89, lr: 3.75e-03, grad_scale: 8.0 2024-09-19 02:42:51,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=564055.3333333334, ans=0.125 2024-09-19 02:42:59,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=564055.3333333334, ans=0.125 2024-09-19 02:43:03,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.60 vs. limit=15.0 2024-09-19 02:43:07,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.34 vs. limit=22.5 2024-09-19 02:43:21,381 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.72 vs. limit=10.0 2024-09-19 02:43:22,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=564148.6666666666, ans=0.0 2024-09-19 02:43:34,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.80 vs. limit=22.5 2024-09-19 02:43:37,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=564195.3333333334, ans=0.2 2024-09-19 02:43:45,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=564195.3333333334, ans=0.125 2024-09-19 02:43:57,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.18 vs. limit=15.0 2024-09-19 02:44:00,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=564242.0, ans=0.5 2024-09-19 02:44:00,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.58 vs. limit=15.0 2024-09-19 02:44:06,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564242.0, ans=0.1 2024-09-19 02:44:11,533 INFO [train.py:1198] (1/2) Epoch 32, batch 750, loss[loss=0.227, simple_loss=0.2804, pruned_loss=0.06472, ctc_loss=0.1345, cr_loss=0.4319, over 34425.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2649, pruned_loss=0.05742, ctc_loss=0.1218, cr_loss=0.3952, over 6624244.17 frames. 
], batch size: 95, lr: 3.75e-03, grad_scale: 8.0 2024-09-19 02:44:21,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=564288.6666666666, ans=0.0 2024-09-19 02:44:31,167 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.928e+02 2.510e+02 3.001e+02 3.815e+02 6.365e+02, threshold=6.002e+02, percent-clipped=3.0 2024-09-19 02:44:31,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=564335.3333333334, ans=0.0 2024-09-19 02:44:51,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=564382.0, ans=0.125 2024-09-19 02:45:33,700 INFO [train.py:1198] (1/2) Epoch 32, batch 800, loss[loss=0.1951, simple_loss=0.247, pruned_loss=0.05283, ctc_loss=0.1134, cr_loss=0.3699, over 34454.00 frames. ], tot_loss[loss=0.2107, simple_loss=0.2653, pruned_loss=0.0578, ctc_loss=0.1225, cr_loss=0.3974, over 6660007.09 frames. ], batch size: 85, lr: 3.75e-03, grad_scale: 16.0 2024-09-19 02:45:33,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564522.0, ans=0.0 2024-09-19 02:45:42,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=564522.0, ans=0.125 2024-09-19 02:45:57,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=564568.6666666666, ans=10.0 2024-09-19 02:46:05,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564568.6666666666, ans=0.0 2024-09-19 02:46:11,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.17 vs. limit=15.0 2024-09-19 02:46:22,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=564615.3333333334, ans=0.125 2024-09-19 02:46:43,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=564708.6666666666, ans=0.95 2024-09-19 02:46:45,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=564708.6666666666, ans=0.95 2024-09-19 02:46:46,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=564708.6666666666, ans=0.125 2024-09-19 02:46:59,203 INFO [train.py:1198] (1/2) Epoch 32, batch 850, loss[loss=0.2185, simple_loss=0.275, pruned_loss=0.06, ctc_loss=0.1281, cr_loss=0.4077, over 34409.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.265, pruned_loss=0.05753, ctc_loss=0.122, cr_loss=0.3961, over 6692672.69 frames. 
], batch size: 103, lr: 3.75e-03, grad_scale: 16.0 2024-09-19 02:47:01,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=564755.3333333334, ans=0.2 2024-09-19 02:47:02,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=564755.3333333334, ans=0.125 2024-09-19 02:47:15,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0 2024-09-19 02:47:17,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2024-09-19 02:47:18,833 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.931e+02 2.544e+02 2.910e+02 3.632e+02 5.964e+02, threshold=5.821e+02, percent-clipped=0.0 2024-09-19 02:47:30,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=564848.6666666666, ans=0.2 2024-09-19 02:47:48,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=564895.3333333334, ans=0.125 2024-09-19 02:47:57,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564895.3333333334, ans=0.1 2024-09-19 02:47:58,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=564895.3333333334, ans=0.125 2024-09-19 02:48:05,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=564942.0, ans=0.125 2024-09-19 02:48:12,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=564942.0, ans=0.5 2024-09-19 02:48:20,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=564988.6666666666, ans=0.125 2024-09-19 02:48:21,697 INFO [train.py:1198] (1/2) Epoch 32, batch 900, loss[loss=0.1833, simple_loss=0.2393, pruned_loss=0.04704, ctc_loss=0.0989, cr_loss=0.3381, over 34498.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2653, pruned_loss=0.05757, ctc_loss=0.122, cr_loss=0.3964, over 6698233.62 frames. ], batch size: 85, lr: 3.75e-03, grad_scale: 16.0 2024-09-19 02:48:27,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. 
limit=15.0 2024-09-19 02:48:33,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=564988.6666666666, ans=0.0 2024-09-19 02:48:45,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=565035.3333333334, ans=0.125 2024-09-19 02:48:48,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=565035.3333333334, ans=0.025 2024-09-19 02:49:01,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=565082.0, ans=0.025 2024-09-19 02:49:04,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=565082.0, ans=0.0 2024-09-19 02:49:30,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=565175.3333333334, ans=0.1 2024-09-19 02:49:32,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.27 vs. limit=15.0 2024-09-19 02:49:33,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565175.3333333334, ans=0.1 2024-09-19 02:49:36,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=565175.3333333334, ans=0.04949747468305833 2024-09-19 02:49:38,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=565175.3333333334, ans=0.125 2024-09-19 02:49:46,159 INFO [train.py:1198] (1/2) Epoch 32, batch 950, loss[loss=0.199, simple_loss=0.251, pruned_loss=0.05444, ctc_loss=0.1149, cr_loss=0.3773, over 34700.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2654, pruned_loss=0.05758, ctc_loss=0.1221, cr_loss=0.3968, over 6702020.42 frames. ], batch size: 87, lr: 3.75e-03, grad_scale: 16.0 2024-09-19 02:49:46,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=565222.0, ans=0.02 2024-09-19 02:49:48,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.32 vs. limit=22.5 2024-09-19 02:49:54,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=565222.0, ans=0.2 2024-09-19 02:50:07,482 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.133e+02 2.603e+02 2.955e+02 3.351e+02 6.141e+02, threshold=5.910e+02, percent-clipped=2.0 2024-09-19 02:50:22,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=565315.3333333334, ans=0.2 2024-09-19 02:50:50,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=565362.0, ans=0.1 2024-09-19 02:50:58,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.82 vs. 
limit=22.5 2024-09-19 02:51:05,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=565408.6666666666, ans=0.1 2024-09-19 02:51:10,350 INFO [train.py:1198] (1/2) Epoch 32, batch 1000, loss[loss=0.203, simple_loss=0.2545, pruned_loss=0.05632, ctc_loss=0.1184, cr_loss=0.3797, over 34469.00 frames. ], tot_loss[loss=0.2115, simple_loss=0.2664, pruned_loss=0.05805, ctc_loss=0.1231, cr_loss=0.3986, over 6695605.84 frames. ], batch size: 90, lr: 3.75e-03, grad_scale: 16.0 2024-09-19 02:51:17,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=565455.3333333334, ans=0.125 2024-09-19 02:51:17,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.96 vs. limit=22.5 2024-09-19 02:51:30,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=565502.0, ans=0.0 2024-09-19 02:51:40,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2024-09-19 02:51:48,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=565548.6666666666, ans=0.125 2024-09-19 02:51:55,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=565548.6666666666, ans=0.07 2024-09-19 02:52:21,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=565642.0, ans=0.0 2024-09-19 02:52:28,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=565642.0, ans=0.05 2024-09-19 02:52:28,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=565642.0, ans=0.125 2024-09-19 02:52:32,744 INFO [train.py:1198] (1/2) Epoch 32, batch 1050, loss[loss=0.2039, simple_loss=0.2643, pruned_loss=0.05292, ctc_loss=0.1143, cr_loss=0.3713, over 34538.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.2657, pruned_loss=0.05802, ctc_loss=0.1229, cr_loss=0.3977, over 6704117.75 frames. ], batch size: 99, lr: 3.75e-03, grad_scale: 16.0 2024-09-19 02:52:42,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=565688.6666666666, ans=0.0 2024-09-19 02:52:52,426 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.023e+02 2.634e+02 3.114e+02 4.415e+02 7.431e+02, threshold=6.228e+02, percent-clipped=4.0 2024-09-19 02:53:38,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.89 vs. limit=15.0 2024-09-19 02:53:51,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=10.47 vs. limit=12.0 2024-09-19 02:53:59,346 INFO [train.py:1198] (1/2) Epoch 32, batch 1100, loss[loss=0.2101, simple_loss=0.265, pruned_loss=0.05753, ctc_loss=0.1228, cr_loss=0.3904, over 34323.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.2655, pruned_loss=0.05786, ctc_loss=0.1226, cr_loss=0.3969, over 6716832.28 frames. 
], batch size: 91, lr: 3.74e-03, grad_scale: 16.0 2024-09-19 02:54:06,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=565922.0, ans=0.0 2024-09-19 02:54:41,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=566015.3333333334, ans=0.0 2024-09-19 02:54:57,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=566062.0, ans=0.0 2024-09-19 02:55:13,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.16 vs. limit=15.0 2024-09-19 02:55:18,001 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.71 vs. limit=15.0 2024-09-19 02:55:22,048 INFO [train.py:1198] (1/2) Epoch 32, batch 1150, loss[loss=0.2039, simple_loss=0.2567, pruned_loss=0.05588, ctc_loss=0.1197, cr_loss=0.3871, over 34722.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2657, pruned_loss=0.05807, ctc_loss=0.1231, cr_loss=0.3977, over 6714839.69 frames. ], batch size: 92, lr: 3.74e-03, grad_scale: 16.0 2024-09-19 02:55:32,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=566155.3333333334, ans=0.125 2024-09-19 02:55:34,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=566155.3333333334, ans=0.125 2024-09-19 02:55:41,973 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.947e+02 2.460e+02 2.734e+02 3.239e+02 5.348e+02, threshold=5.467e+02, percent-clipped=0.0 2024-09-19 02:55:45,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=566202.0, ans=0.0 2024-09-19 02:55:55,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=566248.6666666666, ans=0.025 2024-09-19 02:56:10,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=566295.3333333334, ans=0.125 2024-09-19 02:56:21,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=566295.3333333334, ans=0.1 2024-09-19 02:56:30,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.29 vs. limit=15.0 2024-09-19 02:56:32,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. limit=6.0 2024-09-19 02:56:44,443 INFO [train.py:1198] (1/2) Epoch 32, batch 1200, loss[loss=0.2174, simple_loss=0.2758, pruned_loss=0.05894, ctc_loss=0.1236, cr_loss=0.4115, over 34567.00 frames. ], tot_loss[loss=0.2119, simple_loss=0.2665, pruned_loss=0.05832, ctc_loss=0.1237, cr_loss=0.3991, over 6707449.29 frames. 
], batch size: 99, lr: 3.74e-03, grad_scale: 32.0 2024-09-19 02:56:48,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=566388.6666666666, ans=0.025 2024-09-19 02:56:53,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=566388.6666666666, ans=0.125 2024-09-19 02:57:09,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=566435.3333333334, ans=0.125 2024-09-19 02:57:36,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=566528.6666666666, ans=0.125 2024-09-19 02:57:40,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2024-09-19 02:58:11,513 INFO [train.py:1198] (1/2) Epoch 32, batch 1250, loss[loss=0.2314, simple_loss=0.2814, pruned_loss=0.06785, ctc_loss=0.1399, cr_loss=0.4451, over 34343.00 frames. ], tot_loss[loss=0.2125, simple_loss=0.2671, pruned_loss=0.05858, ctc_loss=0.1241, cr_loss=0.4002, over 6741510.33 frames. ], batch size: 107, lr: 3.74e-03, grad_scale: 32.0 2024-09-19 02:58:28,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=566668.6666666666, ans=0.2 2024-09-19 02:58:33,027 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.055e+02 2.503e+02 2.904e+02 3.785e+02 6.049e+02, threshold=5.808e+02, percent-clipped=1.0 2024-09-19 02:58:44,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=566715.3333333334, ans=0.125 2024-09-19 02:58:44,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=566715.3333333334, ans=0.0 2024-09-19 02:58:53,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.87 vs. limit=15.0 2024-09-19 02:59:01,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=566762.0, ans=0.95 2024-09-19 02:59:11,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566762.0, ans=0.1 2024-09-19 02:59:13,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=566762.0, ans=10.0 2024-09-19 02:59:15,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0 2024-09-19 02:59:29,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=566808.6666666666, ans=0.125 2024-09-19 02:59:30,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2024-09-19 02:59:34,063 INFO [train.py:1198] (1/2) Epoch 32, batch 1300, loss[loss=0.2089, simple_loss=0.2722, pruned_loss=0.05369, ctc_loss=0.1156, cr_loss=0.3791, over 32975.00 frames. 
], tot_loss[loss=0.2118, simple_loss=0.2664, pruned_loss=0.05823, ctc_loss=0.1235, cr_loss=0.3994, over 6744330.68 frames. ], batch size: 130, lr: 3.74e-03, grad_scale: 16.0 2024-09-19 02:59:40,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566855.3333333334, ans=0.1 2024-09-19 02:59:57,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=22.5 2024-09-19 03:00:10,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=566948.6666666666, ans=0.0 2024-09-19 03:00:19,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.84 vs. limit=15.0 2024-09-19 03:00:58,257 INFO [train.py:1198] (1/2) Epoch 32, batch 1350, loss[loss=0.2144, simple_loss=0.2688, pruned_loss=0.05934, ctc_loss=0.1247, cr_loss=0.4122, over 34532.00 frames. ], tot_loss[loss=0.211, simple_loss=0.2658, pruned_loss=0.05788, ctc_loss=0.1229, cr_loss=0.3981, over 6763043.12 frames. ], batch size: 94, lr: 3.74e-03, grad_scale: 16.0 2024-09-19 03:01:06,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=567088.6666666666, ans=0.0 2024-09-19 03:01:16,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2024-09-19 03:01:21,536 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.108e+02 2.564e+02 2.981e+02 3.692e+02 7.854e+02, threshold=5.962e+02, percent-clipped=2.0 2024-09-19 03:01:24,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=15.0 2024-09-19 03:01:46,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=567182.0, ans=0.125 2024-09-19 03:02:06,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=567275.3333333334, ans=0.1 2024-09-19 03:02:22,521 INFO [train.py:1198] (1/2) Epoch 32, batch 1400, loss[loss=0.187, simple_loss=0.2429, pruned_loss=0.04808, ctc_loss=0.104, cr_loss=0.354, over 34291.00 frames. ], tot_loss[loss=0.211, simple_loss=0.2657, pruned_loss=0.05787, ctc_loss=0.1229, cr_loss=0.398, over 6775219.23 frames. 
], batch size: 80, lr: 3.74e-03, grad_scale: 16.0 2024-09-19 03:02:26,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=567322.0, ans=0.125 2024-09-19 03:02:27,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=567322.0, ans=0.125 2024-09-19 03:02:31,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=567322.0, ans=0.2 2024-09-19 03:02:45,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=567368.6666666666, ans=0.2 2024-09-19 03:03:14,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=567462.0, ans=0.95 2024-09-19 03:03:29,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=567508.6666666666, ans=0.125 2024-09-19 03:03:40,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=567508.6666666666, ans=0.1 2024-09-19 03:03:45,207 INFO [train.py:1198] (1/2) Epoch 32, batch 1450, loss[loss=0.2238, simple_loss=0.2791, pruned_loss=0.06245, ctc_loss=0.1316, cr_loss=0.4336, over 34425.00 frames. ], tot_loss[loss=0.2113, simple_loss=0.2661, pruned_loss=0.05793, ctc_loss=0.1231, cr_loss=0.3982, over 6772557.95 frames. ], batch size: 110, lr: 3.74e-03, grad_scale: 16.0 2024-09-19 03:03:47,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=567555.3333333334, ans=0.1 2024-09-19 03:03:48,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=567555.3333333334, ans=0.125 2024-09-19 03:03:51,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.92 vs. limit=22.5 2024-09-19 03:03:51,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=567555.3333333334, ans=0.125 2024-09-19 03:03:59,416 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0 2024-09-19 03:04:06,593 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.563e+02 3.024e+02 3.601e+02 6.198e+02, threshold=6.049e+02, percent-clipped=2.0 2024-09-19 03:04:19,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=567648.6666666666, ans=0.025 2024-09-19 03:04:19,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=567648.6666666666, ans=0.1 2024-09-19 03:04:23,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.55 vs. limit=15.0 2024-09-19 03:04:28,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. 
limit=15.0 2024-09-19 03:04:31,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567648.6666666666, ans=0.1 2024-09-19 03:04:43,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=567695.3333333334, ans=0.02 2024-09-19 03:04:51,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=567742.0, ans=0.0 2024-09-19 03:05:01,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=567742.0, ans=0.125 2024-09-19 03:05:11,412 INFO [train.py:1198] (1/2) Epoch 32, batch 1500, loss[loss=0.2147, simple_loss=0.2732, pruned_loss=0.05791, ctc_loss=0.1228, cr_loss=0.3982, over 34445.00 frames. ], tot_loss[loss=0.2113, simple_loss=0.2664, pruned_loss=0.05786, ctc_loss=0.123, cr_loss=0.3983, over 6773046.21 frames. ], batch size: 100, lr: 3.74e-03, grad_scale: 16.0 2024-09-19 03:05:20,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=567788.6666666666, ans=0.0 2024-09-19 03:05:21,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=567788.6666666666, ans=0.125 2024-09-19 03:05:26,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=567835.3333333334, ans=0.0 2024-09-19 03:05:46,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=567882.0, ans=0.125 2024-09-19 03:05:52,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=567882.0, ans=0.025 2024-09-19 03:05:58,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=567882.0, ans=0.1 2024-09-19 03:06:00,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.61 vs. limit=15.0 2024-09-19 03:06:01,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=22.5 2024-09-19 03:06:13,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.79 vs. limit=15.0 2024-09-19 03:06:16,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567975.3333333334, ans=0.1 2024-09-19 03:06:22,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=567975.3333333334, ans=0.0 2024-09-19 03:06:33,855 INFO [train.py:1198] (1/2) Epoch 32, batch 1550, loss[loss=0.2198, simple_loss=0.276, pruned_loss=0.06082, ctc_loss=0.1293, cr_loss=0.402, over 34431.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2666, pruned_loss=0.05807, ctc_loss=0.1234, cr_loss=0.399, over 6744935.72 frames. ], batch size: 105, lr: 3.74e-03, grad_scale: 16.0 2024-09-19 03:06:42,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.16 vs. 
limit=10.0 2024-09-19 03:06:54,993 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.525e+02 3.034e+02 3.706e+02 6.678e+02, threshold=6.068e+02, percent-clipped=1.0 2024-09-19 03:07:21,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568162.0, ans=0.1 2024-09-19 03:07:55,686 INFO [train.py:1198] (1/2) Epoch 32, batch 1600, loss[loss=0.2171, simple_loss=0.2757, pruned_loss=0.05873, ctc_loss=0.1237, cr_loss=0.4072, over 34559.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2665, pruned_loss=0.05811, ctc_loss=0.1234, cr_loss=0.3988, over 6724235.53 frames. ], batch size: 99, lr: 3.74e-03, grad_scale: 32.0 2024-09-19 03:07:59,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=568255.3333333334, ans=0.0 2024-09-19 03:08:00,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=568255.3333333334, ans=0.2 2024-09-19 03:08:02,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=568255.3333333334, ans=0.0 2024-09-19 03:08:06,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.11 vs. limit=15.0 2024-09-19 03:08:09,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=568255.3333333334, ans=0.125 2024-09-19 03:08:16,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=568302.0, ans=0.125 2024-09-19 03:08:20,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2024-09-19 03:08:42,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=568348.6666666666, ans=0.0 2024-09-19 03:08:53,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=568395.3333333334, ans=0.0 2024-09-19 03:08:56,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=568395.3333333334, ans=0.125 2024-09-19 03:09:07,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2024-09-19 03:09:08,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=568442.0, ans=0.0 2024-09-19 03:09:11,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=568442.0, ans=0.125 2024-09-19 03:09:11,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=22.5 2024-09-19 03:09:22,271 INFO [train.py:1198] (1/2) Epoch 32, batch 1650, loss[loss=0.2095, simple_loss=0.2696, pruned_loss=0.05486, ctc_loss=0.1197, cr_loss=0.3949, over 34388.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2662, pruned_loss=0.05786, ctc_loss=0.123, cr_loss=0.398, over 6717854.03 frames. 
], batch size: 103, lr: 3.74e-03, grad_scale: 32.0 2024-09-19 03:09:32,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=568488.6666666666, ans=0.125 2024-09-19 03:09:39,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=568535.3333333334, ans=0.0 2024-09-19 03:09:41,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=15.0 2024-09-19 03:09:43,447 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.517e+02 2.947e+02 3.770e+02 8.069e+02, threshold=5.893e+02, percent-clipped=4.0 2024-09-19 03:10:00,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=568582.0, ans=0.125 2024-09-19 03:10:09,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=568628.6666666666, ans=0.125 2024-09-19 03:10:12,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-09-19 03:10:20,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=568628.6666666666, ans=0.125 2024-09-19 03:10:22,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2024-09-19 03:10:44,121 INFO [train.py:1198] (1/2) Epoch 32, batch 1700, loss[loss=0.191, simple_loss=0.2416, pruned_loss=0.05211, ctc_loss=0.1099, cr_loss=0.3566, over 34301.00 frames. ], tot_loss[loss=0.2113, simple_loss=0.2662, pruned_loss=0.05791, ctc_loss=0.1231, cr_loss=0.3988, over 6742786.90 frames. ], batch size: 80, lr: 3.74e-03, grad_scale: 32.0 2024-09-19 03:10:47,776 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:10:49,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=568722.0, ans=0.0 2024-09-19 03:11:12,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=568768.6666666666, ans=0.125 2024-09-19 03:11:15,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=568815.3333333334, ans=0.025 2024-09-19 03:11:29,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=568815.3333333334, ans=0.025 2024-09-19 03:11:33,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.71 vs. limit=15.0 2024-09-19 03:11:52,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=568908.6666666666, ans=0.025 2024-09-19 03:11:53,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=568908.6666666666, ans=0.0 2024-09-19 03:12:08,484 INFO [train.py:1198] (1/2) Epoch 32, batch 1750, loss[loss=0.1967, simple_loss=0.2456, pruned_loss=0.05447, ctc_loss=0.1145, cr_loss=0.4008, over 34122.00 frames. 
], tot_loss[loss=0.2107, simple_loss=0.2657, pruned_loss=0.05768, ctc_loss=0.1226, cr_loss=0.3978, over 6752508.76 frames. ], batch size: 78, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:12:18,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=568955.3333333334, ans=0.05 2024-09-19 03:12:19,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=568955.3333333334, ans=0.0 2024-09-19 03:12:22,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=568955.3333333334, ans=0.125 2024-09-19 03:12:31,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.004e+02 2.420e+02 2.815e+02 3.327e+02 6.763e+02, threshold=5.630e+02, percent-clipped=1.0 2024-09-19 03:12:32,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=569002.0, ans=0.0 2024-09-19 03:12:35,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=569002.0, ans=0.0 2024-09-19 03:13:32,570 INFO [train.py:1198] (1/2) Epoch 32, batch 1800, loss[loss=0.2146, simple_loss=0.2761, pruned_loss=0.0566, ctc_loss=0.1216, cr_loss=0.3888, over 34700.00 frames. ], tot_loss[loss=0.2107, simple_loss=0.2657, pruned_loss=0.0576, ctc_loss=0.1225, cr_loss=0.3975, over 6755877.48 frames. ], batch size: 97, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:13:32,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=569188.6666666666, ans=0.2 2024-09-19 03:13:34,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=569188.6666666666, ans=0.0 2024-09-19 03:13:52,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=569235.3333333334, ans=0.125 2024-09-19 03:13:59,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=569235.3333333334, ans=0.2 2024-09-19 03:14:09,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=569282.0, ans=0.125 2024-09-19 03:14:09,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.43 vs. 
limit=15.0 2024-09-19 03:14:15,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=569282.0, ans=0.2 2024-09-19 03:14:17,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=569282.0, ans=0.0 2024-09-19 03:14:22,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=569328.6666666666, ans=0.125 2024-09-19 03:14:29,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=569328.6666666666, ans=0.125 2024-09-19 03:14:31,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=569328.6666666666, ans=0.0 2024-09-19 03:14:36,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=569328.6666666666, ans=0.125 2024-09-19 03:14:55,769 INFO [train.py:1198] (1/2) Epoch 32, batch 1850, loss[loss=0.2151, simple_loss=0.2737, pruned_loss=0.05813, ctc_loss=0.1246, cr_loss=0.385, over 34439.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.2657, pruned_loss=0.05771, ctc_loss=0.1226, cr_loss=0.398, over 6764354.23 frames. ], batch size: 100, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:15:02,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=569422.0, ans=0.125 2024-09-19 03:15:17,042 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.590e+02 2.979e+02 4.050e+02 6.185e+02, threshold=5.957e+02, percent-clipped=3.0 2024-09-19 03:15:28,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.62 vs. limit=12.0 2024-09-19 03:15:29,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=569515.3333333334, ans=0.2 2024-09-19 03:15:34,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=569515.3333333334, ans=0.125 2024-09-19 03:15:57,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=569562.0, ans=0.125 2024-09-19 03:16:01,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=569562.0, ans=0.025 2024-09-19 03:16:11,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2024-09-19 03:16:14,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=569608.6666666666, ans=0.125 2024-09-19 03:16:19,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=569608.6666666666, ans=0.125 2024-09-19 03:16:22,071 INFO [train.py:1198] (1/2) Epoch 32, batch 1900, loss[loss=0.2218, simple_loss=0.2813, pruned_loss=0.05998, ctc_loss=0.1295, cr_loss=0.4084, over 34381.00 frames. ], tot_loss[loss=0.2113, simple_loss=0.2663, pruned_loss=0.05788, ctc_loss=0.1229, cr_loss=0.3987, over 6773807.27 frames. 
], batch size: 103, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:16:25,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=569655.3333333334, ans=0.125 2024-09-19 03:16:45,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=569702.0, ans=0.0 2024-09-19 03:16:48,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=569702.0, ans=0.125 2024-09-19 03:16:57,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=569748.6666666666, ans=0.125 2024-09-19 03:17:15,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=569795.3333333334, ans=0.0 2024-09-19 03:17:16,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=569795.3333333334, ans=0.0 2024-09-19 03:17:21,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=569795.3333333334, ans=0.125 2024-09-19 03:17:25,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.72 vs. limit=10.0 2024-09-19 03:17:39,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=569842.0, ans=0.125 2024-09-19 03:17:44,533 INFO [train.py:1198] (1/2) Epoch 32, batch 1950, loss[loss=0.2117, simple_loss=0.2654, pruned_loss=0.05894, ctc_loss=0.123, cr_loss=0.3911, over 34330.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2676, pruned_loss=0.05839, ctc_loss=0.1238, cr_loss=0.4007, over 6790015.96 frames. ], batch size: 91, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:18:01,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569935.3333333334, ans=0.1 2024-09-19 03:18:03,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=569935.3333333334, ans=0.0 2024-09-19 03:18:06,254 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.527e+02 2.898e+02 3.819e+02 6.466e+02, threshold=5.796e+02, percent-clipped=2.0 2024-09-19 03:18:32,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=570028.6666666666, ans=0.125 2024-09-19 03:18:44,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=570028.6666666666, ans=0.125 2024-09-19 03:18:50,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=570075.3333333334, ans=0.015 2024-09-19 03:18:57,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=570075.3333333334, ans=0.2 2024-09-19 03:19:06,949 INFO [train.py:1198] (1/2) Epoch 32, batch 2000, loss[loss=0.187, simple_loss=0.2387, pruned_loss=0.04948, ctc_loss=0.1094, cr_loss=0.3634, over 34186.00 frames. ], tot_loss[loss=0.2129, simple_loss=0.2679, pruned_loss=0.05852, ctc_loss=0.1241, cr_loss=0.4011, over 6764810.65 frames. 
], batch size: 78, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:19:09,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=570122.0, ans=0.125 2024-09-19 03:19:15,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=570122.0, ans=0.1 2024-09-19 03:19:19,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=570122.0, ans=0.125 2024-09-19 03:19:54,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.60 vs. limit=15.0 2024-09-19 03:19:57,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=570215.3333333334, ans=0.0 2024-09-19 03:20:02,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=570262.0, ans=0.125 2024-09-19 03:20:07,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=570262.0, ans=0.1 2024-09-19 03:20:19,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=570308.6666666666, ans=0.125 2024-09-19 03:20:33,795 INFO [train.py:1198] (1/2) Epoch 32, batch 2050, loss[loss=0.1865, simple_loss=0.2434, pruned_loss=0.04769, ctc_loss=0.1016, cr_loss=0.3486, over 34500.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2669, pruned_loss=0.05824, ctc_loss=0.1236, cr_loss=0.3994, over 6755472.84 frames. ], batch size: 82, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:20:39,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=570355.3333333334, ans=0.0 2024-09-19 03:20:48,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=570402.0, ans=0.0 2024-09-19 03:20:55,176 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.011e+02 2.518e+02 2.978e+02 3.664e+02 6.757e+02, threshold=5.956e+02, percent-clipped=5.0 2024-09-19 03:21:02,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570402.0, ans=0.1 2024-09-19 03:21:03,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=570402.0, ans=0.1 2024-09-19 03:21:10,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=570448.6666666666, ans=0.125 2024-09-19 03:21:18,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=570448.6666666666, ans=0.0 2024-09-19 03:21:25,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=570495.3333333334, ans=0.125 2024-09-19 03:21:28,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=570495.3333333334, ans=0.125 2024-09-19 03:21:30,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.71 vs. 
limit=15.0 2024-09-19 03:21:55,939 INFO [train.py:1198] (1/2) Epoch 32, batch 2100, loss[loss=0.2154, simple_loss=0.2704, pruned_loss=0.05942, ctc_loss=0.1242, cr_loss=0.4144, over 34540.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.2663, pruned_loss=0.05809, ctc_loss=0.1233, cr_loss=0.3993, over 6770492.24 frames. ], batch size: 94, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:22:01,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=570588.6666666666, ans=0.125 2024-09-19 03:22:11,485 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0 2024-09-19 03:22:21,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=570635.3333333334, ans=0.125 2024-09-19 03:22:29,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=570682.0, ans=0.025 2024-09-19 03:23:05,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.67 vs. limit=12.0 2024-09-19 03:23:17,790 INFO [train.py:1198] (1/2) Epoch 32, batch 2150, loss[loss=0.1925, simple_loss=0.2481, pruned_loss=0.04977, ctc_loss=0.1099, cr_loss=0.3831, over 34360.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2656, pruned_loss=0.05766, ctc_loss=0.1225, cr_loss=0.3977, over 6789384.17 frames. ], batch size: 91, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:23:30,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=570822.0, ans=0.125 2024-09-19 03:23:35,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=570868.6666666666, ans=0.125 2024-09-19 03:23:43,490 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.456e+02 2.768e+02 3.491e+02 6.742e+02, threshold=5.536e+02, percent-clipped=2.0 2024-09-19 03:24:05,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=570915.3333333334, ans=0.125 2024-09-19 03:24:06,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570915.3333333334, ans=0.1 2024-09-19 03:24:13,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=570962.0, ans=0.0 2024-09-19 03:24:23,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=570962.0, ans=0.125 2024-09-19 03:24:24,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.77 vs. limit=15.0 2024-09-19 03:24:44,468 INFO [train.py:1198] (1/2) Epoch 32, batch 2200, loss[loss=0.2236, simple_loss=0.2821, pruned_loss=0.06076, ctc_loss=0.1311, cr_loss=0.435, over 34448.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2655, pruned_loss=0.05765, ctc_loss=0.1222, cr_loss=0.397, over 6784471.42 frames. 
], batch size: 100, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:24:57,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=571055.3333333334, ans=0.1 2024-09-19 03:25:00,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.58 vs. limit=15.0 2024-09-19 03:25:07,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=571102.0, ans=0.125 2024-09-19 03:25:12,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=571102.0, ans=0.125 2024-09-19 03:25:19,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=571148.6666666666, ans=0.125 2024-09-19 03:25:19,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571148.6666666666, ans=0.1 2024-09-19 03:25:25,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=571148.6666666666, ans=0.125 2024-09-19 03:25:33,373 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=15.0 2024-09-19 03:25:39,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=571195.3333333334, ans=0.125 2024-09-19 03:25:51,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=571242.0, ans=0.125 2024-09-19 03:25:51,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=571242.0, ans=0.0 2024-09-19 03:25:54,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.20 vs. limit=15.0 2024-09-19 03:26:07,496 INFO [train.py:1198] (1/2) Epoch 32, batch 2250, loss[loss=0.2229, simple_loss=0.2796, pruned_loss=0.0615, ctc_loss=0.1327, cr_loss=0.4182, over 34394.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2655, pruned_loss=0.05767, ctc_loss=0.1224, cr_loss=0.3968, over 6779931.69 frames. ], batch size: 95, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:26:11,755 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.84 vs. 
limit=22.5 2024-09-19 03:26:14,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=571288.6666666666, ans=0.125 2024-09-19 03:26:16,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=571288.6666666666, ans=0.125 2024-09-19 03:26:24,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=571335.3333333334, ans=0.125 2024-09-19 03:26:28,961 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.035e+02 2.550e+02 3.016e+02 3.940e+02 7.251e+02, threshold=6.032e+02, percent-clipped=4.0 2024-09-19 03:26:34,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=571335.3333333334, ans=0.125 2024-09-19 03:26:50,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=571382.0, ans=0.125 2024-09-19 03:27:02,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=571428.6666666666, ans=0.07 2024-09-19 03:27:17,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=571475.3333333334, ans=0.125 2024-09-19 03:27:34,046 INFO [train.py:1198] (1/2) Epoch 32, batch 2300, loss[loss=0.1963, simple_loss=0.2484, pruned_loss=0.05289, ctc_loss=0.116, cr_loss=0.3818, over 34258.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2647, pruned_loss=0.05749, ctc_loss=0.122, cr_loss=0.3955, over 6765406.70 frames. ], batch size: 83, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:27:37,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=571522.0, ans=0.1 2024-09-19 03:27:55,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=571568.6666666666, ans=0.0 2024-09-19 03:28:31,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2024-09-19 03:28:38,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=571708.6666666666, ans=0.025 2024-09-19 03:28:56,378 INFO [train.py:1198] (1/2) Epoch 32, batch 2350, loss[loss=0.2188, simple_loss=0.2773, pruned_loss=0.05938, ctc_loss=0.1274, cr_loss=0.4019, over 34692.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2651, pruned_loss=0.05773, ctc_loss=0.1224, cr_loss=0.3965, over 6771192.24 frames. ], batch size: 97, lr: 3.73e-03, grad_scale: 32.0 2024-09-19 03:29:00,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.65 vs. 
limit=22.5 2024-09-19 03:29:17,629 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.483e+02 2.832e+02 3.330e+02 4.828e+02, threshold=5.665e+02, percent-clipped=0.0 2024-09-19 03:30:15,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=571942.0, ans=0.2 2024-09-19 03:30:18,609 INFO [train.py:1198] (1/2) Epoch 32, batch 2400, loss[loss=0.2014, simple_loss=0.2532, pruned_loss=0.05508, ctc_loss=0.1186, cr_loss=0.3926, over 34594.00 frames. ], tot_loss[loss=0.2108, simple_loss=0.2655, pruned_loss=0.05783, ctc_loss=0.1226, cr_loss=0.3972, over 6775058.12 frames. ], batch size: 89, lr: 3.72e-03, grad_scale: 32.0 2024-09-19 03:30:20,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=571988.6666666666, ans=0.125 2024-09-19 03:30:22,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=571988.6666666666, ans=0.125 2024-09-19 03:30:23,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=571988.6666666666, ans=0.0 2024-09-19 03:30:23,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=571988.6666666666, ans=0.0 2024-09-19 03:30:37,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.84 vs. limit=15.0 2024-09-19 03:30:52,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.40 vs. limit=12.0 2024-09-19 03:31:14,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=572128.6666666666, ans=0.125 2024-09-19 03:31:32,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=572175.3333333334, ans=0.0 2024-09-19 03:31:39,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=572175.3333333334, ans=0.0 2024-09-19 03:31:45,450 INFO [train.py:1198] (1/2) Epoch 32, batch 2450, loss[loss=0.2282, simple_loss=0.2833, pruned_loss=0.06439, ctc_loss=0.1364, cr_loss=0.4231, over 34411.00 frames. ], tot_loss[loss=0.2115, simple_loss=0.2662, pruned_loss=0.05807, ctc_loss=0.1231, cr_loss=0.398, over 6750118.29 frames. ], batch size: 95, lr: 3.72e-03, grad_scale: 32.0 2024-09-19 03:31:50,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=572222.0, ans=0.1 2024-09-19 03:32:08,343 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.630e+02 3.006e+02 3.439e+02 5.471e+02, threshold=6.012e+02, percent-clipped=0.0 2024-09-19 03:32:15,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.93 vs. 
limit=15.0 2024-09-19 03:32:20,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=572315.3333333334, ans=0.125 2024-09-19 03:32:25,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=572315.3333333334, ans=0.0 2024-09-19 03:32:38,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=572362.0, ans=0.125 2024-09-19 03:32:40,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=572362.0, ans=0.0 2024-09-19 03:32:58,857 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.99 vs. limit=10.0 2024-09-19 03:33:07,941 INFO [train.py:1198] (1/2) Epoch 32, batch 2500, loss[loss=0.224, simple_loss=0.2778, pruned_loss=0.0634, ctc_loss=0.1327, cr_loss=0.4234, over 34428.00 frames. ], tot_loss[loss=0.2113, simple_loss=0.2661, pruned_loss=0.05798, ctc_loss=0.123, cr_loss=0.3977, over 6762597.15 frames. ], batch size: 100, lr: 3.72e-03, grad_scale: 32.0 2024-09-19 03:33:09,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=572455.3333333334, ans=0.125 2024-09-19 03:33:18,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=572455.3333333334, ans=0.125 2024-09-19 03:33:46,650 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0 2024-09-19 03:33:54,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=572548.6666666666, ans=0.125 2024-09-19 03:34:01,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=21.36 vs. limit=22.5 2024-09-19 03:34:16,001 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:34:30,541 INFO [train.py:1198] (1/2) Epoch 32, batch 2550, loss[loss=0.1775, simple_loss=0.2335, pruned_loss=0.04466, ctc_loss=0.09719, cr_loss=0.3202, over 34135.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2661, pruned_loss=0.05794, ctc_loss=0.1229, cr_loss=0.3977, over 6765510.96 frames. ], batch size: 78, lr: 3.72e-03, grad_scale: 32.0 2024-09-19 03:34:43,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.05 vs. limit=15.0 2024-09-19 03:34:44,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572688.6666666666, ans=0.1 2024-09-19 03:34:50,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=572735.3333333334, ans=0.2 2024-09-19 03:34:53,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.73 vs. 
limit=15.0 2024-09-19 03:34:55,163 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.060e+02 2.489e+02 2.758e+02 3.484e+02 6.008e+02, threshold=5.516e+02, percent-clipped=0.0 2024-09-19 03:34:55,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=572735.3333333334, ans=0.04949747468305833 2024-09-19 03:34:58,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=572735.3333333334, ans=0.125 2024-09-19 03:35:09,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.60 vs. limit=22.5 2024-09-19 03:35:40,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=572875.3333333334, ans=0.125 2024-09-19 03:35:56,696 INFO [train.py:1198] (1/2) Epoch 32, batch 2600, loss[loss=0.2077, simple_loss=0.2563, pruned_loss=0.05936, ctc_loss=0.1231, cr_loss=0.3935, over 34384.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2662, pruned_loss=0.05792, ctc_loss=0.1228, cr_loss=0.3977, over 6762514.30 frames. ], batch size: 91, lr: 3.72e-03, grad_scale: 32.0 2024-09-19 03:36:06,086 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.35 vs. limit=15.0 2024-09-19 03:36:08,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=572922.0, ans=0.125 2024-09-19 03:36:21,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=572968.6666666666, ans=0.125 2024-09-19 03:36:51,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=573062.0, ans=0.125 2024-09-19 03:36:54,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=573062.0, ans=0.1 2024-09-19 03:37:09,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=573108.6666666666, ans=0.0 2024-09-19 03:37:18,731 INFO [train.py:1198] (1/2) Epoch 32, batch 2650, loss[loss=0.2201, simple_loss=0.2806, pruned_loss=0.05961, ctc_loss=0.1248, cr_loss=0.3848, over 34188.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2667, pruned_loss=0.05808, ctc_loss=0.123, cr_loss=0.3984, over 6769651.60 frames. ], batch size: 117, lr: 3.72e-03, grad_scale: 32.0 2024-09-19 03:37:41,627 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.140e+02 2.468e+02 2.835e+02 3.438e+02 5.524e+02, threshold=5.671e+02, percent-clipped=1.0 2024-09-19 03:38:06,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=573295.3333333334, ans=0.125 2024-09-19 03:38:38,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=573342.0, ans=0.04949747468305833 2024-09-19 03:38:42,613 INFO [train.py:1198] (1/2) Epoch 32, batch 2700, loss[loss=0.2214, simple_loss=0.2763, pruned_loss=0.06164, ctc_loss=0.1339, cr_loss=0.4106, over 34622.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2671, pruned_loss=0.05838, ctc_loss=0.1236, cr_loss=0.3993, over 6764546.17 frames. 
], batch size: 102, lr: 3.72e-03, grad_scale: 32.0 2024-09-19 03:38:44,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.37 vs. limit=15.0 2024-09-19 03:39:03,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=573435.3333333334, ans=0.0 2024-09-19 03:39:10,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.60 vs. limit=6.0 2024-09-19 03:39:13,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.36 vs. limit=10.0 2024-09-19 03:39:18,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=573482.0, ans=0.1 2024-09-19 03:39:44,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=573528.6666666666, ans=0.2 2024-09-19 03:39:46,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=573528.6666666666, ans=0.125 2024-09-19 03:39:46,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=573528.6666666666, ans=0.0 2024-09-19 03:39:46,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2024-09-19 03:40:05,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=573622.0, ans=0.125 2024-09-19 03:40:07,214 INFO [train.py:1198] (1/2) Epoch 32, batch 2750, loss[loss=0.2018, simple_loss=0.2533, pruned_loss=0.05562, ctc_loss=0.1168, cr_loss=0.3932, over 34614.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.2659, pruned_loss=0.05793, ctc_loss=0.1229, cr_loss=0.3982, over 6760921.26 frames. ], batch size: 88, lr: 3.72e-03, grad_scale: 32.0 2024-09-19 03:40:07,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=573622.0, ans=0.0 2024-09-19 03:40:12,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=573622.0, ans=0.125 2024-09-19 03:40:30,326 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.157e+02 2.674e+02 3.110e+02 4.031e+02 7.807e+02, threshold=6.221e+02, percent-clipped=5.0 2024-09-19 03:40:32,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.81 vs. 
limit=10.0 2024-09-19 03:40:40,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=573715.3333333334, ans=0.2 2024-09-19 03:40:42,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=573715.3333333334, ans=0.0 2024-09-19 03:40:43,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=573715.3333333334, ans=0.0 2024-09-19 03:40:51,933 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:40:55,470 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:41:07,243 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.591e-02 2024-09-19 03:41:07,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.05 vs. limit=15.0 2024-09-19 03:41:08,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=573762.0, ans=0.2 2024-09-19 03:41:12,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=573808.6666666666, ans=0.1 2024-09-19 03:41:20,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=573808.6666666666, ans=0.125 2024-09-19 03:41:23,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=573808.6666666666, ans=0.0 2024-09-19 03:41:30,006 INFO [train.py:1198] (1/2) Epoch 32, batch 2800, loss[loss=0.2478, simple_loss=0.2877, pruned_loss=0.07902, ctc_loss=0.1646, cr_loss=0.4261, over 23332.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.266, pruned_loss=0.05814, ctc_loss=0.1233, cr_loss=0.3986, over 6738204.76 frames. ], batch size: 244, lr: 3.72e-03, grad_scale: 32.0 2024-09-19 03:42:04,945 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:42:11,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=573948.6666666666, ans=0.0 2024-09-19 03:42:17,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.13 vs. limit=15.0 2024-09-19 03:42:25,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=573995.3333333334, ans=0.125 2024-09-19 03:42:28,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=573995.3333333334, ans=0.0 2024-09-19 03:42:47,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=574042.0, ans=0.125 2024-09-19 03:42:52,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=574042.0, ans=0.1 2024-09-19 03:42:56,472 INFO [train.py:1198] (1/2) Epoch 32, batch 2850, loss[loss=0.2106, simple_loss=0.2614, pruned_loss=0.05947, ctc_loss=0.1225, cr_loss=0.4074, over 34501.00 frames. 
], tot_loss[loss=0.2119, simple_loss=0.2665, pruned_loss=0.05828, ctc_loss=0.1236, cr_loss=0.3996, over 6722475.04 frames. ], batch size: 90, lr: 3.72e-03, grad_scale: 32.0 2024-09-19 03:43:08,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=574088.6666666666, ans=0.125 2024-09-19 03:43:18,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.80 vs. limit=15.0 2024-09-19 03:43:19,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.121e+02 2.491e+02 3.083e+02 3.835e+02 6.645e+02, threshold=6.167e+02, percent-clipped=2.0 2024-09-19 03:44:17,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=574322.0, ans=0.2 2024-09-19 03:44:19,080 INFO [train.py:1198] (1/2) Epoch 32, batch 2900, loss[loss=0.2142, simple_loss=0.2697, pruned_loss=0.05903, ctc_loss=0.124, cr_loss=0.3951, over 34521.00 frames. ], tot_loss[loss=0.2131, simple_loss=0.2679, pruned_loss=0.0587, ctc_loss=0.1244, cr_loss=0.4022, over 6753500.11 frames. ], batch size: 94, lr: 3.72e-03, grad_scale: 16.0 2024-09-19 03:44:44,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=574368.6666666666, ans=0.0 2024-09-19 03:44:55,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=574415.3333333334, ans=0.125 2024-09-19 03:45:27,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.36 vs. limit=6.0 2024-09-19 03:45:30,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=574508.6666666666, ans=0.125 2024-09-19 03:45:39,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=15.0 2024-09-19 03:45:43,120 INFO [train.py:1198] (1/2) Epoch 32, batch 2950, loss[loss=0.2002, simple_loss=0.2524, pruned_loss=0.05508, ctc_loss=0.114, cr_loss=0.3775, over 34622.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2661, pruned_loss=0.05807, ctc_loss=0.1232, cr_loss=0.3991, over 6747669.08 frames. ], batch size: 88, lr: 3.72e-03, grad_scale: 16.0 2024-09-19 03:46:06,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=574602.0, ans=0.2 2024-09-19 03:46:06,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=574602.0, ans=0.1 2024-09-19 03:46:07,817 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.139e+02 2.531e+02 3.003e+02 3.735e+02 8.558e+02, threshold=6.007e+02, percent-clipped=4.0 2024-09-19 03:46:10,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=574602.0, ans=0.125 2024-09-19 03:46:13,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574602.0, ans=0.1 2024-09-19 03:46:31,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.47 vs. 
limit=22.5 2024-09-19 03:46:43,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=574695.3333333334, ans=0.0 2024-09-19 03:46:45,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=574695.3333333334, ans=0.0 2024-09-19 03:46:53,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2024-09-19 03:47:00,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=574742.0, ans=0.125 2024-09-19 03:47:08,176 INFO [train.py:1198] (1/2) Epoch 32, batch 3000, loss[loss=0.2124, simple_loss=0.2667, pruned_loss=0.0583, ctc_loss=0.1243, cr_loss=0.4174, over 34550.00 frames. ], tot_loss[loss=0.2115, simple_loss=0.2661, pruned_loss=0.05809, ctc_loss=0.1233, cr_loss=0.3994, over 6746936.77 frames. ], batch size: 94, lr: 3.72e-03, grad_scale: 16.0 2024-09-19 03:47:08,177 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 03:47:25,109 INFO [train.py:1230] (1/2) Epoch 32, validation: loss=0.1489, simple_loss=0.2434, pruned_loss=0.02318, ctc_loss=0.03985, cr_loss=2.123e-14, over 944034.00 frames. 2024-09-19 03:47:25,109 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 03:47:35,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=574788.6666666666, ans=0.1 2024-09-19 03:48:06,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=574882.0, ans=0.125 2024-09-19 03:48:26,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=574928.6666666666, ans=0.125 2024-09-19 03:48:39,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=574975.3333333334, ans=0.95 2024-09-19 03:48:47,272 INFO [train.py:1198] (1/2) Epoch 32, batch 3050, loss[loss=0.1914, simple_loss=0.2468, pruned_loss=0.04983, ctc_loss=0.1075, cr_loss=0.3708, over 34588.00 frames. ], tot_loss[loss=0.212, simple_loss=0.2668, pruned_loss=0.05823, ctc_loss=0.1236, cr_loss=0.4003, over 6739826.32 frames. 
], batch size: 89, lr: 3.72e-03, grad_scale: 16.0 2024-09-19 03:49:11,863 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.411e+02 2.700e+02 3.335e+02 5.972e+02, threshold=5.401e+02, percent-clipped=0.0 2024-09-19 03:49:12,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575068.6666666666, ans=0.1 2024-09-19 03:49:28,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=575115.3333333334, ans=0.2 2024-09-19 03:49:44,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=575162.0, ans=0.125 2024-09-19 03:49:49,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=575162.0, ans=0.1 2024-09-19 03:50:08,487 INFO [train.py:1198] (1/2) Epoch 32, batch 3100, loss[loss=0.226, simple_loss=0.2812, pruned_loss=0.06334, ctc_loss=0.1344, cr_loss=0.4324, over 34255.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2664, pruned_loss=0.05821, ctc_loss=0.1235, cr_loss=0.4, over 6740150.53 frames. ], batch size: 117, lr: 3.71e-03, grad_scale: 16.0 2024-09-19 03:50:22,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.27 vs. limit=15.0 2024-09-19 03:50:36,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=575302.0, ans=0.125 2024-09-19 03:50:44,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=575348.6666666666, ans=0.1 2024-09-19 03:50:50,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=575348.6666666666, ans=0.1 2024-09-19 03:50:50,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=575348.6666666666, ans=0.125 2024-09-19 03:50:55,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=575348.6666666666, ans=0.0 2024-09-19 03:51:30,829 INFO [train.py:1198] (1/2) Epoch 32, batch 3150, loss[loss=0.2316, simple_loss=0.2852, pruned_loss=0.06666, ctc_loss=0.1358, cr_loss=0.4387, over 33823.00 frames. ], tot_loss[loss=0.2118, simple_loss=0.2665, pruned_loss=0.05816, ctc_loss=0.1234, cr_loss=0.3999, over 6747103.33 frames. 
], batch size: 122, lr: 3.71e-03, grad_scale: 16.0 2024-09-19 03:51:31,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=575488.6666666666, ans=0.125 2024-09-19 03:51:52,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=575535.3333333334, ans=0.2 2024-09-19 03:51:56,865 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.505e+02 2.946e+02 3.796e+02 6.616e+02, threshold=5.892e+02, percent-clipped=6.0 2024-09-19 03:52:08,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=575582.0, ans=0.125 2024-09-19 03:52:12,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.16 vs. limit=6.0 2024-09-19 03:52:27,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=575628.6666666666, ans=0.125 2024-09-19 03:52:31,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=575628.6666666666, ans=0.1 2024-09-19 03:52:45,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=575675.3333333334, ans=0.2 2024-09-19 03:52:48,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0 2024-09-19 03:52:53,710 INFO [train.py:1198] (1/2) Epoch 32, batch 3200, loss[loss=0.2093, simple_loss=0.2642, pruned_loss=0.05718, ctc_loss=0.1229, cr_loss=0.3851, over 34533.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.2664, pruned_loss=0.05815, ctc_loss=0.1234, cr_loss=0.3997, over 6760998.70 frames. 
], batch size: 94, lr: 3.71e-03, grad_scale: 32.0 2024-09-19 03:52:53,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=575722.0, ans=0.125 2024-09-19 03:52:54,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=575722.0, ans=0.125 2024-09-19 03:53:03,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=575722.0, ans=0.125 2024-09-19 03:53:18,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=575768.6666666666, ans=0.125 2024-09-19 03:53:34,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=575815.3333333334, ans=0.125 2024-09-19 03:54:00,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=575908.6666666666, ans=0.125 2024-09-19 03:54:02,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=575908.6666666666, ans=15.0 2024-09-19 03:54:03,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=575908.6666666666, ans=0.2 2024-09-19 03:54:14,935 INFO [train.py:1198] (1/2) Epoch 32, batch 3250, loss[loss=0.2253, simple_loss=0.284, pruned_loss=0.06186, ctc_loss=0.13, cr_loss=0.4209, over 34664.00 frames. ], tot_loss[loss=0.2122, simple_loss=0.267, pruned_loss=0.05829, ctc_loss=0.1237, cr_loss=0.4002, over 6770575.04 frames. ], batch size: 98, lr: 3.71e-03, grad_scale: 32.0 2024-09-19 03:54:21,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=575955.3333333334, ans=0.1 2024-09-19 03:54:36,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=576002.0, ans=0.125 2024-09-19 03:54:39,100 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.042e+02 2.453e+02 2.776e+02 3.569e+02 5.424e+02, threshold=5.552e+02, percent-clipped=0.0 2024-09-19 03:55:02,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=576095.3333333334, ans=0.0 2024-09-19 03:55:21,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=576142.0, ans=0.125 2024-09-19 03:55:35,814 INFO [train.py:1198] (1/2) Epoch 32, batch 3300, loss[loss=0.2138, simple_loss=0.2694, pruned_loss=0.05836, ctc_loss=0.1251, cr_loss=0.4099, over 33145.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2656, pruned_loss=0.05765, ctc_loss=0.1225, cr_loss=0.3971, over 6768698.02 frames. 
], batch size: 130, lr: 3.71e-03, grad_scale: 32.0 2024-09-19 03:55:39,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=576188.6666666666, ans=0.125 2024-09-19 03:55:47,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=576188.6666666666, ans=0.125 2024-09-19 03:55:50,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=576235.3333333334, ans=0.0 2024-09-19 03:56:28,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0 2024-09-19 03:56:30,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=576328.6666666666, ans=0.125 2024-09-19 03:56:32,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=576328.6666666666, ans=0.2 2024-09-19 03:56:32,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=12.0 2024-09-19 03:56:46,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=15.0 2024-09-19 03:56:46,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2024-09-19 03:56:58,030 INFO [train.py:1198] (1/2) Epoch 32, batch 3350, loss[loss=0.2232, simple_loss=0.2787, pruned_loss=0.06224, ctc_loss=0.1319, cr_loss=0.4211, over 33838.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2664, pruned_loss=0.05792, ctc_loss=0.1229, cr_loss=0.3985, over 6746122.37 frames. ], batch size: 122, lr: 3.71e-03, grad_scale: 32.0 2024-09-19 03:56:59,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=576422.0, ans=0.125 2024-09-19 03:56:59,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=576422.0, ans=0.0 2024-09-19 03:57:11,285 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:57:14,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=576468.6666666666, ans=0.125 2024-09-19 03:57:19,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=576468.6666666666, ans=0.0 2024-09-19 03:57:23,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.423e+02 2.793e+02 3.326e+02 6.066e+02, threshold=5.586e+02, percent-clipped=1.0 2024-09-19 03:57:25,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=576468.6666666666, ans=0.2 2024-09-19 03:57:32,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.05 vs. 
limit=15.0 2024-09-19 03:57:48,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=576562.0, ans=0.0 2024-09-19 03:57:48,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=576562.0, ans=0.125 2024-09-19 03:57:50,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.88 vs. limit=12.0 2024-09-19 03:57:56,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=576562.0, ans=0.025 2024-09-19 03:58:11,034 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 03:58:20,136 INFO [train.py:1198] (1/2) Epoch 32, batch 3400, loss[loss=0.1813, simple_loss=0.2356, pruned_loss=0.04602, ctc_loss=0.1032, cr_loss=0.3559, over 34177.00 frames. ], tot_loss[loss=0.2107, simple_loss=0.2657, pruned_loss=0.0577, ctc_loss=0.1225, cr_loss=0.3977, over 6734682.08 frames. ], batch size: 78, lr: 3.71e-03, grad_scale: 32.0 2024-09-19 03:58:34,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=576702.0, ans=0.0 2024-09-19 03:58:39,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=576702.0, ans=0.125 2024-09-19 03:58:59,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=15.02 vs. limit=22.5 2024-09-19 03:59:10,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-09-19 03:59:40,649 INFO [train.py:1198] (1/2) Epoch 32, batch 3450, loss[loss=0.2175, simple_loss=0.2767, pruned_loss=0.05837, ctc_loss=0.1255, cr_loss=0.4109, over 32956.00 frames. ], tot_loss[loss=0.2107, simple_loss=0.2658, pruned_loss=0.0576, ctc_loss=0.1223, cr_loss=0.3979, over 6746572.09 frames. ], batch size: 130, lr: 3.71e-03, grad_scale: 32.0 2024-09-19 03:59:44,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=576888.6666666666, ans=0.2 2024-09-19 03:59:48,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.93 vs. limit=6.0 2024-09-19 04:00:04,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.972e+02 2.459e+02 2.943e+02 3.477e+02 5.545e+02, threshold=5.885e+02, percent-clipped=0.0 2024-09-19 04:00:21,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=576982.0, ans=0.125 2024-09-19 04:00:29,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577028.6666666666, ans=0.1 2024-09-19 04:00:38,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=577028.6666666666, ans=0.125 2024-09-19 04:00:42,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.71 vs. 
limit=15.0 2024-09-19 04:00:59,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=577075.3333333334, ans=0.0 2024-09-19 04:01:02,183 INFO [train.py:1198] (1/2) Epoch 32, batch 3500, loss[loss=0.187, simple_loss=0.2434, pruned_loss=0.04803, ctc_loss=0.104, cr_loss=0.3449, over 34482.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2654, pruned_loss=0.05755, ctc_loss=0.1221, cr_loss=0.3976, over 6749169.61 frames. ], batch size: 85, lr: 3.71e-03, grad_scale: 32.0 2024-09-19 04:01:10,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=577122.0, ans=0.0 2024-09-19 04:01:43,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=577215.3333333334, ans=0.125 2024-09-19 04:01:56,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=577262.0, ans=0.125 2024-09-19 04:02:16,175 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.67 vs. limit=15.0 2024-09-19 04:02:17,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=577308.6666666666, ans=0.025 2024-09-19 04:02:22,953 INFO [train.py:1198] (1/2) Epoch 32, batch 3550, loss[loss=0.2346, simple_loss=0.2916, pruned_loss=0.06605, ctc_loss=0.1394, cr_loss=0.4408, over 34402.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2656, pruned_loss=0.05763, ctc_loss=0.1223, cr_loss=0.3977, over 6758825.59 frames. ], batch size: 103, lr: 3.71e-03, grad_scale: 32.0 2024-09-19 04:02:28,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=577355.3333333334, ans=0.125 2024-09-19 04:02:30,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.78 vs. limit=22.5 2024-09-19 04:02:36,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=577355.3333333334, ans=0.1 2024-09-19 04:02:46,816 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 2.548e+02 2.963e+02 3.484e+02 5.968e+02, threshold=5.926e+02, percent-clipped=1.0 2024-09-19 04:03:07,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.90 vs. limit=22.5 2024-09-19 04:03:11,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2024-09-19 04:03:24,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=577495.3333333334, ans=0.125 2024-09-19 04:03:29,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.72 vs. limit=15.0 2024-09-19 04:03:43,152 INFO [train.py:1198] (1/2) Epoch 32, batch 3600, loss[loss=0.2083, simple_loss=0.2631, pruned_loss=0.05661, ctc_loss=0.1215, cr_loss=0.3986, over 34488.00 frames. 
], tot_loss[loss=0.2105, simple_loss=0.2656, pruned_loss=0.05753, ctc_loss=0.122, cr_loss=0.3967, over 6768022.22 frames. ], batch size: 90, lr: 3.71e-03, grad_scale: 32.0 2024-09-19 04:04:08,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=577635.3333333334, ans=0.125 2024-09-19 04:04:18,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=577682.0, ans=0.025 2024-09-19 04:04:46,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=577775.3333333334, ans=0.125 2024-09-19 04:04:48,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=577775.3333333334, ans=0.125 2024-09-19 04:04:51,803 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:04:51,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=577775.3333333334, ans=0.07 2024-09-19 04:05:04,392 INFO [train.py:1198] (1/2) Epoch 32, batch 3650, loss[loss=0.2228, simple_loss=0.2806, pruned_loss=0.06137, ctc_loss=0.1294, cr_loss=0.4075, over 34426.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2648, pruned_loss=0.05717, ctc_loss=0.1214, cr_loss=0.395, over 6770647.24 frames. ], batch size: 110, lr: 3.71e-03, grad_scale: 16.0 2024-09-19 04:05:05,140 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=22.5 2024-09-19 04:05:30,140 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.503e+02 2.838e+02 3.634e+02 6.453e+02, threshold=5.677e+02, percent-clipped=2.0 2024-09-19 04:05:33,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=577868.6666666666, ans=0.0 2024-09-19 04:05:37,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=577915.3333333334, ans=0.125 2024-09-19 04:05:47,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=22.5 2024-09-19 04:06:05,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-09-19 04:06:25,587 INFO [train.py:1198] (1/2) Epoch 32, batch 3700, loss[loss=0.2222, simple_loss=0.2829, pruned_loss=0.05968, ctc_loss=0.1273, cr_loss=0.4156, over 34630.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2649, pruned_loss=0.0569, ctc_loss=0.121, cr_loss=0.3943, over 6784702.55 frames. ], batch size: 102, lr: 3.71e-03, grad_scale: 16.0 2024-09-19 04:06:42,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=578102.0, ans=0.0 2024-09-19 04:06:46,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=578102.0, ans=0.1 2024-09-19 04:07:09,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.21 vs. 
limit=22.5 2024-09-19 04:07:09,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2024-09-19 04:07:38,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=578242.0, ans=0.2 2024-09-19 04:07:38,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=578242.0, ans=0.0 2024-09-19 04:07:46,714 INFO [train.py:1198] (1/2) Epoch 32, batch 3750, loss[loss=0.2179, simple_loss=0.2725, pruned_loss=0.06073, ctc_loss=0.1271, cr_loss=0.411, over 34324.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2679, pruned_loss=0.05808, ctc_loss=0.1233, cr_loss=0.3998, over 6785717.56 frames. ], batch size: 113, lr: 3.70e-03, grad_scale: 16.0 2024-09-19 04:07:53,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=578288.6666666666, ans=0.125 2024-09-19 04:08:12,521 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 2.419e+02 2.671e+02 3.151e+02 6.756e+02, threshold=5.341e+02, percent-clipped=2.0 2024-09-19 04:08:26,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=22.5 2024-09-19 04:09:05,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=12.0 2024-09-19 04:09:07,810 INFO [train.py:1198] (1/2) Epoch 32, batch 3800, loss[loss=0.2323, simple_loss=0.276, pruned_loss=0.0707, ctc_loss=0.147, cr_loss=0.4461, over 30085.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2705, pruned_loss=0.05953, ctc_loss=0.126, cr_loss=0.405, over 6674258.50 frames. ], batch size: 176, lr: 3.70e-03, grad_scale: 16.0 2024-09-19 04:09:19,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=578522.0, ans=0.1 2024-09-19 04:09:48,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=578615.3333333334, ans=0.07 2024-09-19 04:10:26,978 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2024-09-19 04:10:37,836 INFO [train.py:1198] (1/2) Epoch 32, batch 3850, loss[loss=0.2302, simple_loss=0.2752, pruned_loss=0.06969, ctc_loss=0.1453, cr_loss=0.4205, over 23658.00 frames. ], tot_loss[loss=0.2188, simple_loss=0.2726, pruned_loss=0.06137, ctc_loss=0.1299, cr_loss=0.409, over 6248971.07 frames. ], batch size: 244, lr: 3.70e-03, grad_scale: 16.0 2024-09-19 04:11:04,325 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.496e+02 2.697e+02 2.887e+02 7.000e+02, threshold=5.394e+02, percent-clipped=1.0 2024-09-19 04:11:06,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=578802.0, ans=0.125 2024-09-19 04:12:09,707 INFO [train.py:1198] (1/2) Epoch 33, batch 0, loss[loss=0.1873, simple_loss=0.2471, pruned_loss=0.04634, ctc_loss=0.1048, cr_loss=0.3431, over 34492.00 frames. ], tot_loss[loss=0.1873, simple_loss=0.2471, pruned_loss=0.04634, ctc_loss=0.1048, cr_loss=0.3431, over 34492.00 frames. 
], batch size: 85, lr: 3.65e-03, grad_scale: 32.0 2024-09-19 04:12:09,707 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 04:12:26,609 INFO [train.py:1230] (1/2) Epoch 33, validation: loss=0.1492, simple_loss=0.2444, pruned_loss=0.02301, ctc_loss=0.04002, cr_loss=2.108e-14, over 944034.00 frames. 2024-09-19 04:12:26,609 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 04:12:32,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.13 vs. limit=22.5 2024-09-19 04:12:43,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=578923.3333333334, ans=0.07 2024-09-19 04:13:06,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=578970.0, ans=10.0 2024-09-19 04:13:06,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=578970.0, ans=0.125 2024-09-19 04:13:15,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=579016.6666666666, ans=0.125 2024-09-19 04:13:31,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=579063.3333333334, ans=0.125 2024-09-19 04:13:36,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=579063.3333333334, ans=0.125 2024-09-19 04:13:40,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=579063.3333333334, ans=0.125 2024-09-19 04:13:44,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-09-19 04:13:45,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=579063.3333333334, ans=0.125 2024-09-19 04:13:51,741 INFO [train.py:1198] (1/2) Epoch 33, batch 50, loss[loss=0.1852, simple_loss=0.2409, pruned_loss=0.04701, ctc_loss=0.107, cr_loss=0.3515, over 34508.00 frames. ], tot_loss[loss=0.2137, simple_loss=0.2689, pruned_loss=0.05878, ctc_loss=0.1245, cr_loss=0.4024, over 1479854.19 frames. ], batch size: 82, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:14:15,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=579156.6666666666, ans=0.0 2024-09-19 04:14:18,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=579156.6666666666, ans=0.2 2024-09-19 04:14:18,684 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:14:20,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=579156.6666666666, ans=0.125 2024-09-19 04:14:46,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=579250.0, ans=0.0 2024-09-19 04:14:52,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.95 vs. 
limit=15.0 2024-09-19 04:14:57,767 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.578e+02 2.902e+02 3.544e+02 5.702e+02, threshold=5.805e+02, percent-clipped=2.0 2024-09-19 04:15:14,446 INFO [train.py:1198] (1/2) Epoch 33, batch 100, loss[loss=0.195, simple_loss=0.2499, pruned_loss=0.05145, ctc_loss=0.113, cr_loss=0.365, over 34576.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2697, pruned_loss=0.05868, ctc_loss=0.1244, cr_loss=0.4025, over 2629197.04 frames. ], batch size: 89, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:15:18,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=579343.3333333334, ans=0.125 2024-09-19 04:15:30,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.69 vs. limit=15.0 2024-09-19 04:15:47,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=579436.6666666666, ans=0.125 2024-09-19 04:15:54,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=579436.6666666666, ans=0.025 2024-09-19 04:16:26,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5 2024-09-19 04:16:37,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.26 vs. limit=12.0 2024-09-19 04:16:38,151 INFO [train.py:1198] (1/2) Epoch 33, batch 150, loss[loss=0.1888, simple_loss=0.2469, pruned_loss=0.04745, ctc_loss=0.1064, cr_loss=0.3632, over 34462.00 frames. ], tot_loss[loss=0.2116, simple_loss=0.2673, pruned_loss=0.05774, ctc_loss=0.1227, cr_loss=0.3992, over 3557411.34 frames. ], batch size: 82, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:16:45,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=579576.6666666666, ans=0.05 2024-09-19 04:16:53,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=579623.3333333334, ans=0.2 2024-09-19 04:17:25,546 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0 2024-09-19 04:17:35,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=579716.6666666666, ans=0.05 2024-09-19 04:17:46,695 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.505e+02 2.888e+02 3.722e+02 5.805e+02, threshold=5.776e+02, percent-clipped=1.0 2024-09-19 04:17:49,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.02 vs. 
limit=6.0 2024-09-19 04:17:55,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=579763.3333333334, ans=0.0 2024-09-19 04:18:00,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=579763.3333333334, ans=0.125 2024-09-19 04:18:03,154 INFO [train.py:1198] (1/2) Epoch 33, batch 200, loss[loss=0.2244, simple_loss=0.2758, pruned_loss=0.06439, ctc_loss=0.1365, cr_loss=0.421, over 31956.00 frames. ], tot_loss[loss=0.211, simple_loss=0.2662, pruned_loss=0.05767, ctc_loss=0.1224, cr_loss=0.3978, over 4270474.25 frames. ], batch size: 145, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:18:13,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=579810.0, ans=0.07 2024-09-19 04:18:18,783 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.272e-02 2024-09-19 04:18:49,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=579903.3333333334, ans=0.0 2024-09-19 04:18:51,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=579950.0, ans=0.025 2024-09-19 04:18:51,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=12.0 2024-09-19 04:18:55,570 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-09-19 04:19:06,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=579950.0, ans=0.025 2024-09-19 04:19:27,981 INFO [train.py:1198] (1/2) Epoch 33, batch 250, loss[loss=0.2231, simple_loss=0.2819, pruned_loss=0.06108, ctc_loss=0.1282, cr_loss=0.4145, over 34233.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.266, pruned_loss=0.05739, ctc_loss=0.122, cr_loss=0.3976, over 4833475.96 frames. 
], batch size: 117, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:19:38,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=580043.3333333334, ans=0.05 2024-09-19 04:19:49,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=580090.0, ans=0.1 2024-09-19 04:19:56,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=580090.0, ans=6.0 2024-09-19 04:20:14,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=580136.6666666666, ans=0.125 2024-09-19 04:20:24,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580183.3333333334, ans=0.1 2024-09-19 04:20:30,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=580183.3333333334, ans=0.125 2024-09-19 04:20:33,422 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.006e+02 2.546e+02 3.145e+02 4.016e+02 7.853e+02, threshold=6.290e+02, percent-clipped=6.0 2024-09-19 04:20:49,833 INFO [train.py:1198] (1/2) Epoch 33, batch 300, loss[loss=0.226, simple_loss=0.2788, pruned_loss=0.06454, ctc_loss=0.1343, cr_loss=0.434, over 34356.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.2656, pruned_loss=0.05726, ctc_loss=0.1217, cr_loss=0.3964, over 5263199.71 frames. ], batch size: 107, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:21:05,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.54 vs. limit=6.0 2024-09-19 04:21:09,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=580323.3333333334, ans=0.2 2024-09-19 04:21:14,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=580323.3333333334, ans=0.125 2024-09-19 04:21:20,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=580323.3333333334, ans=0.0 2024-09-19 04:21:27,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=580370.0, ans=0.0 2024-09-19 04:21:41,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=580416.6666666666, ans=0.0 2024-09-19 04:21:55,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=580416.6666666666, ans=0.125 2024-09-19 04:22:14,301 INFO [train.py:1198] (1/2) Epoch 33, batch 350, loss[loss=0.1853, simple_loss=0.2403, pruned_loss=0.04783, ctc_loss=0.1034, cr_loss=0.3478, over 34291.00 frames. ], tot_loss[loss=0.2109, simple_loss=0.2662, pruned_loss=0.05758, ctc_loss=0.1224, cr_loss=0.3981, over 5598279.99 frames. 
], batch size: 83, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:22:14,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=580510.0, ans=0.125 2024-09-19 04:23:02,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=22.5 2024-09-19 04:23:19,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer_ff2.min_abs, batch_count=580650.0, ans=0.1 2024-09-19 04:23:21,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=580696.6666666666, ans=0.04949747468305833 2024-09-19 04:23:22,926 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.042e+02 2.457e+02 2.922e+02 3.992e+02 5.698e+02, threshold=5.843e+02, percent-clipped=0.0 2024-09-19 04:23:31,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=580696.6666666666, ans=0.1 2024-09-19 04:23:36,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=580696.6666666666, ans=0.125 2024-09-19 04:23:39,197 INFO [train.py:1198] (1/2) Epoch 33, batch 400, loss[loss=0.2203, simple_loss=0.2704, pruned_loss=0.06304, ctc_loss=0.1327, cr_loss=0.4382, over 34422.00 frames. ], tot_loss[loss=0.2103, simple_loss=0.2656, pruned_loss=0.05735, ctc_loss=0.1218, cr_loss=0.3969, over 5865627.71 frames. ], batch size: 95, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:23:42,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=580743.3333333334, ans=0.0 2024-09-19 04:23:53,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.03 vs. 
limit=15.0 2024-09-19 04:23:59,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=580790.0, ans=0.125 2024-09-19 04:24:06,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=580790.0, ans=0.125 2024-09-19 04:24:14,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=580836.6666666666, ans=0.0 2024-09-19 04:24:22,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=580836.6666666666, ans=0.0 2024-09-19 04:24:48,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=580930.0, ans=0.0 2024-09-19 04:24:55,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=580930.0, ans=0.1 2024-09-19 04:24:58,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=580930.0, ans=0.025 2024-09-19 04:25:00,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=580976.6666666666, ans=0.025 2024-09-19 04:25:01,900 INFO [train.py:1198] (1/2) Epoch 33, batch 450, loss[loss=0.2109, simple_loss=0.2707, pruned_loss=0.05617, ctc_loss=0.1189, cr_loss=0.3762, over 34711.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2656, pruned_loss=0.05748, ctc_loss=0.122, cr_loss=0.397, over 6054678.22 frames. ], batch size: 97, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:25:07,300 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:25:12,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=580976.6666666666, ans=0.125 2024-09-19 04:25:19,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=581023.3333333334, ans=0.125 2024-09-19 04:25:37,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=581070.0, ans=0.125 2024-09-19 04:25:37,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=581070.0, ans=0.125 2024-09-19 04:25:47,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=581070.0, ans=0.0 2024-09-19 04:25:59,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=581116.6666666666, ans=0.0 2024-09-19 04:26:03,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.68 vs. 
limit=22.5 2024-09-19 04:26:10,413 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.454e+02 2.784e+02 3.449e+02 6.013e+02, threshold=5.567e+02, percent-clipped=1.0 2024-09-19 04:26:18,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=581163.3333333334, ans=0.125 2024-09-19 04:26:26,664 INFO [train.py:1198] (1/2) Epoch 33, batch 500, loss[loss=0.2265, simple_loss=0.2843, pruned_loss=0.06228, ctc_loss=0.1335, cr_loss=0.4364, over 34446.00 frames. ], tot_loss[loss=0.2097, simple_loss=0.2649, pruned_loss=0.05719, ctc_loss=0.1215, cr_loss=0.3966, over 6220632.17 frames. ], batch size: 110, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:26:39,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.74 vs. limit=10.0 2024-09-19 04:27:07,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.13 vs. limit=22.5 2024-09-19 04:27:27,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=581350.0, ans=0.0 2024-09-19 04:27:51,371 INFO [train.py:1198] (1/2) Epoch 33, batch 550, loss[loss=0.2253, simple_loss=0.2824, pruned_loss=0.06242, ctc_loss=0.1308, cr_loss=0.4267, over 33802.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.265, pruned_loss=0.05718, ctc_loss=0.1216, cr_loss=0.3968, over 6330498.86 frames. ], batch size: 122, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:28:07,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=581490.0, ans=0.125 2024-09-19 04:28:12,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=581490.0, ans=0.0 2024-09-19 04:28:13,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=581490.0, ans=0.2 2024-09-19 04:28:17,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=581490.0, ans=0.0 2024-09-19 04:28:20,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=581490.0, ans=0.025 2024-09-19 04:28:41,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=581583.3333333334, ans=0.0 2024-09-19 04:28:46,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581583.3333333334, ans=0.1 2024-09-19 04:28:51,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581583.3333333334, ans=0.1 2024-09-19 04:28:57,961 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.488e+02 2.755e+02 3.391e+02 5.943e+02, threshold=5.511e+02, percent-clipped=2.0 2024-09-19 04:28:58,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=581630.0, ans=0.125 2024-09-19 04:28:59,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=581630.0, ans=0.5 2024-09-19 04:29:16,667 INFO 
[train.py:1198] (1/2) Epoch 33, batch 600, loss[loss=0.2326, simple_loss=0.2874, pruned_loss=0.0662, ctc_loss=0.1389, cr_loss=0.4416, over 34285.00 frames. ], tot_loss[loss=0.2101, simple_loss=0.2653, pruned_loss=0.05734, ctc_loss=0.1218, cr_loss=0.3971, over 6431521.46 frames. ], batch size: 117, lr: 3.64e-03, grad_scale: 32.0 2024-09-19 04:29:27,761 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.21 vs. limit=15.0 2024-09-19 04:29:48,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=581770.0, ans=0.1 2024-09-19 04:30:12,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_positive, batch_count=581816.6666666666, ans=0.05 2024-09-19 04:30:40,921 INFO [train.py:1198] (1/2) Epoch 33, batch 650, loss[loss=0.2296, simple_loss=0.2813, pruned_loss=0.06676, ctc_loss=0.1365, cr_loss=0.4282, over 34522.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2646, pruned_loss=0.05692, ctc_loss=0.121, cr_loss=0.3944, over 6522899.50 frames. ], batch size: 94, lr: 3.64e-03, grad_scale: 16.0 2024-09-19 04:30:43,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.72 vs. limit=15.0 2024-09-19 04:30:47,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=581910.0, ans=0.0 2024-09-19 04:31:01,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.65 vs. limit=10.0 2024-09-19 04:31:11,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=581956.6666666666, ans=0.2 2024-09-19 04:31:12,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=582003.3333333334, ans=0.125 2024-09-19 04:31:48,184 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.410e+02 2.730e+02 3.610e+02 1.046e+03, threshold=5.461e+02, percent-clipped=8.0 2024-09-19 04:32:03,195 INFO [train.py:1198] (1/2) Epoch 33, batch 700, loss[loss=0.2066, simple_loss=0.2606, pruned_loss=0.05617, ctc_loss=0.1208, cr_loss=0.4008, over 34604.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2648, pruned_loss=0.05688, ctc_loss=0.1209, cr_loss=0.3947, over 6580902.23 frames. 
], batch size: 89, lr: 3.64e-03, grad_scale: 16.0 2024-09-19 04:32:29,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=582190.0, ans=0.125 2024-09-19 04:32:39,814 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.171e-02 2024-09-19 04:32:46,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=582236.6666666666, ans=0.125 2024-09-19 04:32:58,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=582283.3333333334, ans=0.125 2024-09-19 04:33:13,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=582330.0, ans=0.125 2024-09-19 04:33:13,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=582330.0, ans=0.025 2024-09-19 04:33:18,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=582330.0, ans=0.025 2024-09-19 04:33:25,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.73 vs. limit=12.0 2024-09-19 04:33:27,949 INFO [train.py:1198] (1/2) Epoch 33, batch 750, loss[loss=0.2181, simple_loss=0.2758, pruned_loss=0.05932, ctc_loss=0.1257, cr_loss=0.417, over 34369.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2645, pruned_loss=0.0568, ctc_loss=0.1207, cr_loss=0.3939, over 6623390.22 frames. ], batch size: 95, lr: 3.63e-03, grad_scale: 16.0 2024-09-19 04:33:28,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=582376.6666666666, ans=0.0 2024-09-19 04:33:38,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=582376.6666666666, ans=0.0 2024-09-19 04:34:24,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=582516.6666666666, ans=0.025 2024-09-19 04:34:37,818 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.494e+02 2.896e+02 3.722e+02 7.848e+02, threshold=5.792e+02, percent-clipped=10.0 2024-09-19 04:34:49,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582563.3333333334, ans=0.1 2024-09-19 04:34:51,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=582610.0, ans=0.125 2024-09-19 04:34:52,676 INFO [train.py:1198] (1/2) Epoch 33, batch 800, loss[loss=0.185, simple_loss=0.243, pruned_loss=0.04646, ctc_loss=0.101, cr_loss=0.3481, over 34483.00 frames. ], tot_loss[loss=0.2089, simple_loss=0.2644, pruned_loss=0.05679, ctc_loss=0.1206, cr_loss=0.394, over 6660127.78 frames. ], batch size: 85, lr: 3.63e-03, grad_scale: 32.0 2024-09-19 04:35:23,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.65 vs. 
limit=22.5 2024-09-19 04:35:24,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=582703.3333333334, ans=0.2 2024-09-19 04:35:26,852 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2024-09-19 04:35:37,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=582703.3333333334, ans=0.125 2024-09-19 04:35:39,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=582703.3333333334, ans=0.125 2024-09-19 04:35:57,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=582796.6666666666, ans=0.0 2024-09-19 04:36:02,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=582796.6666666666, ans=0.125 2024-09-19 04:36:03,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=582796.6666666666, ans=0.125 2024-09-19 04:36:07,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=582796.6666666666, ans=0.125 2024-09-19 04:36:14,817 INFO [train.py:1198] (1/2) Epoch 33, batch 850, loss[loss=0.2217, simple_loss=0.2746, pruned_loss=0.0625, ctc_loss=0.132, cr_loss=0.4338, over 34374.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2641, pruned_loss=0.05668, ctc_loss=0.1204, cr_loss=0.3937, over 6692931.31 frames. ], batch size: 103, lr: 3.63e-03, grad_scale: 32.0 2024-09-19 04:36:30,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=582890.0, ans=0.2 2024-09-19 04:36:31,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=582890.0, ans=0.0 2024-09-19 04:36:31,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=582890.0, ans=0.0 2024-09-19 04:36:41,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=582890.0, ans=0.125 2024-09-19 04:36:43,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.55 vs. limit=12.0 2024-09-19 04:37:24,658 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.125e+02 2.492e+02 2.916e+02 3.420e+02 5.584e+02, threshold=5.832e+02, percent-clipped=0.0 2024-09-19 04:37:39,558 INFO [train.py:1198] (1/2) Epoch 33, batch 900, loss[loss=0.184, simple_loss=0.2403, pruned_loss=0.04683, ctc_loss=0.1013, cr_loss=0.3439, over 34487.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2648, pruned_loss=0.0569, ctc_loss=0.1208, cr_loss=0.3947, over 6700030.72 frames. ], batch size: 85, lr: 3.63e-03, grad_scale: 32.0 2024-09-19 04:37:44,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=583076.6666666666, ans=0.0 2024-09-19 04:37:47,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.44 vs. 
limit=15.0 2024-09-19 04:38:08,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=583123.3333333334, ans=0.025 2024-09-19 04:38:18,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=583170.0, ans=0.5 2024-09-19 04:38:41,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583216.6666666666, ans=0.1 2024-09-19 04:38:52,678 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:38:54,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=583263.3333333334, ans=0.125 2024-09-19 04:39:03,653 INFO [train.py:1198] (1/2) Epoch 33, batch 950, loss[loss=0.1837, simple_loss=0.2425, pruned_loss=0.04562, ctc_loss=0.1014, cr_loss=0.3344, over 34711.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2647, pruned_loss=0.05699, ctc_loss=0.121, cr_loss=0.3946, over 6703398.25 frames. ], batch size: 87, lr: 3.63e-03, grad_scale: 16.0 2024-09-19 04:39:15,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583310.0, ans=0.1 2024-09-19 04:39:20,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=583356.6666666666, ans=0.125 2024-09-19 04:39:25,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583356.6666666666, ans=0.1 2024-09-19 04:39:55,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=583450.0, ans=0.0 2024-09-19 04:39:58,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583450.0, ans=0.1 2024-09-19 04:40:13,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.622e+02 3.229e+02 4.093e+02 9.316e+02, threshold=6.458e+02, percent-clipped=4.0 2024-09-19 04:40:15,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=583496.6666666666, ans=0.125 2024-09-19 04:40:20,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583496.6666666666, ans=0.1 2024-09-19 04:40:20,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=583496.6666666666, ans=0.125 2024-09-19 04:40:23,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583496.6666666666, ans=0.1 2024-09-19 04:40:26,365 INFO [train.py:1198] (1/2) Epoch 33, batch 1000, loss[loss=0.2112, simple_loss=0.2649, pruned_loss=0.05749, ctc_loss=0.1269, cr_loss=0.431, over 34506.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2653, pruned_loss=0.05725, ctc_loss=0.1218, cr_loss=0.3958, over 6696601.79 frames. 
], batch size: 90, lr: 3.63e-03, grad_scale: 16.0 2024-09-19 04:40:33,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=583543.3333333334, ans=0.125 2024-09-19 04:40:34,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2024-09-19 04:40:41,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=583543.3333333334, ans=0.125 2024-09-19 04:40:53,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=583590.0, ans=0.025 2024-09-19 04:41:05,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=583636.6666666666, ans=0.0 2024-09-19 04:41:07,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=583636.6666666666, ans=0.0 2024-09-19 04:41:15,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=583636.6666666666, ans=0.0 2024-09-19 04:41:45,254 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:41:48,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.71 vs. limit=12.0 2024-09-19 04:41:52,981 INFO [train.py:1198] (1/2) Epoch 33, batch 1050, loss[loss=0.2134, simple_loss=0.2726, pruned_loss=0.0567, ctc_loss=0.1237, cr_loss=0.4025, over 34563.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2647, pruned_loss=0.0572, ctc_loss=0.1215, cr_loss=0.3955, over 6706190.70 frames. ], batch size: 99, lr: 3.63e-03, grad_scale: 16.0 2024-09-19 04:41:54,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=583776.6666666666, ans=0.0 2024-09-19 04:42:07,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2024-09-19 04:42:10,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=583823.3333333334, ans=0.05 2024-09-19 04:42:18,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=583823.3333333334, ans=0.125 2024-09-19 04:42:29,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=583870.0, ans=0.125 2024-09-19 04:43:01,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=583963.3333333334, ans=0.5 2024-09-19 04:43:02,545 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.492e+02 2.801e+02 3.669e+02 5.626e+02, threshold=5.603e+02, percent-clipped=0.0 2024-09-19 04:43:15,961 INFO [train.py:1198] (1/2) Epoch 33, batch 1100, loss[loss=0.1924, simple_loss=0.2491, pruned_loss=0.0498, ctc_loss=0.1084, cr_loss=0.3619, over 34333.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2645, pruned_loss=0.057, ctc_loss=0.1213, cr_loss=0.3953, over 6717840.67 frames. 
], batch size: 91, lr: 3.63e-03, grad_scale: 16.0 2024-09-19 04:43:22,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=584010.0, ans=0.0 2024-09-19 04:43:22,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=584010.0, ans=0.1 2024-09-19 04:44:06,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=584150.0, ans=0.125 2024-09-19 04:44:22,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=584196.6666666666, ans=0.0 2024-09-19 04:44:41,121 INFO [train.py:1198] (1/2) Epoch 33, batch 1150, loss[loss=0.1977, simple_loss=0.2542, pruned_loss=0.05161, ctc_loss=0.114, cr_loss=0.3778, over 34369.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.2646, pruned_loss=0.05704, ctc_loss=0.1214, cr_loss=0.3952, over 6716725.92 frames. ], batch size: 91, lr: 3.63e-03, grad_scale: 16.0 2024-09-19 04:44:53,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=584243.3333333334, ans=0.125 2024-09-19 04:44:53,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=584243.3333333334, ans=0.2 2024-09-19 04:45:31,885 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:45:33,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=584383.3333333334, ans=0.0 2024-09-19 04:45:35,583 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.43 vs. limit=6.0 2024-09-19 04:45:40,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=584383.3333333334, ans=0.1 2024-09-19 04:45:45,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=584383.3333333334, ans=0.0 2024-09-19 04:45:52,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.534e+02 2.885e+02 3.561e+02 6.940e+02, threshold=5.771e+02, percent-clipped=1.0 2024-09-19 04:45:58,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=584430.0, ans=0.2 2024-09-19 04:45:59,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=584430.0, ans=0.025 2024-09-19 04:46:05,971 INFO [train.py:1198] (1/2) Epoch 33, batch 1200, loss[loss=0.2164, simple_loss=0.2704, pruned_loss=0.0603, ctc_loss=0.1276, cr_loss=0.4038, over 34575.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.2654, pruned_loss=0.05731, ctc_loss=0.122, cr_loss=0.3967, over 6708423.80 frames. ], batch size: 99, lr: 3.63e-03, grad_scale: 32.0 2024-09-19 04:46:08,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.91 vs. 
limit=15.0 2024-09-19 04:46:22,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=584523.3333333334, ans=0.025 2024-09-19 04:46:29,622 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:47:00,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=584616.6666666666, ans=0.125 2024-09-19 04:47:25,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=584663.3333333334, ans=0.025 2024-09-19 04:47:28,587 INFO [train.py:1198] (1/2) Epoch 33, batch 1250, loss[loss=0.22, simple_loss=0.2744, pruned_loss=0.06181, ctc_loss=0.1266, cr_loss=0.4176, over 34328.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2659, pruned_loss=0.05747, ctc_loss=0.1222, cr_loss=0.3974, over 6742528.57 frames. ], batch size: 107, lr: 3.63e-03, grad_scale: 32.0 2024-09-19 04:47:31,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.83 vs. limit=15.0 2024-09-19 04:47:37,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=584710.0, ans=0.0 2024-09-19 04:47:45,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=584756.6666666666, ans=0.025 2024-09-19 04:47:57,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.05 vs. limit=15.0 2024-09-19 04:48:20,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=584850.0, ans=0.125 2024-09-19 04:48:39,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2024-09-19 04:48:40,577 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.184e+02 2.419e+02 2.798e+02 3.545e+02 5.436e+02, threshold=5.595e+02, percent-clipped=0.0 2024-09-19 04:48:50,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=584896.6666666666, ans=0.1 2024-09-19 04:48:53,685 INFO [train.py:1198] (1/2) Epoch 33, batch 1300, loss[loss=0.2182, simple_loss=0.2787, pruned_loss=0.0587, ctc_loss=0.1223, cr_loss=0.3978, over 33121.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.2653, pruned_loss=0.05742, ctc_loss=0.122, cr_loss=0.3969, over 6744424.49 frames. ], batch size: 130, lr: 3.63e-03, grad_scale: 32.0 2024-09-19 04:49:23,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=584990.0, ans=0.125 2024-09-19 04:49:35,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=585036.6666666666, ans=0.1 2024-09-19 04:50:14,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.47 vs. 
limit=15.0 2024-09-19 04:50:18,693 INFO [train.py:1198] (1/2) Epoch 33, batch 1350, loss[loss=0.23, simple_loss=0.2811, pruned_loss=0.06691, ctc_loss=0.1395, cr_loss=0.4321, over 34541.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2652, pruned_loss=0.05728, ctc_loss=0.1219, cr_loss=0.3973, over 6763187.62 frames. ], batch size: 94, lr: 3.63e-03, grad_scale: 32.0 2024-09-19 04:50:25,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=585176.6666666666, ans=0.125 2024-09-19 04:50:35,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=585223.3333333334, ans=0.0 2024-09-19 04:50:42,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=585223.3333333334, ans=0.2 2024-09-19 04:51:12,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.91 vs. limit=10.0 2024-09-19 04:51:25,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.08 vs. limit=15.0 2024-09-19 04:51:28,251 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.084e+02 2.586e+02 2.985e+02 3.948e+02 7.307e+02, threshold=5.970e+02, percent-clipped=2.0 2024-09-19 04:51:28,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=585363.3333333334, ans=0.0 2024-09-19 04:51:41,533 INFO [train.py:1198] (1/2) Epoch 33, batch 1400, loss[loss=0.1845, simple_loss=0.2367, pruned_loss=0.04762, ctc_loss=0.108, cr_loss=0.3878, over 34276.00 frames. ], tot_loss[loss=0.2097, simple_loss=0.2649, pruned_loss=0.05715, ctc_loss=0.1216, cr_loss=0.3968, over 6775269.63 frames. ], batch size: 80, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 04:51:51,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=585410.0, ans=0.0 2024-09-19 04:51:53,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=585410.0, ans=0.0 2024-09-19 04:51:55,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=585410.0, ans=0.025 2024-09-19 04:52:06,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0 2024-09-19 04:52:12,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.09 vs. limit=12.0 2024-09-19 04:52:37,649 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.81 vs. limit=15.0 2024-09-19 04:52:44,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=585550.0, ans=0.0 2024-09-19 04:53:00,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=585596.6666666666, ans=0.125 2024-09-19 04:53:08,039 INFO [train.py:1198] (1/2) Epoch 33, batch 1450, loss[loss=0.2176, simple_loss=0.2756, pruned_loss=0.05879, ctc_loss=0.1242, cr_loss=0.4286, over 34435.00 frames. 
], tot_loss[loss=0.2098, simple_loss=0.2652, pruned_loss=0.05712, ctc_loss=0.1215, cr_loss=0.3966, over 6773263.79 frames. ], batch size: 110, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 04:53:13,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=585643.3333333334, ans=0.0 2024-09-19 04:53:44,250 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 04:53:44,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=585736.6666666666, ans=0.125 2024-09-19 04:54:15,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=585830.0, ans=0.0 2024-09-19 04:54:16,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.141e+02 2.478e+02 2.799e+02 3.225e+02 4.164e+02, threshold=5.598e+02, percent-clipped=0.0 2024-09-19 04:54:29,754 INFO [train.py:1198] (1/2) Epoch 33, batch 1500, loss[loss=0.2155, simple_loss=0.2757, pruned_loss=0.05766, ctc_loss=0.1195, cr_loss=0.404, over 34471.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2654, pruned_loss=0.05713, ctc_loss=0.1216, cr_loss=0.3972, over 6773402.67 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 04:54:42,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.78 vs. limit=6.0 2024-09-19 04:54:50,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=585923.3333333334, ans=0.125 2024-09-19 04:54:56,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=585923.3333333334, ans=0.2 2024-09-19 04:55:19,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=586016.6666666666, ans=0.0 2024-09-19 04:55:45,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=586063.3333333334, ans=0.125 2024-09-19 04:55:54,236 INFO [train.py:1198] (1/2) Epoch 33, batch 1550, loss[loss=0.2154, simple_loss=0.271, pruned_loss=0.05973, ctc_loss=0.1223, cr_loss=0.3986, over 34419.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2652, pruned_loss=0.05724, ctc_loss=0.1218, cr_loss=0.3973, over 6746270.41 frames. ], batch size: 105, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 04:55:57,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=586110.0, ans=0.1 2024-09-19 04:56:01,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=586110.0, ans=0.2 2024-09-19 04:56:43,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=586203.3333333334, ans=0.025 2024-09-19 04:57:06,024 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.058e+02 2.442e+02 2.918e+02 3.683e+02 6.609e+02, threshold=5.837e+02, percent-clipped=6.0 2024-09-19 04:57:19,330 INFO [train.py:1198] (1/2) Epoch 33, batch 1600, loss[loss=0.2128, simple_loss=0.2695, pruned_loss=0.05721, ctc_loss=0.125, cr_loss=0.4171, over 34568.00 frames. 
], tot_loss[loss=0.2099, simple_loss=0.2651, pruned_loss=0.05726, ctc_loss=0.1218, cr_loss=0.3971, over 6726066.69 frames. ], batch size: 99, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 04:57:22,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=586343.3333333334, ans=0.125 2024-09-19 04:57:32,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=586343.3333333334, ans=0.04949747468305833 2024-09-19 04:57:41,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=586390.0, ans=0.125 2024-09-19 04:58:02,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=586436.6666666666, ans=0.125 2024-09-19 04:58:15,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=12.0 2024-09-19 04:58:32,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=586530.0, ans=0.125 2024-09-19 04:58:34,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=586530.0, ans=0.125 2024-09-19 04:58:42,159 INFO [train.py:1198] (1/2) Epoch 33, batch 1650, loss[loss=0.2228, simple_loss=0.283, pruned_loss=0.06, ctc_loss=0.1269, cr_loss=0.4305, over 34393.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2649, pruned_loss=0.05709, ctc_loss=0.1215, cr_loss=0.3958, over 6719275.87 frames. ], batch size: 103, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 04:59:27,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=586670.0, ans=0.09899494936611666 2024-09-19 04:59:49,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=586763.3333333334, ans=0.125 2024-09-19 04:59:53,582 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.153e+02 2.511e+02 2.856e+02 3.470e+02 6.657e+02, threshold=5.712e+02, percent-clipped=2.0 2024-09-19 04:59:54,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=586763.3333333334, ans=0.1 2024-09-19 05:00:03,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=586763.3333333334, ans=0.0 2024-09-19 05:00:06,678 INFO [train.py:1198] (1/2) Epoch 33, batch 1700, loss[loss=0.1781, simple_loss=0.2344, pruned_loss=0.04461, ctc_loss=0.09547, cr_loss=0.3339, over 34320.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2646, pruned_loss=0.05685, ctc_loss=0.1211, cr_loss=0.3954, over 6745708.89 frames. 
], batch size: 80, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 05:00:16,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=586810.0, ans=0.125 2024-09-19 05:00:33,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=586856.6666666666, ans=0.125 2024-09-19 05:00:56,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=586950.0, ans=0.09899494936611666 2024-09-19 05:01:00,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.25 vs. limit=15.0 2024-09-19 05:01:12,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=586950.0, ans=22.5 2024-09-19 05:01:31,000 INFO [train.py:1198] (1/2) Epoch 33, batch 1750, loss[loss=0.1912, simple_loss=0.2435, pruned_loss=0.05147, ctc_loss=0.1077, cr_loss=0.3594, over 34145.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2646, pruned_loss=0.05693, ctc_loss=0.1212, cr_loss=0.3956, over 6755884.41 frames. ], batch size: 78, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 05:01:31,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.06 vs. limit=10.0 2024-09-19 05:02:01,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=587090.0, ans=0.125 2024-09-19 05:02:16,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=22.5 2024-09-19 05:02:22,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=587183.3333333334, ans=0.125 2024-09-19 05:02:40,035 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.515e+02 2.928e+02 3.639e+02 6.523e+02, threshold=5.856e+02, percent-clipped=2.0 2024-09-19 05:02:41,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=587230.0, ans=0.2 2024-09-19 05:02:45,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=587230.0, ans=0.0 2024-09-19 05:02:53,090 INFO [train.py:1198] (1/2) Epoch 33, batch 1800, loss[loss=0.2177, simple_loss=0.2767, pruned_loss=0.05865, ctc_loss=0.1255, cr_loss=0.4047, over 34696.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2648, pruned_loss=0.05707, ctc_loss=0.1215, cr_loss=0.3961, over 6758936.24 frames. 
], batch size: 97, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 05:02:53,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=587276.6666666666, ans=0.0 2024-09-19 05:03:05,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=587276.6666666666, ans=0.0 2024-09-19 05:03:16,723 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 05:03:21,914 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=4.825e-02 2024-09-19 05:03:51,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=587416.6666666666, ans=0.125 2024-09-19 05:04:19,755 INFO [train.py:1198] (1/2) Epoch 33, batch 1850, loss[loss=0.2063, simple_loss=0.2671, pruned_loss=0.05338, ctc_loss=0.1169, cr_loss=0.3829, over 34465.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.2648, pruned_loss=0.05697, ctc_loss=0.1213, cr_loss=0.3963, over 6766219.38 frames. ], batch size: 100, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 05:04:37,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=587556.6666666666, ans=0.1 2024-09-19 05:04:48,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2024-09-19 05:05:10,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=587650.0, ans=0.025 2024-09-19 05:05:28,304 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.149e+02 2.850e+02 3.291e+02 4.333e+02 6.553e+02, threshold=6.582e+02, percent-clipped=3.0 2024-09-19 05:05:31,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=587696.6666666666, ans=0.0 2024-09-19 05:05:34,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=12.0 2024-09-19 05:05:41,250 INFO [train.py:1198] (1/2) Epoch 33, batch 1900, loss[loss=0.2344, simple_loss=0.2912, pruned_loss=0.06621, ctc_loss=0.1391, cr_loss=0.4372, over 34386.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.2658, pruned_loss=0.05743, ctc_loss=0.1222, cr_loss=0.3981, over 6774792.93 frames. ], batch size: 103, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 05:05:54,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=587743.3333333334, ans=0.025 2024-09-19 05:05:56,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. 
limit=6.0 2024-09-19 05:06:15,991 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=587836.6666666666, ans=0.125 2024-09-19 05:06:33,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=587883.3333333334, ans=0.125 2024-09-19 05:06:50,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=587930.0, ans=0.125 2024-09-19 05:06:52,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.72 vs. limit=6.0 2024-09-19 05:07:00,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=587930.0, ans=0.0 2024-09-19 05:07:03,293 INFO [train.py:1198] (1/2) Epoch 33, batch 1950, loss[loss=0.1985, simple_loss=0.2546, pruned_loss=0.05247, ctc_loss=0.1112, cr_loss=0.3809, over 34350.00 frames. ], tot_loss[loss=0.2109, simple_loss=0.2665, pruned_loss=0.05747, ctc_loss=0.1223, cr_loss=0.3992, over 6791280.02 frames. ], batch size: 91, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 05:07:16,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=587976.6666666666, ans=0.2 2024-09-19 05:07:47,646 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.89 vs. limit=15.0 2024-09-19 05:07:52,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=588070.0, ans=0.1 2024-09-19 05:08:03,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=588116.6666666666, ans=0.025 2024-09-19 05:08:11,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=588163.3333333334, ans=0.125 2024-09-19 05:08:15,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=588163.3333333334, ans=0.0 2024-09-19 05:08:16,470 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.105e+02 2.385e+02 2.700e+02 3.288e+02 4.913e+02, threshold=5.400e+02, percent-clipped=0.0 2024-09-19 05:08:18,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=588163.3333333334, ans=0.0 2024-09-19 05:08:29,626 INFO [train.py:1198] (1/2) Epoch 33, batch 2000, loss[loss=0.1786, simple_loss=0.2319, pruned_loss=0.04594, ctc_loss=0.09998, cr_loss=0.336, over 34169.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2669, pruned_loss=0.05771, ctc_loss=0.1227, cr_loss=0.3994, over 6766725.71 frames. ], batch size: 78, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 05:08:38,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=588210.0, ans=0.1 2024-09-19 05:08:39,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.90 vs. 
limit=10.0 2024-09-19 05:09:01,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=588303.3333333334, ans=0.2 2024-09-19 05:09:06,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=588303.3333333334, ans=0.125 2024-09-19 05:09:08,626 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=588303.3333333334, ans=0.125 2024-09-19 05:09:28,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0 2024-09-19 05:09:35,335 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.86 vs. limit=15.0 2024-09-19 05:09:49,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=588396.6666666666, ans=0.0 2024-09-19 05:09:52,330 INFO [train.py:1198] (1/2) Epoch 33, batch 2050, loss[loss=0.1841, simple_loss=0.2398, pruned_loss=0.04737, ctc_loss=0.1013, cr_loss=0.3358, over 34459.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.266, pruned_loss=0.05736, ctc_loss=0.1219, cr_loss=0.3974, over 6757617.20 frames. ], batch size: 82, lr: 3.62e-03, grad_scale: 32.0 2024-09-19 05:09:55,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=588443.3333333334, ans=0.2 2024-09-19 05:10:14,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=588490.0, ans=0.1 2024-09-19 05:10:17,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=588490.0, ans=0.09899494936611666 2024-09-19 05:10:27,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=588536.6666666666, ans=0.125 2024-09-19 05:10:35,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=588536.6666666666, ans=0.0 2024-09-19 05:10:45,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=588583.3333333334, ans=0.0 2024-09-19 05:10:50,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=588583.3333333334, ans=0.125 2024-09-19 05:11:00,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=588630.0, ans=0.125 2024-09-19 05:11:03,694 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.052e+02 2.442e+02 2.736e+02 3.247e+02 6.101e+02, threshold=5.472e+02, percent-clipped=3.0 2024-09-19 05:11:16,838 INFO [train.py:1198] (1/2) Epoch 33, batch 2100, loss[loss=0.2084, simple_loss=0.2656, pruned_loss=0.05552, ctc_loss=0.1196, cr_loss=0.4087, over 34527.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2654, pruned_loss=0.05711, ctc_loss=0.1214, cr_loss=0.3967, over 6771381.82 frames. ], batch size: 94, lr: 3.61e-03, grad_scale: 32.0 2024-09-19 05:11:35,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.39 vs. 
limit=15.0
2024-09-19 05:11:41,785 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:12:04,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=15.45 vs. limit=15.0
2024-09-19 05:12:30,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=588863.3333333334, ans=0.125
2024-09-19 05:12:38,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=588910.0, ans=0.0
2024-09-19 05:12:40,301 INFO [train.py:1198] (1/2) Epoch 33, batch 2150, loss[loss=0.1989, simple_loss=0.255, pruned_loss=0.05244, ctc_loss=0.1134, cr_loss=0.3825, over 34345.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2648, pruned_loss=0.05685, ctc_loss=0.1209, cr_loss=0.3956, over 6789256.90 frames. ], batch size: 91, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:12:48,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.94 vs. limit=15.0
2024-09-19 05:13:07,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=588956.6666666666, ans=0.2
2024-09-19 05:13:20,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=589003.3333333334, ans=0.0
2024-09-19 05:13:23,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=589003.3333333334, ans=0.09899494936611666
2024-09-19 05:13:25,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=589003.3333333334, ans=0.1
2024-09-19 05:13:35,898 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:13:50,376 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.438e+02 2.875e+02 3.967e+02 7.571e+02, threshold=5.749e+02, percent-clipped=9.0
2024-09-19 05:13:54,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=589096.6666666666, ans=0.0
2024-09-19 05:14:02,360 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:14:03,499 INFO [train.py:1198] (1/2) Epoch 33, batch 2200, loss[loss=0.2214, simple_loss=0.2784, pruned_loss=0.06175, ctc_loss=0.1251, cr_loss=0.3987, over 34431.00 frames. ], tot_loss[loss=0.2095, simple_loss=0.2647, pruned_loss=0.05704, ctc_loss=0.1213, cr_loss=0.3963, over 6784784.77 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:14:23,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=589190.0, ans=0.0
2024-09-19 05:14:28,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589190.0, ans=0.1
2024-09-19 05:14:30,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=589190.0, ans=0.125
2024-09-19 05:14:54,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0
2024-09-19 05:15:03,879 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=12.0
2024-09-19 05:15:11,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=589330.0, ans=0.1
2024-09-19 05:15:21,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0
2024-09-19 05:15:21,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=589330.0, ans=0.125
2024-09-19 05:15:28,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=589376.6666666666, ans=0.125
2024-09-19 05:15:29,716 INFO [train.py:1198] (1/2) Epoch 33, batch 2250, loss[loss=0.2052, simple_loss=0.2639, pruned_loss=0.05391, ctc_loss=0.1162, cr_loss=0.3833, over 34432.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2647, pruned_loss=0.05686, ctc_loss=0.1209, cr_loss=0.3952, over 6781823.57 frames. ], batch size: 95, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:15:30,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=589376.6666666666, ans=0.2
2024-09-19 05:15:43,860 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.69 vs. limit=12.0
2024-09-19 05:15:54,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589423.3333333334, ans=0.1
2024-09-19 05:15:56,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=589423.3333333334, ans=0.0
2024-09-19 05:16:01,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=589470.0, ans=0.025
2024-09-19 05:16:12,006 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.65 vs. limit=15.0
2024-09-19 05:16:17,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=589516.6666666666, ans=0.125
2024-09-19 05:16:20,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=22.5
2024-09-19 05:16:20,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=589516.6666666666, ans=0.125
2024-09-19 05:16:38,761 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.987e+02 2.578e+02 3.006e+02 3.671e+02 6.569e+02, threshold=6.012e+02, percent-clipped=1.0
2024-09-19 05:16:39,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=589563.3333333334, ans=0.1
2024-09-19 05:16:48,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=589563.3333333334, ans=0.07
2024-09-19 05:16:49,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=589563.3333333334, ans=0.125
2024-09-19 05:16:51,878 INFO [train.py:1198] (1/2) Epoch 33, batch 2300, loss[loss=0.1852, simple_loss=0.2374, pruned_loss=0.04893, ctc_loss=0.1054, cr_loss=0.3544, over 34261.00 frames. ], tot_loss[loss=0.2085, simple_loss=0.2638, pruned_loss=0.05665, ctc_loss=0.1205, cr_loss=0.3935, over 6765901.56 frames. ], batch size: 83, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:16:55,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=589610.0, ans=0.125
2024-09-19 05:17:12,154 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. limit=15.0
2024-09-19 05:17:18,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=589656.6666666666, ans=0.2
2024-09-19 05:17:24,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=589703.3333333334, ans=0.125
2024-09-19 05:17:39,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=589750.0, ans=0.125
2024-09-19 05:17:53,057 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:18:12,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=589843.3333333334, ans=0.0
2024-09-19 05:18:12,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=589843.3333333334, ans=0.0
2024-09-19 05:18:13,871 INFO [train.py:1198] (1/2) Epoch 33, batch 2350, loss[loss=0.2186, simple_loss=0.2734, pruned_loss=0.06088, ctc_loss=0.1272, cr_loss=0.4129, over 34690.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2641, pruned_loss=0.05691, ctc_loss=0.1209, cr_loss=0.3943, over 6771270.93 frames. ], batch size: 97, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:18:22,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=589843.3333333334, ans=0.125
2024-09-19 05:18:29,797 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.50 vs. limit=15.0
2024-09-19 05:19:12,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=589983.3333333334, ans=0.125
2024-09-19 05:19:27,367 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.470e+02 2.817e+02 3.333e+02 6.066e+02, threshold=5.634e+02, percent-clipped=1.0
2024-09-19 05:19:37,640 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=590030.0, ans=0.0
2024-09-19 05:19:40,672 INFO [train.py:1198] (1/2) Epoch 33, batch 2400, loss[loss=0.203, simple_loss=0.254, pruned_loss=0.05664, ctc_loss=0.1173, cr_loss=0.3808, over 34590.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.2647, pruned_loss=0.05701, ctc_loss=0.1212, cr_loss=0.3954, over 6775995.70 frames. ], batch size: 89, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:20:01,170 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:20:08,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.29 vs. limit=15.0
2024-09-19 05:20:08,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=22.5
2024-09-19 05:20:19,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=15.0
2024-09-19 05:20:29,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=590216.6666666666, ans=10.0
2024-09-19 05:20:32,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=590216.6666666666, ans=0.125
2024-09-19 05:20:32,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=590216.6666666666, ans=0.125
2024-09-19 05:21:03,805 INFO [train.py:1198] (1/2) Epoch 33, batch 2450, loss[loss=0.2066, simple_loss=0.2616, pruned_loss=0.05617, ctc_loss=0.1173, cr_loss=0.3932, over 34411.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.2657, pruned_loss=0.05749, ctc_loss=0.1221, cr_loss=0.3972, over 6749248.00 frames. ], batch size: 95, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:21:08,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.45 vs. limit=15.0
2024-09-19 05:21:18,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=590356.6666666666, ans=0.125
2024-09-19 05:21:28,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590356.6666666666, ans=0.1
2024-09-19 05:22:08,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=590496.6666666666, ans=0.125
2024-09-19 05:22:10,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=22.5
2024-09-19 05:22:13,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=590496.6666666666, ans=0.025
2024-09-19 05:22:14,226 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.209e+02 2.496e+02 2.907e+02 3.714e+02 5.859e+02, threshold=5.813e+02, percent-clipped=1.0
2024-09-19 05:22:19,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=590496.6666666666, ans=0.125
2024-09-19 05:22:23,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0
2024-09-19 05:22:25,832 INFO [train.py:1198] (1/2) Epoch 33, batch 2500, loss[loss=0.2209, simple_loss=0.2781, pruned_loss=0.06059, ctc_loss=0.1305, cr_loss=0.4117, over 34465.00 frames. ], tot_loss[loss=0.2103, simple_loss=0.2655, pruned_loss=0.05742, ctc_loss=0.122, cr_loss=0.3973, over 6762232.64 frames. ], batch size: 100, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:22:58,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.55 vs. limit=6.0
2024-09-19 05:23:07,136 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.01 vs. limit=22.5
2024-09-19 05:23:08,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=590636.6666666666, ans=0.125
2024-09-19 05:23:11,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=590636.6666666666, ans=0.125
2024-09-19 05:23:21,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=590683.3333333334, ans=0.0
2024-09-19 05:23:46,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=590730.0, ans=0.125
2024-09-19 05:23:52,332 INFO [train.py:1198] (1/2) Epoch 33, batch 2550, loss[loss=0.176, simple_loss=0.2344, pruned_loss=0.04252, ctc_loss=0.09639, cr_loss=0.3336, over 34133.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.2653, pruned_loss=0.05738, ctc_loss=0.1219, cr_loss=0.3972, over 6765710.75 frames. ], batch size: 78, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:24:43,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=590916.6666666666, ans=0.025
2024-09-19 05:25:00,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=590963.3333333334, ans=0.125
2024-09-19 05:25:03,537 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.957e+02 2.492e+02 2.861e+02 3.738e+02 6.501e+02, threshold=5.723e+02, percent-clipped=2.0
2024-09-19 05:25:14,964 INFO [train.py:1198] (1/2) Epoch 33, batch 2600, loss[loss=0.2139, simple_loss=0.2651, pruned_loss=0.06052, ctc_loss=0.1273, cr_loss=0.406, over 34357.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2657, pruned_loss=0.05753, ctc_loss=0.1222, cr_loss=0.3977, over 6761489.06 frames. ], batch size: 91, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:25:24,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=22.5
2024-09-19 05:26:40,661 INFO [train.py:1198] (1/2) Epoch 33, batch 2650, loss[loss=0.224, simple_loss=0.2813, pruned_loss=0.06207, ctc_loss=0.1291, cr_loss=0.42, over 34233.00 frames. ], tot_loss[loss=0.2109, simple_loss=0.266, pruned_loss=0.05765, ctc_loss=0.1224, cr_loss=0.3981, over 6768667.45 frames. ], batch size: 117, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:26:47,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=591243.3333333334, ans=0.125
2024-09-19 05:26:47,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=591243.3333333334, ans=0.125
2024-09-19 05:26:58,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.26 vs. limit=10.0
2024-09-19 05:27:09,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=22.5
2024-09-19 05:27:18,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=591336.6666666666, ans=0.125
2024-09-19 05:27:25,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=591336.6666666666, ans=0.2
2024-09-19 05:27:36,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=591383.3333333334, ans=0.0
2024-09-19 05:27:38,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=591383.3333333334, ans=0.0
2024-09-19 05:27:51,186 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.140e+02 2.515e+02 3.040e+02 3.809e+02 6.229e+02, threshold=6.081e+02, percent-clipped=2.0
2024-09-19 05:27:51,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=591430.0, ans=0.125
2024-09-19 05:28:00,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.87 vs. limit=15.0
2024-09-19 05:28:02,515 INFO [train.py:1198] (1/2) Epoch 33, batch 2700, loss[loss=0.2066, simple_loss=0.2696, pruned_loss=0.05289, ctc_loss=0.1145, cr_loss=0.3715, over 34626.00 frames. ], tot_loss[loss=0.2114, simple_loss=0.2665, pruned_loss=0.05788, ctc_loss=0.1228, cr_loss=0.399, over 6763541.75 frames. ], batch size: 102, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:28:28,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0
2024-09-19 05:28:39,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=591570.0, ans=0.125
2024-09-19 05:28:45,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=591570.0, ans=0.0
2024-09-19 05:29:15,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=591663.3333333334, ans=0.025
2024-09-19 05:29:25,247 INFO [train.py:1198] (1/2) Epoch 33, batch 2750, loss[loss=0.2031, simple_loss=0.2528, pruned_loss=0.05689, ctc_loss=0.1193, cr_loss=0.393, over 34607.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.2653, pruned_loss=0.0574, ctc_loss=0.122, cr_loss=0.397, over 6761032.19 frames. ], batch size: 88, lr: 3.61e-03, grad_scale: 32.0
2024-09-19 05:29:25,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=591710.0, ans=0.125
2024-09-19 05:29:39,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=591710.0, ans=0.125
2024-09-19 05:29:44,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591756.6666666666, ans=0.1
2024-09-19 05:29:46,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0
2024-09-19 05:30:09,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=591803.3333333334, ans=0.09899494936611666
2024-09-19 05:30:32,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=591896.6666666666, ans=0.125
2024-09-19 05:30:38,846 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.522e+02 3.056e+02 3.886e+02 6.360e+02, threshold=6.112e+02, percent-clipped=2.0
2024-09-19 05:30:50,504 INFO [train.py:1198] (1/2) Epoch 33, batch 2800, loss[loss=0.234, simple_loss=0.2788, pruned_loss=0.07127, ctc_loss=0.1499, cr_loss=0.416, over 23616.00 frames. ], tot_loss[loss=0.2105, simple_loss=0.2655, pruned_loss=0.05758, ctc_loss=0.1223, cr_loss=0.3972, over 6737718.48 frames. ], batch size: 245, lr: 3.60e-03, grad_scale: 32.0
2024-09-19 05:30:55,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=591943.3333333334, ans=0.025
2024-09-19 05:31:17,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.76 vs. limit=15.0
2024-09-19 05:31:18,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=591990.0, ans=0.0
2024-09-19 05:31:48,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=592083.3333333334, ans=0.125
2024-09-19 05:31:48,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=592083.3333333334, ans=0.1
2024-09-19 05:31:49,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.83 vs. limit=15.0
2024-09-19 05:32:06,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=592130.0, ans=0.125
2024-09-19 05:32:08,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=592130.0, ans=0.0
2024-09-19 05:32:12,640 INFO [train.py:1198] (1/2) Epoch 33, batch 2850, loss[loss=0.1952, simple_loss=0.2481, pruned_loss=0.05269, ctc_loss=0.1099, cr_loss=0.3744, over 34465.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2656, pruned_loss=0.05749, ctc_loss=0.1221, cr_loss=0.3969, over 6721707.26 frames. ], batch size: 90, lr: 3.60e-03, grad_scale: 16.0
2024-09-19 05:32:14,846 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:32:31,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=592223.3333333334, ans=0.125
2024-09-19 05:32:36,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=592223.3333333334, ans=0.0
2024-09-19 05:32:49,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=592270.0, ans=0.025
2024-09-19 05:32:59,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=592270.0, ans=0.0
2024-09-19 05:33:14,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.30 vs. limit=15.0
2024-09-19 05:33:19,128 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:33:25,429 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.431e+02 2.934e+02 4.013e+02 7.259e+02, threshold=5.868e+02, percent-clipped=1.0
2024-09-19 05:33:28,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=592363.3333333334, ans=0.125
2024-09-19 05:33:35,169 INFO [train.py:1198] (1/2) Epoch 33, batch 2900, loss[loss=0.2084, simple_loss=0.2627, pruned_loss=0.05723, ctc_loss=0.1201, cr_loss=0.3886, over 34535.00 frames. ], tot_loss[loss=0.2117, simple_loss=0.267, pruned_loss=0.0579, ctc_loss=0.1228, cr_loss=0.3988, over 6752933.90 frames. ], batch size: 94, lr: 3.60e-03, grad_scale: 16.0
2024-09-19 05:33:43,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=592410.0, ans=0.125
2024-09-19 05:33:55,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=592456.6666666666, ans=0.2
2024-09-19 05:34:40,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=592550.0, ans=0.0
2024-09-19 05:34:54,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.19 vs. limit=15.0
2024-09-19 05:35:01,965 INFO [train.py:1198] (1/2) Epoch 33, batch 2950, loss[loss=0.1982, simple_loss=0.2485, pruned_loss=0.0547, ctc_loss=0.1169, cr_loss=0.3788, over 34622.00 frames. ], tot_loss[loss=0.2103, simple_loss=0.2656, pruned_loss=0.05742, ctc_loss=0.1218, cr_loss=0.3966, over 6748892.11 frames. ], batch size: 88, lr: 3.60e-03, grad_scale: 16.0
2024-09-19 05:35:10,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=592643.3333333334, ans=0.125
2024-09-19 05:35:19,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=592690.0, ans=0.0
2024-09-19 05:35:22,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=592690.0, ans=0.2
2024-09-19 05:35:49,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=592736.6666666666, ans=0.0
2024-09-19 05:35:55,810 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:36:02,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=592783.3333333334, ans=0.2
2024-09-19 05:36:15,306 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.142e+02 2.607e+02 3.017e+02 4.340e+02 6.212e+02, threshold=6.034e+02, percent-clipped=2.0
2024-09-19 05:36:25,355 INFO [train.py:1198] (1/2) Epoch 33, batch 3000, loss[loss=0.2004, simple_loss=0.2597, pruned_loss=0.05143, ctc_loss=0.1139, cr_loss=0.3845, over 34547.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2654, pruned_loss=0.05722, ctc_loss=0.1215, cr_loss=0.3958, over 6748998.41 frames. ], batch size: 94, lr: 3.60e-03, grad_scale: 16.0
2024-09-19 05:36:25,356 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 05:36:42,255 INFO [train.py:1230] (1/2) Epoch 33, validation: loss=0.1489, simple_loss=0.2432, pruned_loss=0.02331, ctc_loss=0.03965, cr_loss=2.07e-14, over 944034.00 frames.
2024-09-19 05:36:42,255 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-19 05:36:56,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=592876.6666666666, ans=0.0
2024-09-19 05:37:08,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.35 vs. limit=15.0
2024-09-19 05:37:14,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=592970.0, ans=0.05
2024-09-19 05:37:28,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=592970.0, ans=0.0
2024-09-19 05:38:07,955 INFO [train.py:1198] (1/2) Epoch 33, batch 3050, loss[loss=0.2019, simple_loss=0.2531, pruned_loss=0.05555, ctc_loss=0.1192, cr_loss=0.396, over 34589.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2659, pruned_loss=0.0575, ctc_loss=0.1219, cr_loss=0.3966, over 6743520.77 frames. ], batch size: 89, lr: 3.60e-03, grad_scale: 16.0
2024-09-19 05:38:21,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=593110.0, ans=0.2
2024-09-19 05:38:40,680 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:39:02,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=593250.0, ans=0.1
2024-09-19 05:39:15,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=593296.6666666666, ans=0.125
2024-09-19 05:39:18,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.016e+02 2.445e+02 2.865e+02 3.554e+02 6.242e+02, threshold=5.730e+02, percent-clipped=1.0
2024-09-19 05:39:28,615 INFO [train.py:1198] (1/2) Epoch 33, batch 3100, loss[loss=0.2181, simple_loss=0.2776, pruned_loss=0.05887, ctc_loss=0.1259, cr_loss=0.3923, over 34225.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2658, pruned_loss=0.05755, ctc_loss=0.1222, cr_loss=0.3974, over 6741811.65 frames. ], batch size: 117, lr: 3.60e-03, grad_scale: 16.0
2024-09-19 05:39:43,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.94 vs. limit=22.5
2024-09-19 05:39:57,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593390.0, ans=0.1
2024-09-19 05:40:27,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=593483.3333333334, ans=0.125
2024-09-19 05:40:43,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=593530.0, ans=0.125
2024-09-19 05:40:45,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=593530.0, ans=0.2
2024-09-19 05:40:49,709 INFO [train.py:1198] (1/2) Epoch 33, batch 3150, loss[loss=0.2182, simple_loss=0.2766, pruned_loss=0.0591, ctc_loss=0.1271, cr_loss=0.4026, over 33813.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.2657, pruned_loss=0.05743, ctc_loss=0.122, cr_loss=0.397, over 6749101.08 frames. ], batch size: 122, lr: 3.60e-03, grad_scale: 16.0
2024-09-19 05:41:00,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=593576.6666666666, ans=10.0
2024-09-19 05:41:00,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=593576.6666666666, ans=0.125
2024-09-19 05:41:10,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=593623.3333333334, ans=0.025
2024-09-19 05:41:14,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=593623.3333333334, ans=0.125
2024-09-19 05:41:21,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=593670.0, ans=0.125
2024-09-19 05:41:35,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593670.0, ans=0.1
2024-09-19 05:41:47,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.69 vs. limit=15.0
2024-09-19 05:42:01,390 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.516e+02 2.945e+02 3.743e+02 6.919e+02, threshold=5.890e+02, percent-clipped=4.0
2024-09-19 05:42:11,232 INFO [train.py:1198] (1/2) Epoch 33, batch 3200, loss[loss=0.2108, simple_loss=0.2675, pruned_loss=0.05712, ctc_loss=0.1217, cr_loss=0.3902, over 34515.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.2652, pruned_loss=0.05718, ctc_loss=0.1215, cr_loss=0.396, over 6762486.23 frames. ], batch size: 94, lr: 3.60e-03, grad_scale: 32.0
2024-09-19 05:42:19,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=593810.0, ans=0.0
2024-09-19 05:42:41,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=593856.6666666666, ans=0.2
2024-09-19 05:42:49,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=593903.3333333334, ans=0.05
2024-09-19 05:42:50,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=593903.3333333334, ans=0.125
2024-09-19 05:43:02,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=593950.0, ans=0.0
2024-09-19 05:43:05,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0
2024-09-19 05:43:11,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=593950.0, ans=0.125
2024-09-19 05:43:32,514 INFO [train.py:1198] (1/2) Epoch 33, batch 3250, loss[loss=0.2219, simple_loss=0.2765, pruned_loss=0.06235, ctc_loss=0.1296, cr_loss=0.4148, over 34662.00 frames. ], tot_loss[loss=0.2101, simple_loss=0.2657, pruned_loss=0.05721, ctc_loss=0.1217, cr_loss=0.3965, over 6771113.96 frames. ], batch size: 98, lr: 3.60e-03, grad_scale: 32.0
2024-09-19 05:43:42,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=594043.3333333334, ans=0.125
2024-09-19 05:44:02,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.35 vs. limit=15.0
2024-09-19 05:44:07,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.75 vs. limit=15.0
2024-09-19 05:44:14,749 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.07 vs. limit=15.0
2024-09-19 05:44:25,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=594183.3333333334, ans=0.1
2024-09-19 05:44:27,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=594183.3333333334, ans=0.125
2024-09-19 05:44:34,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0
2024-09-19 05:44:41,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=594230.0, ans=0.0
2024-09-19 05:44:46,457 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.439e+02 2.721e+02 3.306e+02 7.117e+02, threshold=5.441e+02, percent-clipped=2.0
2024-09-19 05:44:56,349 INFO [train.py:1198] (1/2) Epoch 33, batch 3300, loss[loss=0.2146, simple_loss=0.2726, pruned_loss=0.05795, ctc_loss=0.1236, cr_loss=0.3982, over 33115.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2643, pruned_loss=0.0566, ctc_loss=0.1205, cr_loss=0.394, over 6769371.71 frames. ], batch size: 130, lr: 3.60e-03, grad_scale: 32.0
2024-09-19 05:45:09,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=594276.6666666666, ans=0.0
2024-09-19 05:45:24,223 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:45:40,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=594370.0, ans=0.0
2024-09-19 05:45:40,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=594370.0, ans=0.1
2024-09-19 05:45:41,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=594370.0, ans=0.125
2024-09-19 05:45:43,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.45 vs. limit=15.0
2024-09-19 05:45:51,590 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:46:11,288 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.47 vs. limit=22.5
2024-09-19 05:46:16,939 INFO [train.py:1198] (1/2) Epoch 33, batch 3350, loss[loss=0.2187, simple_loss=0.278, pruned_loss=0.059, ctc_loss=0.1243, cr_loss=0.4152, over 33707.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2647, pruned_loss=0.05684, ctc_loss=0.121, cr_loss=0.3951, over 6743363.88 frames. ], batch size: 122, lr: 3.60e-03, grad_scale: 32.0
2024-09-19 05:46:22,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=594510.0, ans=0.2
2024-09-19 05:46:28,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=594510.0, ans=0.0
2024-09-19 05:46:29,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.87 vs. limit=15.0
2024-09-19 05:46:32,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=594556.6666666666, ans=0.125
2024-09-19 05:46:41,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=594556.6666666666, ans=0.0
2024-09-19 05:46:51,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=594603.3333333334, ans=0.1
2024-09-19 05:46:59,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=594603.3333333334, ans=0.125
2024-09-19 05:47:04,452 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:47:07,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=594650.0, ans=0.025
2024-09-19 05:47:22,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=594696.6666666666, ans=0.0
2024-09-19 05:47:28,305 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.060e+02 2.487e+02 2.747e+02 3.228e+02 5.309e+02, threshold=5.494e+02, percent-clipped=0.0
2024-09-19 05:47:28,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=594696.6666666666, ans=0.125
2024-09-19 05:47:38,129 INFO [train.py:1198] (1/2) Epoch 33, batch 3400, loss[loss=0.1751, simple_loss=0.2294, pruned_loss=0.04453, ctc_loss=0.09503, cr_loss=0.319, over 34175.00 frames. ], tot_loss[loss=0.2095, simple_loss=0.2648, pruned_loss=0.05709, ctc_loss=0.1214, cr_loss=0.3956, over 6733565.68 frames. ], batch size: 78, lr: 3.60e-03, grad_scale: 32.0
2024-09-19 05:47:54,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=594790.0, ans=0.07
2024-09-19 05:48:04,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0
2024-09-19 05:48:27,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=594883.3333333334, ans=0.125
2024-09-19 05:48:34,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=594883.3333333334, ans=0.09899494936611666
2024-09-19 05:48:57,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=594930.0, ans=0.125
2024-09-19 05:48:58,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=594976.6666666666, ans=0.125
2024-09-19 05:48:59,947 INFO [train.py:1198] (1/2) Epoch 33, batch 3450, loss[loss=0.2099, simple_loss=0.2702, pruned_loss=0.05529, ctc_loss=0.1191, cr_loss=0.382, over 33027.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2653, pruned_loss=0.05718, ctc_loss=0.1216, cr_loss=0.3963, over 6745833.12 frames. ], batch size: 130, lr: 3.60e-03, grad_scale: 32.0
2024-09-19 05:49:04,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=594976.6666666666, ans=0.025
2024-09-19 05:49:32,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=595070.0, ans=0.1
2024-09-19 05:49:48,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=12.0
2024-09-19 05:50:06,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=595163.3333333334, ans=0.125
2024-09-19 05:50:11,508 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.436e+02 2.829e+02 3.408e+02 5.579e+02, threshold=5.659e+02, percent-clipped=1.0
2024-09-19 05:50:16,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=595163.3333333334, ans=0.1
2024-09-19 05:50:21,153 INFO [train.py:1198] (1/2) Epoch 33, batch 3500, loss[loss=0.1874, simple_loss=0.2462, pruned_loss=0.04685, ctc_loss=0.1033, cr_loss=0.3547, over 34484.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.2647, pruned_loss=0.05699, ctc_loss=0.1213, cr_loss=0.3953, over 6747709.87 frames. ], batch size: 85, lr: 3.60e-03, grad_scale: 32.0
2024-09-19 05:50:59,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.36 vs. limit=15.0
2024-09-19 05:51:05,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0
2024-09-19 05:51:08,574 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:51:21,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=595350.0, ans=0.0
2024-09-19 05:51:23,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.60 vs. limit=15.0
2024-09-19 05:51:32,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=595396.6666666666, ans=0.0
2024-09-19 05:51:33,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=595396.6666666666, ans=0.125
2024-09-19 05:51:41,605 INFO [train.py:1198] (1/2) Epoch 33, batch 3550, loss[loss=0.2301, simple_loss=0.2859, pruned_loss=0.06485, ctc_loss=0.1344, cr_loss=0.4444, over 34352.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.2652, pruned_loss=0.05716, ctc_loss=0.1216, cr_loss=0.3964, over 6756660.66 frames. ], batch size: 103, lr: 3.59e-03, grad_scale: 32.0
2024-09-19 05:51:45,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.76 vs. limit=22.5
2024-09-19 05:51:55,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=595443.3333333334, ans=0.1
2024-09-19 05:51:59,826 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:52:04,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=595490.0, ans=0.2
2024-09-19 05:52:25,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=595536.6666666666, ans=0.0
2024-09-19 05:52:40,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595583.3333333334, ans=0.1
2024-09-19 05:52:53,395 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 2.518e+02 3.052e+02 3.985e+02 6.598e+02, threshold=6.103e+02, percent-clipped=3.0
2024-09-19 05:53:03,078 INFO [train.py:1198] (1/2) Epoch 33, batch 3600, loss[loss=0.2024, simple_loss=0.2578, pruned_loss=0.05434, ctc_loss=0.1168, cr_loss=0.374, over 34484.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2653, pruned_loss=0.0572, ctc_loss=0.1217, cr_loss=0.397, over 6766994.14 frames. ], batch size: 90, lr: 3.59e-03, grad_scale: 32.0
2024-09-19 05:53:03,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=595676.6666666666, ans=0.0
2024-09-19 05:53:25,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=595723.3333333334, ans=0.125
2024-09-19 05:53:51,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=595816.6666666666, ans=0.0
2024-09-19 05:53:58,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=595816.6666666666, ans=0.125
2024-09-19 05:54:13,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=595863.3333333334, ans=0.2
2024-09-19 05:54:24,368 INFO [train.py:1198] (1/2) Epoch 33, batch 3650, loss[loss=0.2193, simple_loss=0.2784, pruned_loss=0.05942, ctc_loss=0.1264, cr_loss=0.4035, over 34426.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2646, pruned_loss=0.05689, ctc_loss=0.1211, cr_loss=0.3955, over 6770586.95 frames. ], batch size: 110, lr: 3.59e-03, grad_scale: 32.0
2024-09-19 05:54:37,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=595910.0, ans=0.125
2024-09-19 05:55:00,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=596003.3333333334, ans=0.125
2024-09-19 05:55:34,796 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.040e+02 2.598e+02 3.144e+02 3.916e+02 7.861e+02, threshold=6.287e+02, percent-clipped=3.0
2024-09-19 05:55:35,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=596096.6666666666, ans=0.1
2024-09-19 05:55:44,544 INFO [train.py:1198] (1/2) Epoch 33, batch 3700, loss[loss=0.2079, simple_loss=0.265, pruned_loss=0.05551, ctc_loss=0.1189, cr_loss=0.4015, over 34646.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2651, pruned_loss=0.05698, ctc_loss=0.1213, cr_loss=0.3955, over 6785618.50 frames. ], batch size: 102, lr: 3.59e-03, grad_scale: 32.0
2024-09-19 05:55:49,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=596143.3333333334, ans=0.125
2024-09-19 05:55:52,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=596143.3333333334, ans=0.1
2024-09-19 05:55:58,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=596143.3333333334, ans=0.125
2024-09-19 05:56:16,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=596236.6666666666, ans=0.125
2024-09-19 05:56:46,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=596283.3333333334, ans=0.125
2024-09-19 05:56:49,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=596330.0, ans=0.1
2024-09-19 05:57:07,061 INFO [train.py:1198] (1/2) Epoch 33, batch 3750, loss[loss=0.2207, simple_loss=0.2802, pruned_loss=0.0598, ctc_loss=0.1272, cr_loss=0.4054, over 34342.00 frames. ], tot_loss[loss=0.2123, simple_loss=0.2679, pruned_loss=0.05803, ctc_loss=0.1232, cr_loss=0.4003, over 6787489.07 frames. ], batch size: 113, lr: 3.59e-03, grad_scale: 16.0
2024-09-19 05:57:07,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5
2024-09-19 05:57:12,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=596376.6666666666, ans=0.1
2024-09-19 05:57:18,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=596376.6666666666, ans=0.0
2024-09-19 05:57:33,748 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0
2024-09-19 05:57:49,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=596470.0, ans=0.125
2024-09-19 05:57:51,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=596470.0, ans=0.0
2024-09-19 05:58:12,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=596563.3333333334, ans=0.2
2024-09-19 05:58:20,065 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.939e+02 2.401e+02 2.579e+02 2.756e+02 4.707e+02, threshold=5.159e+02, percent-clipped=0.0
2024-09-19 05:58:24,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.46 vs. limit=22.5
2024-09-19 05:58:28,795 INFO [train.py:1198] (1/2) Epoch 33, batch 3800, loss[loss=0.2444, simple_loss=0.2901, pruned_loss=0.07468, ctc_loss=0.1534, cr_loss=0.4676, over 30031.00 frames. ], tot_loss[loss=0.215, simple_loss=0.27, pruned_loss=0.05931, ctc_loss=0.1256, cr_loss=0.4051, over 6675099.67 frames. ], batch size: 175, lr: 3.59e-03, grad_scale: 16.0
2024-09-19 05:58:34,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=596610.0, ans=0.0
2024-09-19 05:58:46,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.74 vs. limit=15.0
2024-09-19 05:59:16,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=596703.3333333334, ans=0.025
2024-09-19 05:59:30,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.61 vs. limit=12.0
2024-09-19 05:59:33,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=596750.0, ans=0.125
2024-09-19 05:59:38,329 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 05:59:51,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=596843.3333333334, ans=0.0
2024-09-19 05:59:52,803 INFO [train.py:1198] (1/2) Epoch 33, batch 3850, loss[loss=0.2465, simple_loss=0.2897, pruned_loss=0.07705, ctc_loss=0.1578, cr_loss=0.4435, over 23586.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.2723, pruned_loss=0.06116, ctc_loss=0.1295, cr_loss=0.4096, over 6246607.75 frames. ], batch size: 244, lr: 3.59e-03, grad_scale: 16.0
2024-09-19 06:00:01,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=596843.3333333334, ans=10.0
2024-09-19 06:00:04,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=596843.3333333334, ans=0.0
2024-09-19 06:00:08,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=596890.0, ans=0.1
2024-09-19 06:01:24,331 INFO [train.py:1198] (1/2) Epoch 34, batch 0, loss[loss=0.1947, simple_loss=0.2484, pruned_loss=0.05169, ctc_loss=0.1125, cr_loss=0.378, over 34480.00 frames. ], tot_loss[loss=0.1947, simple_loss=0.2484, pruned_loss=0.05169, ctc_loss=0.1125, cr_loss=0.378, over 34480.00 frames. ], batch size: 85, lr: 3.54e-03, grad_scale: 32.0
2024-09-19 06:01:24,332 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 06:01:30,320 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([6.1690, 5.5370, 5.6484, 5.7432], device='cuda:1')
2024-09-19 06:01:41,297 INFO [train.py:1230] (1/2) Epoch 34, validation: loss=0.1485, simple_loss=0.244, pruned_loss=0.02257, ctc_loss=0.03923, cr_loss=2.114e-14, over 944034.00 frames.
2024-09-19 06:01:41,297 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-19 06:02:14,715 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.680e+02 2.911e+02 3.228e+02 5.781e+02, threshold=5.822e+02, percent-clipped=3.0
2024-09-19 06:02:21,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=597058.0, ans=0.125
2024-09-19 06:02:47,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=597151.3333333334, ans=0.125
2024-09-19 06:02:48,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=597151.3333333334, ans=0.125
2024-09-19 06:02:54,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=597151.3333333334, ans=0.0
2024-09-19 06:02:59,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=597151.3333333334, ans=0.125
2024-09-19 06:03:01,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=597151.3333333334, ans=0.125
2024-09-19 06:03:04,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=22.5
2024-09-19 06:03:05,731 INFO [train.py:1198] (1/2) Epoch 34, batch 50, loss[loss=0.1861, simple_loss=0.2403, pruned_loss=0.04831, ctc_loss=0.1055, cr_loss=0.3558, over 34517.00 frames. ], tot_loss[loss=0.211, simple_loss=0.2662, pruned_loss=0.05775, ctc_loss=0.1225, cr_loss=0.3971, over 1479028.48 frames. ], batch size: 82, lr: 3.53e-03, grad_scale: 32.0
2024-09-19 06:03:11,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=597198.0, ans=0.025
2024-09-19 06:03:12,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=597198.0, ans=0.0
2024-09-19 06:03:14,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=597198.0, ans=0.2
2024-09-19 06:03:30,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=597244.6666666666, ans=0.0
2024-09-19 06:03:32,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=597244.6666666666, ans=0.125
2024-09-19 06:03:37,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597244.6666666666, ans=0.1
2024-09-19 06:03:47,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=597291.3333333334, ans=0.125
2024-09-19 06:03:54,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.00 vs. limit=22.5
2024-09-19 06:04:36,117 INFO [train.py:1198] (1/2) Epoch 34, batch 100, loss[loss=0.2011, simple_loss=0.2548, pruned_loss=0.05463, ctc_loss=0.1149, cr_loss=0.3799, over 34589.00 frames. ], tot_loss[loss=0.213, simple_loss=0.2685, pruned_loss=0.05836, ctc_loss=0.1237, cr_loss=0.4016, over 2630298.62 frames. ], batch size: 89, lr: 3.53e-03, grad_scale: 32.0
2024-09-19 06:04:36,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=597431.3333333334, ans=0.0
2024-09-19 06:04:49,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=597431.3333333334, ans=0.0
2024-09-19 06:05:02,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=597478.0, ans=0.125
2024-09-19 06:05:07,152 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.428e+02 2.861e+02 3.435e+02 7.105e+02, threshold=5.722e+02, percent-clipped=3.0
2024-09-19 06:05:09,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=597524.6666666666, ans=0.5
2024-09-19 06:05:10,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=597524.6666666666, ans=0.125
2024-09-19 06:05:20,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=597524.6666666666, ans=0.0
2024-09-19 06:05:32,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=597571.3333333334, ans=0.0
2024-09-19 06:05:37,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=597571.3333333334, ans=0.125
2024-09-19 06:05:42,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=597618.0, ans=0.07
2024-09-19 06:06:00,438 INFO [train.py:1198] (1/2) Epoch 34, batch 150, loss[loss=0.1898, simple_loss=0.2415, pruned_loss=0.05088, ctc_loss=0.1074, cr_loss=0.3738, over 34484.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.2657, pruned_loss=0.05688, ctc_loss=0.1209, cr_loss=0.396, over 3557977.46 frames. ], batch size: 82, lr: 3.53e-03, grad_scale: 32.0
2024-09-19 06:06:07,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=597664.6666666666, ans=0.125
2024-09-19 06:06:12,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=597664.6666666666, ans=0.125
2024-09-19 06:06:25,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=597711.3333333334, ans=0.125
2024-09-19 06:06:48,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=597804.6666666666, ans=0.125
2024-09-19 06:06:55,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=597804.6666666666, ans=0.125
2024-09-19 06:07:02,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.19 vs. limit=15.0
2024-09-19 06:07:24,336 INFO [train.py:1198] (1/2) Epoch 34, batch 200, loss[loss=0.2147, simple_loss=0.2732, pruned_loss=0.05788, ctc_loss=0.1219, cr_loss=0.4006, over 31753.00 frames. ], tot_loss[loss=0.2091, simple_loss=0.2647, pruned_loss=0.05675, ctc_loss=0.1206, cr_loss=0.395, over 4272524.89 frames. ], batch size: 145, lr: 3.53e-03, grad_scale: 32.0
2024-09-19 06:07:25,743 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.27 vs. limit=8.0
2024-09-19 06:07:43,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.65 vs. limit=15.0
2024-09-19 06:07:54,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=597944.6666666666, ans=0.125
2024-09-19 06:07:54,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=597944.6666666666, ans=0.125
2024-09-19 06:07:55,866 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.098e+02 2.534e+02 3.089e+02 4.139e+02 8.649e+02, threshold=6.178e+02, percent-clipped=7.0
2024-09-19 06:08:34,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=598084.6666666666, ans=0.125
2024-09-19 06:08:46,869 INFO [train.py:1198] (1/2) Epoch 34, batch 250, loss[loss=0.2271, simple_loss=0.2848, pruned_loss=0.06272, ctc_loss=0.1316, cr_loss=0.4393, over 34176.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.2651, pruned_loss=0.05677, ctc_loss=0.1208, cr_loss=0.3963, over 4834283.45 frames. ], batch size: 117, lr: 3.53e-03, grad_scale: 32.0
2024-09-19 06:08:48,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=598131.3333333334, ans=0.0
2024-09-19 06:09:21,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=598224.6666666666, ans=0.0
2024-09-19 06:09:36,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=598224.6666666666, ans=0.125
2024-09-19 06:09:37,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598271.3333333334, ans=0.1
2024-09-19 06:09:49,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598271.3333333334, ans=0.1
2024-09-19 06:09:51,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=598271.3333333334, ans=0.0
2024-09-19 06:10:07,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=598318.0, ans=0.125
2024-09-19 06:10:12,232 INFO [train.py:1198] (1/2) Epoch 34, batch 300, loss[loss=0.2296, simple_loss=0.2849, pruned_loss=0.06506, ctc_loss=0.1342, cr_loss=0.4324, over 34360.00 frames. ], tot_loss[loss=0.2084, simple_loss=0.2643, pruned_loss=0.05636, ctc_loss=0.1202, cr_loss=0.3947, over 5262698.04 frames. ], batch size: 107, lr: 3.53e-03, grad_scale: 16.0
2024-09-19 06:10:20,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=598364.6666666666, ans=0.2
2024-09-19 06:10:24,912 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.72 vs. limit=15.0
2024-09-19 06:10:47,082 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.454e+02 2.763e+02 3.728e+02 5.167e+02, threshold=5.527e+02, percent-clipped=1.0
2024-09-19 06:10:55,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=598458.0, ans=0.125
2024-09-19 06:11:07,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=598504.6666666666, ans=0.125
2024-09-19 06:11:30,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=598551.3333333334, ans=0.1
2024-09-19 06:11:36,603 INFO [train.py:1198] (1/2) Epoch 34, batch 350, loss[loss=0.1889, simple_loss=0.246, pruned_loss=0.04779, ctc_loss=0.1073, cr_loss=0.3679, over 34248.00 frames. ], tot_loss[loss=0.2089, simple_loss=0.2648, pruned_loss=0.05652, ctc_loss=0.1204, cr_loss=0.3957, over 5598053.10 frames. ], batch size: 83, lr: 3.53e-03, grad_scale: 16.0
2024-09-19 06:11:37,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. limit=6.0
2024-09-19 06:11:40,374 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 06:12:14,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=598691.3333333334, ans=0.0
2024-09-19 06:12:23,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=598691.3333333334, ans=0.0
2024-09-19 06:12:42,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=598784.6666666666, ans=0.025
2024-09-19 06:12:45,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.13 vs. limit=15.0
2024-09-19 06:12:47,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=598784.6666666666, ans=0.125
2024-09-19 06:12:47,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=598784.6666666666, ans=0.125
2024-09-19 06:12:58,678 INFO [train.py:1198] (1/2) Epoch 34, batch 400, loss[loss=0.2218, simple_loss=0.2797, pruned_loss=0.06063, ctc_loss=0.1285, cr_loss=0.4225, over 34430.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2646, pruned_loss=0.05645, ctc_loss=0.1203, cr_loss=0.3952, over 5865807.06 frames. ], batch size: 95, lr: 3.53e-03, grad_scale: 32.0
2024-09-19 06:13:23,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.73 vs. limit=10.0
2024-09-19 06:13:26,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=598878.0, ans=0.1
2024-09-19 06:13:34,296 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.449e+02 2.809e+02 3.702e+02 9.101e+02, threshold=5.619e+02, percent-clipped=4.0
2024-09-19 06:13:34,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=598924.6666666666, ans=10.0
2024-09-19 06:13:57,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598971.3333333334, ans=0.1
2024-09-19 06:14:25,751 INFO [train.py:1198] (1/2) Epoch 34, batch 450, loss[loss=0.2067, simple_loss=0.2638, pruned_loss=0.05487, ctc_loss=0.1214, cr_loss=0.3909, over 34690.00 frames. ], tot_loss[loss=0.2088, simple_loss=0.2646, pruned_loss=0.05654, ctc_loss=0.1205, cr_loss=0.3956, over 6056395.98 frames.
], batch size: 97, lr: 3.53e-03, grad_scale: 32.0 2024-09-19 06:15:02,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=599158.0, ans=0.125 2024-09-19 06:15:22,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=599204.6666666666, ans=0.1 2024-09-19 06:15:34,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=599251.3333333334, ans=0.2 2024-09-19 06:15:34,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=599251.3333333334, ans=0.125 2024-09-19 06:15:45,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=599251.3333333334, ans=0.1 2024-09-19 06:15:48,879 INFO [train.py:1198] (1/2) Epoch 34, batch 500, loss[loss=0.2281, simple_loss=0.2818, pruned_loss=0.06497, ctc_loss=0.1369, cr_loss=0.4262, over 34508.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2635, pruned_loss=0.05618, ctc_loss=0.1197, cr_loss=0.3937, over 6222896.33 frames. ], batch size: 110, lr: 3.53e-03, grad_scale: 32.0 2024-09-19 06:15:59,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=599298.0, ans=0.0 2024-09-19 06:15:59,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=599298.0, ans=0.2 2024-09-19 06:16:22,132 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.024e+02 2.428e+02 2.793e+02 3.859e+02 5.619e+02, threshold=5.587e+02, percent-clipped=1.0 2024-09-19 06:16:22,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=599391.3333333334, ans=0.125 2024-09-19 06:16:22,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=599391.3333333334, ans=0.0 2024-09-19 06:16:38,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=599438.0, ans=0.1 2024-09-19 06:16:44,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=599438.0, ans=0.1 2024-09-19 06:16:46,613 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=9.96 vs. limit=22.5 2024-09-19 06:17:14,004 INFO [train.py:1198] (1/2) Epoch 34, batch 550, loss[loss=0.2125, simple_loss=0.2706, pruned_loss=0.05685, ctc_loss=0.1226, cr_loss=0.407, over 33888.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2635, pruned_loss=0.05622, ctc_loss=0.1198, cr_loss=0.3935, over 6331565.87 frames. ], batch size: 122, lr: 3.53e-03, grad_scale: 32.0 2024-09-19 06:17:26,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2024-09-19 06:17:59,614 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.51 vs. 
limit=10.0 2024-09-19 06:18:05,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=599671.3333333334, ans=0.0 2024-09-19 06:18:20,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=599718.0, ans=0.125 2024-09-19 06:18:20,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=599718.0, ans=0.025 2024-09-19 06:18:38,566 INFO [train.py:1198] (1/2) Epoch 34, batch 600, loss[loss=0.2299, simple_loss=0.2891, pruned_loss=0.06307, ctc_loss=0.134, cr_loss=0.4437, over 34250.00 frames. ], tot_loss[loss=0.2085, simple_loss=0.2643, pruned_loss=0.05646, ctc_loss=0.1202, cr_loss=0.3948, over 6432218.03 frames. ], batch size: 117, lr: 3.53e-03, grad_scale: 32.0 2024-09-19 06:18:42,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.59 vs. limit=22.5 2024-09-19 06:18:48,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=599764.6666666666, ans=0.2 2024-09-19 06:18:56,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.09 vs. limit=10.0 2024-09-19 06:19:02,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=599811.3333333334, ans=0.125 2024-09-19 06:19:05,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=599811.3333333334, ans=0.2 2024-09-19 06:19:10,752 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.29 vs. limit=22.5 2024-09-19 06:19:11,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 2.476e+02 2.858e+02 3.434e+02 7.748e+02, threshold=5.716e+02, percent-clipped=3.0 2024-09-19 06:19:52,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=599951.3333333334, ans=0.1 2024-09-19 06:19:54,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=599951.3333333334, ans=0.1 2024-09-19 06:20:00,371 INFO [train.py:1198] (1/2) Epoch 34, batch 650, loss[loss=0.2142, simple_loss=0.2702, pruned_loss=0.05822, ctc_loss=0.1263, cr_loss=0.4129, over 34518.00 frames. ], tot_loss[loss=0.2076, simple_loss=0.2636, pruned_loss=0.05597, ctc_loss=0.1194, cr_loss=0.3933, over 6524163.98 frames. 
], batch size: 94, lr: 3.53e-03, grad_scale: 32.0 2024-09-19 06:20:07,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=599998.0, ans=0.0 2024-09-19 06:20:15,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=600044.6666666666, ans=0.125 2024-09-19 06:20:41,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=600091.3333333334, ans=0.125 2024-09-19 06:20:46,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=600091.3333333334, ans=0.2 2024-09-19 06:21:02,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=600138.0, ans=0.125 2024-09-19 06:21:06,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=600138.0, ans=0.025 2024-09-19 06:21:25,567 INFO [train.py:1198] (1/2) Epoch 34, batch 700, loss[loss=0.2085, simple_loss=0.2622, pruned_loss=0.05772, ctc_loss=0.1203, cr_loss=0.3826, over 34582.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.2642, pruned_loss=0.05617, ctc_loss=0.1198, cr_loss=0.3942, over 6580889.38 frames. ], batch size: 89, lr: 3.53e-03, grad_scale: 32.0 2024-09-19 06:21:34,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=600231.3333333334, ans=10.0 2024-09-19 06:21:46,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=600278.0, ans=0.125 2024-09-19 06:22:00,348 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.008e+02 2.441e+02 3.013e+02 3.984e+02 5.947e+02, threshold=6.025e+02, percent-clipped=3.0 2024-09-19 06:22:02,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=600324.6666666666, ans=0.2 2024-09-19 06:22:03,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=600324.6666666666, ans=0.125 2024-09-19 06:22:35,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=600418.0, ans=0.0 2024-09-19 06:22:49,665 INFO [train.py:1198] (1/2) Epoch 34, batch 750, loss[loss=0.2116, simple_loss=0.2724, pruned_loss=0.05573, ctc_loss=0.1176, cr_loss=0.394, over 34426.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2637, pruned_loss=0.056, ctc_loss=0.1195, cr_loss=0.3931, over 6622941.99 frames. ], batch size: 95, lr: 3.53e-03, grad_scale: 32.0 2024-09-19 06:22:50,800 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.11 vs. 
limit=15.0 2024-09-19 06:23:04,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=600511.3333333334, ans=0.125 2024-09-19 06:23:35,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=600558.0, ans=0.2 2024-09-19 06:23:39,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=600604.6666666666, ans=0.0 2024-09-19 06:23:44,241 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:23:45,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=600604.6666666666, ans=0.0 2024-09-19 06:23:47,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=600604.6666666666, ans=0.07 2024-09-19 06:24:11,950 INFO [train.py:1198] (1/2) Epoch 34, batch 800, loss[loss=0.1931, simple_loss=0.2477, pruned_loss=0.05114, ctc_loss=0.1083, cr_loss=0.364, over 34493.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2636, pruned_loss=0.05613, ctc_loss=0.1198, cr_loss=0.394, over 6659883.44 frames. ], batch size: 85, lr: 3.52e-03, grad_scale: 32.0 2024-09-19 06:24:28,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=600744.6666666666, ans=0.125 2024-09-19 06:24:28,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=600744.6666666666, ans=0.1 2024-09-19 06:24:44,919 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.025e+02 2.450e+02 2.806e+02 3.406e+02 4.895e+02, threshold=5.611e+02, percent-clipped=0.0 2024-09-19 06:24:58,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=600791.3333333334, ans=10.0 2024-09-19 06:25:08,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=600838.0, ans=0.125 2024-09-19 06:25:23,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=600884.6666666666, ans=0.125 2024-09-19 06:25:23,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=600884.6666666666, ans=0.0 2024-09-19 06:25:34,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=600931.3333333334, ans=0.95 2024-09-19 06:25:36,189 INFO [train.py:1198] (1/2) Epoch 34, batch 850, loss[loss=0.2188, simple_loss=0.2785, pruned_loss=0.05879, ctc_loss=0.1229, cr_loss=0.4214, over 34376.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2632, pruned_loss=0.05593, ctc_loss=0.1195, cr_loss=0.3934, over 6692920.39 frames. ], batch size: 103, lr: 3.52e-03, grad_scale: 32.0 2024-09-19 06:25:38,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.15 vs. 
limit=10.0 2024-09-19 06:25:41,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=600931.3333333334, ans=0.125 2024-09-19 06:25:48,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=600931.3333333334, ans=0.125 2024-09-19 06:26:26,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=601071.3333333334, ans=0.0 2024-09-19 06:26:40,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2024-09-19 06:26:48,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=601118.0, ans=0.0 2024-09-19 06:27:00,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=601164.6666666666, ans=0.125 2024-09-19 06:27:01,432 INFO [train.py:1198] (1/2) Epoch 34, batch 900, loss[loss=0.1836, simple_loss=0.2411, pruned_loss=0.04588, ctc_loss=0.101, cr_loss=0.354, over 34448.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2636, pruned_loss=0.05615, ctc_loss=0.1199, cr_loss=0.3943, over 6700671.77 frames. ], batch size: 85, lr: 3.52e-03, grad_scale: 32.0 2024-09-19 06:27:01,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=601164.6666666666, ans=0.125 2024-09-19 06:27:05,151 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:27:13,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=601164.6666666666, ans=0.0 2024-09-19 06:27:14,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=601164.6666666666, ans=0.125 2024-09-19 06:27:34,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.968e+02 2.555e+02 2.892e+02 3.823e+02 7.772e+02, threshold=5.784e+02, percent-clipped=2.0 2024-09-19 06:27:48,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.04 vs. limit=22.5 2024-09-19 06:28:05,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=601351.3333333334, ans=0.125 2024-09-19 06:28:10,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-09-19 06:28:18,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=601351.3333333334, ans=0.125 2024-09-19 06:28:23,343 INFO [train.py:1198] (1/2) Epoch 34, batch 950, loss[loss=0.1924, simple_loss=0.2495, pruned_loss=0.04973, ctc_loss=0.1079, cr_loss=0.3562, over 34712.00 frames. ], tot_loss[loss=0.208, simple_loss=0.2637, pruned_loss=0.05628, ctc_loss=0.1201, cr_loss=0.3948, over 6702343.81 frames. 
], batch size: 87, lr: 3.52e-03, grad_scale: 32.0 2024-09-19 06:28:35,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=601398.0, ans=0.125 2024-09-19 06:28:57,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=601491.3333333334, ans=0.0 2024-09-19 06:29:01,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=15.57 vs. limit=15.0 2024-09-19 06:29:10,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=601491.3333333334, ans=0.125 2024-09-19 06:29:29,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=601538.0, ans=0.0 2024-09-19 06:29:30,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=601538.0, ans=0.125 2024-09-19 06:29:34,051 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:29:43,816 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:29:48,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=601631.3333333334, ans=0.1 2024-09-19 06:29:49,792 INFO [train.py:1198] (1/2) Epoch 34, batch 1000, loss[loss=0.2064, simple_loss=0.2614, pruned_loss=0.0558, ctc_loss=0.121, cr_loss=0.3929, over 34488.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2646, pruned_loss=0.05672, ctc_loss=0.121, cr_loss=0.3967, over 6696446.70 frames. ], batch size: 90, lr: 3.52e-03, grad_scale: 32.0 2024-09-19 06:29:51,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=601631.3333333334, ans=0.04949747468305833 2024-09-19 06:29:56,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=601631.3333333334, ans=0.0 2024-09-19 06:30:06,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=601678.0, ans=0.125 2024-09-19 06:30:09,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=15.0 2024-09-19 06:30:21,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2024-09-19 06:30:23,024 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.526e+02 2.874e+02 3.446e+02 5.871e+02, threshold=5.748e+02, percent-clipped=2.0 2024-09-19 06:30:38,902 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.87 vs. limit=15.0 2024-09-19 06:30:42,505 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.95 vs. limit=12.0 2024-09-19 06:31:12,633 INFO [train.py:1198] (1/2) Epoch 34, batch 1050, loss[loss=0.2223, simple_loss=0.2777, pruned_loss=0.06226, ctc_loss=0.128, cr_loss=0.4193, over 34589.00 frames. 
], tot_loss[loss=0.2088, simple_loss=0.2641, pruned_loss=0.05672, ctc_loss=0.1209, cr_loss=0.3962, over 6705521.09 frames. ], batch size: 99, lr: 3.52e-03, grad_scale: 32.0 2024-09-19 06:31:12,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=601864.6666666666, ans=0.2 2024-09-19 06:31:33,004 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.54 vs. limit=15.0 2024-09-19 06:31:34,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=601911.3333333334, ans=0.0 2024-09-19 06:31:37,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=601911.3333333334, ans=0.0 2024-09-19 06:31:45,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=601958.0, ans=0.0 2024-09-19 06:31:50,985 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:32:17,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=602051.3333333334, ans=0.07 2024-09-19 06:32:20,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=602051.3333333334, ans=0.125 2024-09-19 06:32:29,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602051.3333333334, ans=0.1 2024-09-19 06:32:37,735 INFO [train.py:1198] (1/2) Epoch 34, batch 1100, loss[loss=0.2065, simple_loss=0.2611, pruned_loss=0.05604, ctc_loss=0.12, cr_loss=0.3966, over 34732.00 frames. ], tot_loss[loss=0.2085, simple_loss=0.264, pruned_loss=0.05658, ctc_loss=0.1205, cr_loss=0.3954, over 6719130.21 frames. ], batch size: 92, lr: 3.52e-03, grad_scale: 32.0 2024-09-19 06:32:40,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2024-09-19 06:32:59,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=602144.6666666666, ans=0.125 2024-09-19 06:33:05,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=602144.6666666666, ans=0.025 2024-09-19 06:33:10,519 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.437e+02 2.924e+02 3.417e+02 6.997e+02, threshold=5.848e+02, percent-clipped=3.0 2024-09-19 06:33:31,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=602238.0, ans=0.0 2024-09-19 06:33:32,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=602238.0, ans=0.0 2024-09-19 06:34:02,241 INFO [train.py:1198] (1/2) Epoch 34, batch 1150, loss[loss=0.206, simple_loss=0.2638, pruned_loss=0.05488, ctc_loss=0.1146, cr_loss=0.3871, over 34741.00 frames. ], tot_loss[loss=0.2089, simple_loss=0.2643, pruned_loss=0.05672, ctc_loss=0.1209, cr_loss=0.3962, over 6717590.76 frames. 
], batch size: 92, lr: 3.52e-03, grad_scale: 32.0 2024-09-19 06:34:30,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=602378.0, ans=0.025 2024-09-19 06:34:49,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=602424.6666666666, ans=0.125 2024-09-19 06:34:52,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=602471.3333333334, ans=0.07 2024-09-19 06:35:00,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=602471.3333333334, ans=0.07 2024-09-19 06:35:02,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602471.3333333334, ans=0.1 2024-09-19 06:35:10,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.62 vs. limit=6.0 2024-09-19 06:35:17,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.73 vs. limit=22.5 2024-09-19 06:35:24,786 INFO [train.py:1198] (1/2) Epoch 34, batch 1200, loss[loss=0.2205, simple_loss=0.2797, pruned_loss=0.06015, ctc_loss=0.1248, cr_loss=0.4022, over 34569.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.2649, pruned_loss=0.05685, ctc_loss=0.1212, cr_loss=0.3974, over 6709862.10 frames. ], batch size: 99, lr: 3.52e-03, grad_scale: 32.0 2024-09-19 06:35:39,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2024-09-19 06:35:41,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=602611.3333333334, ans=0.125 2024-09-19 06:35:42,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.88 vs. limit=15.0 2024-09-19 06:35:57,960 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.002e+02 2.486e+02 2.859e+02 3.939e+02 6.202e+02, threshold=5.719e+02, percent-clipped=2.0 2024-09-19 06:36:01,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=602658.0, ans=0.125 2024-09-19 06:36:28,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=602704.6666666666, ans=0.2 2024-09-19 06:36:32,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=602751.3333333334, ans=0.125 2024-09-19 06:36:45,570 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.84 vs. limit=15.0 2024-09-19 06:36:49,637 INFO [train.py:1198] (1/2) Epoch 34, batch 1250, loss[loss=0.226, simple_loss=0.2811, pruned_loss=0.06336, ctc_loss=0.1325, cr_loss=0.4409, over 34347.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2655, pruned_loss=0.05702, ctc_loss=0.1215, cr_loss=0.3976, over 6743405.57 frames. 
], batch size: 107, lr: 3.52e-03, grad_scale: 32.0 2024-09-19 06:37:48,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=602938.0, ans=0.1 2024-09-19 06:37:57,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.88 vs. limit=6.0 2024-09-19 06:38:08,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=602984.6666666666, ans=0.125 2024-09-19 06:38:14,678 INFO [train.py:1198] (1/2) Epoch 34, batch 1300, loss[loss=0.2102, simple_loss=0.2669, pruned_loss=0.05677, ctc_loss=0.1222, cr_loss=0.3907, over 33038.00 frames. ], tot_loss[loss=0.2091, simple_loss=0.2646, pruned_loss=0.05679, ctc_loss=0.1209, cr_loss=0.3962, over 6746244.64 frames. ], batch size: 130, lr: 3.52e-03, grad_scale: 16.0 2024-09-19 06:38:29,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=603078.0, ans=0.025 2024-09-19 06:38:41,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=603078.0, ans=0.125 2024-09-19 06:38:49,357 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.521e+02 2.893e+02 3.587e+02 5.796e+02, threshold=5.786e+02, percent-clipped=1.0 2024-09-19 06:39:08,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=603171.3333333334, ans=0.0 2024-09-19 06:39:09,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=603171.3333333334, ans=0.1 2024-09-19 06:39:24,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=603218.0, ans=0.125 2024-09-19 06:39:27,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=603218.0, ans=0.125 2024-09-19 06:39:30,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2024-09-19 06:39:31,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=603218.0, ans=0.125 2024-09-19 06:39:37,267 INFO [train.py:1198] (1/2) Epoch 34, batch 1350, loss[loss=0.2155, simple_loss=0.2722, pruned_loss=0.05877, ctc_loss=0.1244, cr_loss=0.4106, over 34516.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2643, pruned_loss=0.05659, ctc_loss=0.1205, cr_loss=0.3952, over 6763665.81 frames. ], batch size: 94, lr: 3.52e-03, grad_scale: 16.0 2024-09-19 06:40:09,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=603311.3333333334, ans=0.125 2024-09-19 06:40:22,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. 
limit=15.0 2024-09-19 06:41:01,822 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:41:03,203 INFO [train.py:1198] (1/2) Epoch 34, batch 1400, loss[loss=0.1811, simple_loss=0.2361, pruned_loss=0.04629, ctc_loss=0.09962, cr_loss=0.341, over 34277.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2643, pruned_loss=0.05655, ctc_loss=0.1205, cr_loss=0.3957, over 6776584.73 frames. ], batch size: 80, lr: 3.52e-03, grad_scale: 16.0 2024-09-19 06:41:13,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=22.5 2024-09-19 06:41:37,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.055e+02 2.594e+02 3.169e+02 3.734e+02 7.153e+02, threshold=6.338e+02, percent-clipped=2.0 2024-09-19 06:42:12,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=603684.6666666666, ans=0.1 2024-09-19 06:42:12,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=603684.6666666666, ans=0.125 2024-09-19 06:42:16,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2024-09-19 06:42:16,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.59 vs. limit=15.0 2024-09-19 06:42:25,662 INFO [train.py:1198] (1/2) Epoch 34, batch 1450, loss[loss=0.2199, simple_loss=0.2737, pruned_loss=0.0617, ctc_loss=0.1271, cr_loss=0.4309, over 34435.00 frames. ], tot_loss[loss=0.2089, simple_loss=0.2646, pruned_loss=0.05661, ctc_loss=0.1206, cr_loss=0.3956, over 6773489.62 frames. ], batch size: 110, lr: 3.52e-03, grad_scale: 16.0 2024-09-19 06:42:27,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=603731.3333333334, ans=0.0 2024-09-19 06:42:35,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0 2024-09-19 06:42:45,901 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:42:49,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.68 vs. limit=15.0 2024-09-19 06:42:58,431 INFO [scaling.py:801] (1/2) Caught exception in Balancer backward: CUDA out of memory. Tried to allocate 3.78 GiB. GPU 1 has a total capacity of 79.17 GiB of which 3.66 GiB is free. Process 39810 has 75.51 GiB memory in use. Of the allocated memory 29.36 GiB is allocated by PyTorch, and 43.75 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables), size=[234, 384, 594, 19], will continue. 
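[Editor's note] The `Caught exception in Balancer backward: CUDA out of memory ... will continue` record just above shows the recovery path this log relies on: when the auxiliary gradient computation in a Balancer's backward pass cannot allocate memory, the exception is caught, the offending tensor size is logged (here size=[234, 384, 594, 19]), and the step proceeds rather than crashing the run. Below is a minimal sketch of that catch-and-continue pattern only; `BalancerLikeFunction` and its 0.01-scaled correction term are hypothetical stand-ins, not icefall's actual scaling.py Balancer, and only the structure of the OOM fallback is taken from the log.

```python
# Sketch of catch-and-continue in a custom autograd backward (assumed pattern,
# not the real icefall Balancer): the forward is an identity, and the backward
# tries a memory-hungry correction term, falling back to the plain upstream
# gradient if CUDA runs out of memory.

import logging

import torch
from torch import Tensor

logging.basicConfig(level=logging.INFO)


class BalancerLikeFunction(torch.autograd.Function):
    """Identity in forward; backward adds a hypothetical correction term
    and skips it (with a log record) if the allocation raises CUDA OOM."""

    @staticmethod
    def forward(ctx, x: Tensor) -> Tensor:
        ctx.save_for_backward(x)
        return x

    @staticmethod
    def backward(ctx, grad_output: Tensor) -> Tensor:
        (x,) = ctx.saved_tensors
        try:
            # Stand-in for the real correction; the actual Balancer nudges
            # gradients to keep per-channel statistics within limits.
            correction = 0.01 * x.sign() * grad_output.abs()
            return grad_output + correction
        except torch.cuda.OutOfMemoryError as e:
            logging.info(
                f"Caught exception in Balancer backward: {e}, "
                f"size={list(x.shape)}, will continue."
            )
            return grad_output  # pass the upstream gradient through unchanged


if __name__ == "__main__":
    x = torch.randn(4, 8, requires_grad=True)
    y = BalancerLikeFunction.apply(x)
    y.sum().backward()
    print(x.grad.shape)  # gradients still flow even if the correction OOMs
```

The design choice this illustrates: the correction is a regularizer, not part of the loss, so losing it for one oversized batch costs almost nothing, whereas an uncaught OOM would kill the whole multi-day run.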
2024-09-19 06:43:08,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=603824.6666666666, ans=0.5 2024-09-19 06:43:14,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=22.5 2024-09-19 06:43:29,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2024-09-19 06:43:33,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=603918.0, ans=0.125 2024-09-19 06:43:33,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=603918.0, ans=0.125 2024-09-19 06:43:37,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=603918.0, ans=0.2 2024-09-19 06:43:42,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=603918.0, ans=0.2 2024-09-19 06:43:48,421 INFO [train.py:1198] (1/2) Epoch 34, batch 1500, loss[loss=0.2228, simple_loss=0.2835, pruned_loss=0.05998, ctc_loss=0.1302, cr_loss=0.4032, over 34482.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2654, pruned_loss=0.05685, ctc_loss=0.1211, cr_loss=0.3963, over 6774410.66 frames. ], batch size: 100, lr: 3.52e-03, grad_scale: 16.0 2024-09-19 06:43:57,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=603964.6666666666, ans=0.04949747468305833 2024-09-19 06:44:07,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=604011.3333333334, ans=0.05 2024-09-19 06:44:12,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=604011.3333333334, ans=0.09899494936611666 2024-09-19 06:44:13,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.79 vs. limit=15.0 2024-09-19 06:44:27,344 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.090e+02 2.468e+02 2.770e+02 3.368e+02 6.080e+02, threshold=5.539e+02, percent-clipped=0.0 2024-09-19 06:44:27,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=604058.0, ans=0.125 2024-09-19 06:44:47,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=604104.6666666666, ans=0.125 2024-09-19 06:44:57,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=604151.3333333334, ans=0.125 2024-09-19 06:44:59,217 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:45:04,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=604151.3333333334, ans=0.1 2024-09-19 06:45:15,193 INFO [train.py:1198] (1/2) Epoch 34, batch 1550, loss[loss=0.2326, simple_loss=0.286, pruned_loss=0.06665, ctc_loss=0.1381, cr_loss=0.4577, over 34421.00 frames. 
], tot_loss[loss=0.2098, simple_loss=0.2654, pruned_loss=0.05705, ctc_loss=0.1215, cr_loss=0.3971, over 6745801.94 frames. ], batch size: 105, lr: 3.51e-03, grad_scale: 16.0 2024-09-19 06:45:39,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.93 vs. limit=15.0 2024-09-19 06:45:45,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.27 vs. limit=10.0 2024-09-19 06:45:46,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=604291.3333333334, ans=0.125 2024-09-19 06:45:58,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=604291.3333333334, ans=0.125 2024-09-19 06:46:37,513 INFO [train.py:1198] (1/2) Epoch 34, batch 1600, loss[loss=0.2191, simple_loss=0.2802, pruned_loss=0.05846, ctc_loss=0.1261, cr_loss=0.3983, over 34551.00 frames. ], tot_loss[loss=0.2096, simple_loss=0.2651, pruned_loss=0.05695, ctc_loss=0.1212, cr_loss=0.3962, over 6725010.47 frames. ], batch size: 99, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 06:46:39,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=604431.3333333334, ans=0.025 2024-09-19 06:47:12,104 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.089e+02 2.498e+02 3.019e+02 3.652e+02 7.522e+02, threshold=6.037e+02, percent-clipped=7.0 2024-09-19 06:47:52,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=604618.0, ans=0.125 2024-09-19 06:47:52,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=604618.0, ans=0.1 2024-09-19 06:47:59,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=604618.0, ans=0.125 2024-09-19 06:48:00,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=604618.0, ans=0.07 2024-09-19 06:48:02,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=604664.6666666666, ans=0.0 2024-09-19 06:48:03,639 INFO [train.py:1198] (1/2) Epoch 34, batch 1650, loss[loss=0.2134, simple_loss=0.2716, pruned_loss=0.05709, ctc_loss=0.1223, cr_loss=0.4127, over 34378.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2649, pruned_loss=0.05685, ctc_loss=0.1211, cr_loss=0.3957, over 6716799.56 frames. ], batch size: 103, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 06:48:10,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=604664.6666666666, ans=0.1 2024-09-19 06:48:17,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=604664.6666666666, ans=0.125 2024-09-19 06:48:19,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.14 vs. 
limit=15.0 2024-09-19 06:49:00,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=604804.6666666666, ans=0.09899494936611666 2024-09-19 06:49:13,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=604851.3333333334, ans=0.025 2024-09-19 06:49:19,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=604851.3333333334, ans=0.0 2024-09-19 06:49:25,815 INFO [train.py:1198] (1/2) Epoch 34, batch 1700, loss[loss=0.1779, simple_loss=0.2334, pruned_loss=0.0445, ctc_loss=0.09853, cr_loss=0.3418, over 34323.00 frames. ], tot_loss[loss=0.2088, simple_loss=0.2646, pruned_loss=0.05656, ctc_loss=0.1205, cr_loss=0.3946, over 6741950.98 frames. ], batch size: 80, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 06:49:41,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2024-09-19 06:49:46,060 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:50:00,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.509e+02 3.098e+02 3.682e+02 7.779e+02, threshold=6.195e+02, percent-clipped=4.0 2024-09-19 06:50:00,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=604991.3333333334, ans=0.0 2024-09-19 06:50:05,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=604991.3333333334, ans=0.125 2024-09-19 06:50:09,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=604991.3333333334, ans=0.1 2024-09-19 06:50:09,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=604991.3333333334, ans=0.125 2024-09-19 06:50:18,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=605038.0, ans=0.0 2024-09-19 06:50:48,441 INFO [train.py:1198] (1/2) Epoch 34, batch 1750, loss[loss=0.1908, simple_loss=0.2454, pruned_loss=0.05029, ctc_loss=0.1065, cr_loss=0.3581, over 34188.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2643, pruned_loss=0.05649, ctc_loss=0.1205, cr_loss=0.3946, over 6752679.05 frames. ], batch size: 78, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 06:51:00,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=605131.3333333334, ans=0.125 2024-09-19 06:51:32,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-09-19 06:51:38,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=605271.3333333334, ans=0.2 2024-09-19 06:51:47,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.25 vs. 
limit=15.0 2024-09-19 06:51:48,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=605271.3333333334, ans=0.0 2024-09-19 06:52:01,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=605318.0, ans=0.125 2024-09-19 06:52:04,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=605318.0, ans=0.125 2024-09-19 06:52:08,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=605318.0, ans=0.2 2024-09-19 06:52:14,134 INFO [train.py:1198] (1/2) Epoch 34, batch 1800, loss[loss=0.2185, simple_loss=0.2799, pruned_loss=0.05819, ctc_loss=0.1233, cr_loss=0.3992, over 34692.00 frames. ], tot_loss[loss=0.2091, simple_loss=0.2648, pruned_loss=0.05666, ctc_loss=0.1208, cr_loss=0.3957, over 6755798.16 frames. ], batch size: 97, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 06:52:25,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.07 vs. limit=15.0 2024-09-19 06:52:28,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.51 vs. limit=10.0 2024-09-19 06:52:32,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=605411.3333333334, ans=0.1 2024-09-19 06:52:48,624 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.091e+02 2.549e+02 2.900e+02 3.660e+02 6.027e+02, threshold=5.799e+02, percent-clipped=0.0 2024-09-19 06:53:01,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.25 vs. limit=10.0 2024-09-19 06:53:02,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=605504.6666666666, ans=0.125 2024-09-19 06:53:05,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=605504.6666666666, ans=0.04949747468305833 2024-09-19 06:53:36,741 INFO [train.py:1198] (1/2) Epoch 34, batch 1850, loss[loss=0.2085, simple_loss=0.2685, pruned_loss=0.05481, ctc_loss=0.1186, cr_loss=0.3789, over 34447.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2646, pruned_loss=0.05673, ctc_loss=0.1208, cr_loss=0.3958, over 6763193.86 frames. ], batch size: 100, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 06:53:37,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=605598.0, ans=0.0 2024-09-19 06:53:38,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.91 vs. 
limit=15.0 2024-09-19 06:53:42,079 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:53:46,999 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=605598.0, ans=0.125 2024-09-19 06:54:05,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=605644.6666666666, ans=0.125 2024-09-19 06:54:15,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=605691.3333333334, ans=10.0 2024-09-19 06:54:28,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=605738.0, ans=0.0 2024-09-19 06:54:49,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=605784.6666666666, ans=0.07 2024-09-19 06:54:59,158 INFO [train.py:1198] (1/2) Epoch 34, batch 1900, loss[loss=0.22, simple_loss=0.2763, pruned_loss=0.06043, ctc_loss=0.1315, cr_loss=0.4138, over 34378.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2648, pruned_loss=0.05664, ctc_loss=0.1207, cr_loss=0.3958, over 6772422.50 frames. ], batch size: 103, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 06:55:07,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=605831.3333333334, ans=0.125 2024-09-19 06:55:11,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.77 vs. limit=15.0 2024-09-19 06:55:14,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=605878.0, ans=0.1 2024-09-19 06:55:33,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=605878.0, ans=0.0 2024-09-19 06:55:37,981 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.525e+02 2.946e+02 3.661e+02 7.395e+02, threshold=5.892e+02, percent-clipped=3.0 2024-09-19 06:55:41,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=605924.6666666666, ans=0.125 2024-09-19 06:56:06,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=605971.3333333334, ans=0.125 2024-09-19 06:56:22,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=606018.0, ans=0.07 2024-09-19 06:56:24,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=606064.6666666666, ans=0.125 2024-09-19 06:56:25,753 INFO [train.py:1198] (1/2) Epoch 34, batch 1950, loss[loss=0.2083, simple_loss=0.2605, pruned_loss=0.05765, ctc_loss=0.1239, cr_loss=0.4013, over 34355.00 frames. ], tot_loss[loss=0.2097, simple_loss=0.2658, pruned_loss=0.05681, ctc_loss=0.121, cr_loss=0.3971, over 6788987.82 frames. ], batch size: 91, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 06:57:08,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. 
limit=15.0 2024-09-19 06:57:10,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=606158.0, ans=0.2 2024-09-19 06:57:28,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=606204.6666666666, ans=0.0 2024-09-19 06:57:40,563 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 06:57:41,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.27 vs. limit=15.0 2024-09-19 06:57:42,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=606251.3333333334, ans=0.125 2024-09-19 06:57:45,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=606251.3333333334, ans=0.0 2024-09-19 06:57:48,470 INFO [train.py:1198] (1/2) Epoch 34, batch 2000, loss[loss=0.1908, simple_loss=0.2395, pruned_loss=0.05233, ctc_loss=0.1123, cr_loss=0.3756, over 34153.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2665, pruned_loss=0.05719, ctc_loss=0.1218, cr_loss=0.3994, over 6761734.72 frames. ], batch size: 78, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 06:57:53,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=606298.0, ans=0.125 2024-09-19 06:57:53,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=606298.0, ans=0.125 2024-09-19 06:58:04,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2024-09-19 06:58:13,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=606344.6666666666, ans=10.0 2024-09-19 06:58:14,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=606344.6666666666, ans=15.0 2024-09-19 06:58:15,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=606344.6666666666, ans=0.125 2024-09-19 06:58:23,328 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.187e+02 2.445e+02 2.868e+02 3.450e+02 7.129e+02, threshold=5.735e+02, percent-clipped=2.0 2024-09-19 06:58:38,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=606438.0, ans=0.125 2024-09-19 06:59:02,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=606484.6666666666, ans=0.0 2024-09-19 06:59:04,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.44 vs. limit=22.5 2024-09-19 06:59:15,228 INFO [train.py:1198] (1/2) Epoch 34, batch 2050, loss[loss=0.1834, simple_loss=0.2386, pruned_loss=0.04695, ctc_loss=0.1028, cr_loss=0.3436, over 34486.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2651, pruned_loss=0.05668, ctc_loss=0.1208, cr_loss=0.3965, over 6753002.60 frames. 
], batch size: 82, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 06:59:32,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=606578.0, ans=0.2 2024-09-19 06:59:35,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=606578.0, ans=0.5 2024-09-19 06:59:37,122 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:00:04,576 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.34 vs. limit=22.5 2024-09-19 07:00:06,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606671.3333333334, ans=0.1 2024-09-19 07:00:20,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=606718.0, ans=0.025 2024-09-19 07:00:25,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.20 vs. limit=22.5 2024-09-19 07:00:37,705 INFO [train.py:1198] (1/2) Epoch 34, batch 2100, loss[loss=0.2096, simple_loss=0.2666, pruned_loss=0.05625, ctc_loss=0.1202, cr_loss=0.4006, over 34525.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2645, pruned_loss=0.05644, ctc_loss=0.1203, cr_loss=0.3954, over 6769055.40 frames. ], batch size: 94, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 07:00:54,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=606811.3333333334, ans=0.025 2024-09-19 07:00:56,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606811.3333333334, ans=0.1 2024-09-19 07:00:57,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=606811.3333333334, ans=0.125 2024-09-19 07:01:12,113 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.462e+02 2.722e+02 3.736e+02 7.846e+02, threshold=5.443e+02, percent-clipped=3.0 2024-09-19 07:01:14,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2024-09-19 07:01:30,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=15.0 2024-09-19 07:01:50,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2024-09-19 07:01:51,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=606951.3333333334, ans=0.125 2024-09-19 07:01:59,624 INFO [train.py:1198] (1/2) Epoch 34, batch 2150, loss[loss=0.2035, simple_loss=0.259, pruned_loss=0.05458, ctc_loss=0.1175, cr_loss=0.3814, over 34362.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.264, pruned_loss=0.05618, ctc_loss=0.1199, cr_loss=0.3945, over 6788221.37 frames. 
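Each `tot_loss[...]` entry pools losses over millions of frames (6,788,221 just above) rather than over one batch, i.e. a frame-weighted average of the recently seen batches. A minimal sketch of that bookkeeping with hypothetical names; the actual tracker in the training script may differ:

```python
# Frame-weighted loss pooling, as suggested by "tot_loss[... over N
# frames]". LossPool is illustrative, not the real train.py tracker.
class LossPool:
    def __init__(self) -> None:
        self.loss_sum = 0.0  # sum over batches of per-frame loss * frames
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum += batch_loss * batch_frames
        self.frames += batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

pool = LossPool()
pool.update(0.2096, 34525)  # per-batch loss/frames from batches 2100 and 2150
pool.update(0.2035, 34362)
print(round(pool.tot_loss, 4))  # 0.2066, the frame-weighted average
```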
], batch size: 91, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 07:01:59,993 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:02:06,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=606998.0, ans=0.025 2024-09-19 07:02:51,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607138.0, ans=0.1 2024-09-19 07:03:09,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=607184.6666666666, ans=0.125 2024-09-19 07:03:26,742 INFO [train.py:1198] (1/2) Epoch 34, batch 2200, loss[loss=0.2085, simple_loss=0.2676, pruned_loss=0.05446, ctc_loss=0.1196, cr_loss=0.4101, over 34459.00 frames. ], tot_loss[loss=0.2082, simple_loss=0.2641, pruned_loss=0.05628, ctc_loss=0.12, cr_loss=0.3949, over 6783621.86 frames. ], batch size: 100, lr: 3.51e-03, grad_scale: 32.0 2024-09-19 07:03:27,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=607231.3333333334, ans=0.125 2024-09-19 07:03:40,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=607231.3333333334, ans=0.125 2024-09-19 07:03:40,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=607231.3333333334, ans=0.0 2024-09-19 07:03:51,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=607278.0, ans=0.0 2024-09-19 07:03:56,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=607278.0, ans=0.1 2024-09-19 07:03:56,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=607278.0, ans=0.125 2024-09-19 07:04:01,210 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.555e+02 3.089e+02 3.771e+02 7.677e+02, threshold=6.178e+02, percent-clipped=8.0 2024-09-19 07:04:29,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=607371.3333333334, ans=0.0 2024-09-19 07:04:34,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=607418.0, ans=0.0 2024-09-19 07:04:37,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=607418.0, ans=0.125 2024-09-19 07:04:48,834 INFO [train.py:1198] (1/2) Epoch 34, batch 2250, loss[loss=0.2145, simple_loss=0.2727, pruned_loss=0.05786, ctc_loss=0.1214, cr_loss=0.4045, over 34395.00 frames. ], tot_loss[loss=0.2082, simple_loss=0.264, pruned_loss=0.0563, ctc_loss=0.12, cr_loss=0.3951, over 6781029.03 frames. 
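In the `optim.py` warnings, the five grad-norm "quartiles" read as the min, 25%, median, 75% and max of recent gradient norms, and throughout this log the reported threshold equals `Clipping_scale` times the median (e.g. 2.0 x 3.053e+02 = 6.106e+02 in the warning above). A check under that reading; the quantile bookkeeping inside the optimizer is not shown here:

```python
# The threshold appears to be Clipping_scale * median of recent grad
# norms; verified against the 07:06:47 warning above.
clipping_scale = 2.0
quartiles = [2.143e+02, 2.581e+02, 3.053e+02, 3.765e+02, 6.073e+02]

threshold = clipping_scale * quartiles[2]  # scale * median
print(threshold)  # 610.6 -- the log reports threshold=6.106e+02
```

The `percent-clipped` field then plausibly reports the share of recent steps whose gradient norm exceeded this threshold.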
], batch size: 95, lr: 3.50e-03, grad_scale: 32.0 2024-09-19 07:04:59,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=607464.6666666666, ans=0.025 2024-09-19 07:05:07,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=607511.3333333334, ans=0.2 2024-09-19 07:05:08,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=607511.3333333334, ans=0.1 2024-09-19 07:05:30,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=607558.0, ans=0.0 2024-09-19 07:06:11,601 INFO [train.py:1198] (1/2) Epoch 34, batch 2300, loss[loss=0.1834, simple_loss=0.2384, pruned_loss=0.04675, ctc_loss=0.1036, cr_loss=0.3536, over 34302.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2631, pruned_loss=0.05597, ctc_loss=0.1194, cr_loss=0.3934, over 6766644.61 frames. ], batch size: 83, lr: 3.50e-03, grad_scale: 32.0 2024-09-19 07:06:14,379 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.24 vs. limit=15.0 2024-09-19 07:06:15,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=607698.0, ans=0.0 2024-09-19 07:06:44,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=607791.3333333334, ans=0.0 2024-09-19 07:06:47,726 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.143e+02 2.581e+02 3.053e+02 3.765e+02 6.073e+02, threshold=6.106e+02, percent-clipped=0.0 2024-09-19 07:06:49,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.35 vs. limit=10.0 2024-09-19 07:07:00,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607791.3333333334, ans=0.1 2024-09-19 07:07:07,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=607838.0, ans=0.025 2024-09-19 07:07:36,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=607931.3333333334, ans=0.125 2024-09-19 07:07:37,009 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:07:38,191 INFO [train.py:1198] (1/2) Epoch 34, batch 2350, loss[loss=0.2127, simple_loss=0.2684, pruned_loss=0.05772, ctc_loss=0.1247, cr_loss=0.4145, over 34697.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2634, pruned_loss=0.05622, ctc_loss=0.1197, cr_loss=0.3942, over 6772746.85 frames. ], batch size: 97, lr: 3.50e-03, grad_scale: 32.0 2024-09-19 07:07:38,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=607931.3333333334, ans=0.0 2024-09-19 07:08:57,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=608118.0, ans=0.2 2024-09-19 07:09:00,649 INFO [train.py:1198] (1/2) Epoch 34, batch 2400, loss[loss=0.1879, simple_loss=0.2409, pruned_loss=0.0495, ctc_loss=0.1062, cr_loss=0.3669, over 34576.00 frames. 
], tot_loss[loss=0.2083, simple_loss=0.264, pruned_loss=0.0564, ctc_loss=0.12, cr_loss=0.3945, over 6777072.13 frames. ], batch size: 89, lr: 3.50e-03, grad_scale: 32.0 2024-09-19 07:09:02,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=608164.6666666666, ans=0.125 2024-09-19 07:09:14,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=608164.6666666666, ans=0.2 2024-09-19 07:09:27,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=608211.3333333334, ans=0.1 2024-09-19 07:09:34,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=608258.0, ans=0.125 2024-09-19 07:09:37,300 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.467e+02 2.989e+02 3.756e+02 5.832e+02, threshold=5.978e+02, percent-clipped=0.0 2024-09-19 07:09:37,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=608258.0, ans=0.0 2024-09-19 07:09:47,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=608258.0, ans=0.2 2024-09-19 07:09:49,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten.whitening_limit, batch_count=608304.6666666666, ans=15.0 2024-09-19 07:09:50,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=608304.6666666666, ans=0.125 2024-09-19 07:09:56,186 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.53 vs. limit=22.5 2024-09-19 07:09:57,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=608304.6666666666, ans=0.125 2024-09-19 07:10:03,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=608304.6666666666, ans=0.125 2024-09-19 07:10:12,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=608351.3333333334, ans=0.95 2024-09-19 07:10:25,157 INFO [train.py:1198] (1/2) Epoch 34, batch 2450, loss[loss=0.2168, simple_loss=0.2694, pruned_loss=0.06085, ctc_loss=0.1305, cr_loss=0.4117, over 34417.00 frames. ], tot_loss[loss=0.2097, simple_loss=0.2653, pruned_loss=0.05698, ctc_loss=0.1212, cr_loss=0.3969, over 6751663.10 frames. 
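The ubiquitous `ScheduledFloat` entries record hyperparameters (balancer probabilities, skip rates, dropout) whose values are scheduled on `batch_count`. A minimal sketch of such a schedule as piecewise-linear interpolation between (batch_count, value) breakpoints; the real class in scaling.py has more features, and the breakpoints below are made up:

```python
# Piecewise-linear schedule keyed on batch_count, in the spirit of the
# ScheduledFloat entries. Breakpoints here are illustrative only.
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    """Interpolate linearly between sorted (batch_count, value) breakpoints."""
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    raise AssertionError("unreachable for sorted breakpoints")

# A dropout decaying from 0.3 to a floor of 0.1 over the first 20k
# batches would read ans=0.1 at the batch counts seen in this log:
print(scheduled_float(608_444.0, [(0.0, 0.3), (20_000.0, 0.1)]))  # 0.1
```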
], batch size: 95, lr: 3.50e-03, grad_scale: 16.0 2024-09-19 07:10:41,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=608398.0, ans=0.125 2024-09-19 07:10:42,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=608444.6666666666, ans=0.0 2024-09-19 07:10:44,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=608444.6666666666, ans=0.2 2024-09-19 07:10:49,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=608444.6666666666, ans=0.1 2024-09-19 07:10:50,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=608444.6666666666, ans=0.0 2024-09-19 07:11:01,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.71 vs. limit=15.0 2024-09-19 07:11:48,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=608631.3333333334, ans=0.125 2024-09-19 07:11:50,135 INFO [train.py:1198] (1/2) Epoch 34, batch 2500, loss[loss=0.2208, simple_loss=0.2793, pruned_loss=0.06027, ctc_loss=0.1276, cr_loss=0.4058, over 34463.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2654, pruned_loss=0.05713, ctc_loss=0.1215, cr_loss=0.3981, over 6763494.99 frames. ], batch size: 100, lr: 3.50e-03, grad_scale: 16.0 2024-09-19 07:12:11,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=608678.0, ans=0.1 2024-09-19 07:12:13,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=608678.0, ans=0.0 2024-09-19 07:12:26,384 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.187e+02 2.549e+02 3.065e+02 3.614e+02 6.784e+02, threshold=6.130e+02, percent-clipped=2.0 2024-09-19 07:12:41,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=608771.3333333334, ans=0.95 2024-09-19 07:12:50,593 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.56 vs. limit=22.5 2024-09-19 07:13:12,949 INFO [train.py:1198] (1/2) Epoch 34, batch 2550, loss[loss=0.1859, simple_loss=0.2429, pruned_loss=0.04622, ctc_loss=0.108, cr_loss=0.3706, over 34137.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2654, pruned_loss=0.05706, ctc_loss=0.1215, cr_loss=0.3978, over 6766892.04 frames. ], batch size: 78, lr: 3.50e-03, grad_scale: 16.0 2024-09-19 07:13:36,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=608911.3333333334, ans=0.025 2024-09-19 07:13:51,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. 
limit=15.0 2024-09-19 07:14:10,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=609004.6666666666, ans=0.125 2024-09-19 07:14:39,294 INFO [train.py:1198] (1/2) Epoch 34, batch 2600, loss[loss=0.2068, simple_loss=0.261, pruned_loss=0.05628, ctc_loss=0.1217, cr_loss=0.3915, over 34751.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2656, pruned_loss=0.05707, ctc_loss=0.1217, cr_loss=0.3982, over 6763902.16 frames. ], batch size: 92, lr: 3.50e-03, grad_scale: 16.0 2024-09-19 07:14:56,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=609144.6666666666, ans=0.125 2024-09-19 07:15:04,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=609144.6666666666, ans=0.2 2024-09-19 07:15:09,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=609144.6666666666, ans=0.0 2024-09-19 07:15:10,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=609191.3333333334, ans=0.125 2024-09-19 07:15:15,390 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 2.447e+02 2.814e+02 3.390e+02 5.246e+02, threshold=5.628e+02, percent-clipped=0.0 2024-09-19 07:15:15,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=609191.3333333334, ans=0.125 2024-09-19 07:15:29,420 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.22 vs. limit=15.0 2024-09-19 07:16:01,462 INFO [train.py:1198] (1/2) Epoch 34, batch 2650, loss[loss=0.2233, simple_loss=0.2778, pruned_loss=0.06244, ctc_loss=0.1349, cr_loss=0.4217, over 34218.00 frames. ], tot_loss[loss=0.2095, simple_loss=0.2653, pruned_loss=0.05681, ctc_loss=0.1212, cr_loss=0.3972, over 6770924.31 frames. ], batch size: 117, lr: 3.50e-03, grad_scale: 16.0 2024-09-19 07:16:04,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=22.5 2024-09-19 07:16:11,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=609331.3333333334, ans=0.125 2024-09-19 07:16:13,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=609331.3333333334, ans=0.025 2024-09-19 07:16:16,994 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.53 vs. limit=15.0 2024-09-19 07:16:17,608 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.57 vs. limit=5.0 2024-09-19 07:16:23,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=609378.0, ans=0.125 2024-09-19 07:17:24,400 INFO [train.py:1198] (1/2) Epoch 34, batch 2700, loss[loss=0.2143, simple_loss=0.2762, pruned_loss=0.05642, ctc_loss=0.118, cr_loss=0.3956, over 34627.00 frames. ], tot_loss[loss=0.2097, simple_loss=0.2655, pruned_loss=0.0569, ctc_loss=0.1213, cr_loss=0.3971, over 6764965.62 frames. 
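The `Whitening` lines compare a per-module statistic against a limit (e.g. `metric=4.57 vs. limit=5.0` above); the metric measures how far the feature covariance is from a multiple of the identity. One plausible formulation, consistent with every logged metric being >= 1, is sketched below; the exact definition lives in scaling.py and may differ:

```python
# Hedged sketch of a whiteness metric: d * tr(C^2) / tr(C)^2 for the
# (d, d) feature covariance C. By Cauchy-Schwarz this is >= 1, with
# equality iff C is a multiple of the identity (perfectly "white").
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) features."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    return (d * (cov @ cov).trace() / cov.trace() ** 2).item()

print(whitening_metric(torch.randn(1000, 512)))  # ~1 for isotropic features
```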
], batch size: 102, lr: 3.50e-03, grad_scale: 16.0 2024-09-19 07:17:33,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=8.33 vs. limit=15.0 2024-09-19 07:17:41,480 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2024-09-19 07:17:49,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=609611.3333333334, ans=0.035 2024-09-19 07:18:02,482 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.474e+02 2.814e+02 3.404e+02 5.832e+02, threshold=5.628e+02, percent-clipped=1.0 2024-09-19 07:18:13,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=609658.0, ans=0.2 2024-09-19 07:18:28,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=609704.6666666666, ans=0.125 2024-09-19 07:18:28,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=609704.6666666666, ans=0.04949747468305833 2024-09-19 07:18:50,988 INFO [train.py:1198] (1/2) Epoch 34, batch 2750, loss[loss=0.1986, simple_loss=0.2532, pruned_loss=0.05326, ctc_loss=0.1138, cr_loss=0.3673, over 34654.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2645, pruned_loss=0.0565, ctc_loss=0.1206, cr_loss=0.3953, over 6762001.94 frames. ], batch size: 88, lr: 3.50e-03, grad_scale: 16.0 2024-09-19 07:18:58,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=609798.0, ans=0.0 2024-09-19 07:19:01,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=609798.0, ans=0.1 2024-09-19 07:19:02,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=609798.0, ans=0.125 2024-09-19 07:19:07,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=609844.6666666666, ans=0.0 2024-09-19 07:19:12,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=609844.6666666666, ans=0.125 2024-09-19 07:19:17,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=609844.6666666666, ans=0.09899494936611666 2024-09-19 07:19:43,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.13 vs. limit=22.5 2024-09-19 07:20:14,010 INFO [train.py:1198] (1/2) Epoch 34, batch 2800, loss[loss=0.2325, simple_loss=0.2851, pruned_loss=0.06781, ctc_loss=0.1419, cr_loss=0.3988, over 23492.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2648, pruned_loss=0.05683, ctc_loss=0.121, cr_loss=0.3958, over 6741099.77 frames. 
], batch size: 244, lr: 3.50e-03, grad_scale: 32.0 2024-09-19 07:20:15,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=610031.3333333334, ans=0.025 2024-09-19 07:20:32,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=610078.0, ans=0.125 2024-09-19 07:20:47,233 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:20:51,854 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.091e+02 2.490e+02 2.798e+02 3.559e+02 6.425e+02, threshold=5.597e+02, percent-clipped=3.0 2024-09-19 07:20:52,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=610124.6666666666, ans=0.1 2024-09-19 07:20:58,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=610124.6666666666, ans=0.1 2024-09-19 07:21:02,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=610171.3333333334, ans=0.125 2024-09-19 07:21:05,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=610171.3333333334, ans=0.125 2024-09-19 07:21:37,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=610264.6666666666, ans=0.0 2024-09-19 07:21:38,257 INFO [train.py:1198] (1/2) Epoch 34, batch 2850, loss[loss=0.1995, simple_loss=0.2537, pruned_loss=0.05362, ctc_loss=0.1157, cr_loss=0.3746, over 34492.00 frames. ], tot_loss[loss=0.2099, simple_loss=0.2653, pruned_loss=0.05714, ctc_loss=0.1216, cr_loss=0.3973, over 6725125.17 frames. ], batch size: 90, lr: 3.50e-03, grad_scale: 16.0 2024-09-19 07:21:39,284 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.86 vs. limit=15.0 2024-09-19 07:21:59,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=22.5 2024-09-19 07:22:00,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=610311.3333333334, ans=0.125 2024-09-19 07:22:01,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.05 vs. limit=15.0 2024-09-19 07:22:18,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=610358.0, ans=0.2 2024-09-19 07:22:20,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=610358.0, ans=0.125 2024-09-19 07:22:25,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=610358.0, ans=0.1 2024-09-19 07:23:03,278 INFO [train.py:1198] (1/2) Epoch 34, batch 2900, loss[loss=0.2074, simple_loss=0.2632, pruned_loss=0.05599, ctc_loss=0.1207, cr_loss=0.3874, over 34541.00 frames. ], tot_loss[loss=0.2106, simple_loss=0.2662, pruned_loss=0.0573, ctc_loss=0.1219, cr_loss=0.3981, over 6755354.04 frames. 
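The `grad_scale` field in each batch summary is the AMP dynamic loss scale: it halves when an overflow is detected (here it dropped from 32.0 to 16.0 around batch 2450) and doubles back after a run of clean steps (32.0 again by batch 2800). A generic PyTorch sketch of that mechanism; the model and optimizer names are placeholders, not this recipe's objects:

```python
# Generic AMP step with dynamic loss scaling, the mechanism behind the
# grad_scale values in the log. Model/optimizer are placeholders.
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=3.5e-3)
scaler = torch.cuda.amp.GradScaler()  # shrinks on overflow, grows periodically

features = torch.randn(16, 80, device="cuda")
targets = torch.randn(16, 500, device="cuda")

with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = torch.nn.functional.mse_loss(model(features), targets)

scaler.scale(loss).backward()  # backward through the scaled loss
scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
scaler.update()                # adapt the scale for the next iteration
print(scaler.get_scale())      # the quantity logged as grad_scale
```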
], batch size: 94, lr: 3.50e-03, grad_scale: 16.0 2024-09-19 07:23:16,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=610498.0, ans=0.035 2024-09-19 07:23:17,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=610498.0, ans=0.025 2024-09-19 07:23:20,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=610544.6666666666, ans=0.125 2024-09-19 07:23:41,560 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.482e+02 2.722e+02 3.324e+02 1.296e+03, threshold=5.443e+02, percent-clipped=1.0 2024-09-19 07:24:01,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=610638.0, ans=0.1 2024-09-19 07:24:26,204 INFO [train.py:1198] (1/2) Epoch 34, batch 2950, loss[loss=0.1942, simple_loss=0.2527, pruned_loss=0.04982, ctc_loss=0.1084, cr_loss=0.3614, over 34627.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2648, pruned_loss=0.05677, ctc_loss=0.1209, cr_loss=0.3953, over 6748765.86 frames. ], batch size: 88, lr: 3.50e-03, grad_scale: 16.0 2024-09-19 07:24:31,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=610731.3333333334, ans=0.02 2024-09-19 07:24:33,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=5.27 vs. limit=12.0 2024-09-19 07:25:09,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=610824.6666666666, ans=0.95 2024-09-19 07:25:18,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=22.5 2024-09-19 07:25:22,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=610871.3333333334, ans=0.125 2024-09-19 07:25:23,374 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.28 vs. limit=10.0 2024-09-19 07:25:25,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=610871.3333333334, ans=0.2 2024-09-19 07:25:27,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=610871.3333333334, ans=0.125 2024-09-19 07:25:36,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=610918.0, ans=0.025 2024-09-19 07:25:52,644 INFO [train.py:1198] (1/2) Epoch 34, batch 3000, loss[loss=0.2069, simple_loss=0.26, pruned_loss=0.05687, ctc_loss=0.1207, cr_loss=0.3979, over 34527.00 frames. ], tot_loss[loss=0.2088, simple_loss=0.2645, pruned_loss=0.05658, ctc_loss=0.1206, cr_loss=0.3948, over 6750003.49 frames. 
], batch size: 94, lr: 3.49e-03, grad_scale: 16.0 2024-09-19 07:25:52,644 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 07:26:04,960 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.6788, 4.0373, 4.2110, 3.7729, 3.6297, 3.5100, 3.6354, 3.8473], device='cuda:1') 2024-09-19 07:26:09,608 INFO [train.py:1230] (1/2) Epoch 34, validation: loss=0.1488, simple_loss=0.2432, pruned_loss=0.02321, ctc_loss=0.03948, cr_loss=2.143e-14, over 944034.00 frames. 2024-09-19 07:26:09,609 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 07:26:09,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=610964.6666666666, ans=0.125 2024-09-19 07:26:37,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0 2024-09-19 07:26:40,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=611058.0, ans=0.125 2024-09-19 07:26:40,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=611058.0, ans=0.0 2024-09-19 07:26:46,732 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.499e+02 2.845e+02 3.487e+02 5.900e+02, threshold=5.690e+02, percent-clipped=3.0 2024-09-19 07:26:52,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=611058.0, ans=0.125 2024-09-19 07:27:00,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.79 vs. limit=12.0 2024-09-19 07:27:01,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=611104.6666666666, ans=0.2 2024-09-19 07:27:30,495 INFO [train.py:1198] (1/2) Epoch 34, batch 3050, loss[loss=0.197, simple_loss=0.253, pruned_loss=0.05164, ctc_loss=0.1103, cr_loss=0.3927, over 34576.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2656, pruned_loss=0.05711, ctc_loss=0.1216, cr_loss=0.3972, over 6741218.87 frames. ], batch size: 89, lr: 3.49e-03, grad_scale: 16.0 2024-09-19 07:27:32,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=611198.0, ans=0.125 2024-09-19 07:27:39,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.85 vs. limit=15.0 2024-09-19 07:27:52,343 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:27:59,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=611244.6666666666, ans=0.0 2024-09-19 07:28:07,305 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.71 vs. 
limit=15.0 2024-09-19 07:28:42,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=611384.6666666666, ans=0.025 2024-09-19 07:28:47,839 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:28:52,227 INFO [train.py:1198] (1/2) Epoch 34, batch 3100, loss[loss=0.2234, simple_loss=0.2817, pruned_loss=0.06152, ctc_loss=0.1291, cr_loss=0.4074, over 34242.00 frames. ], tot_loss[loss=0.2095, simple_loss=0.2651, pruned_loss=0.05693, ctc_loss=0.1213, cr_loss=0.3965, over 6742066.40 frames. ], batch size: 117, lr: 3.49e-03, grad_scale: 16.0 2024-09-19 07:29:04,499 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.50 vs. limit=15.0 2024-09-19 07:29:29,073 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.515e+02 2.839e+02 3.991e+02 6.848e+02, threshold=5.678e+02, percent-clipped=5.0 2024-09-19 07:29:42,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=611571.3333333334, ans=0.125 2024-09-19 07:29:48,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=611571.3333333334, ans=0.025 2024-09-19 07:29:50,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=611571.3333333334, ans=0.125 2024-09-19 07:29:50,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611571.3333333334, ans=0.1 2024-09-19 07:30:14,501 INFO [train.py:1198] (1/2) Epoch 34, batch 3150, loss[loss=0.2192, simple_loss=0.2772, pruned_loss=0.05913, ctc_loss=0.1296, cr_loss=0.4274, over 33830.00 frames. ], tot_loss[loss=0.2094, simple_loss=0.265, pruned_loss=0.05692, ctc_loss=0.1212, cr_loss=0.3958, over 6748695.31 frames. ], batch size: 122, lr: 3.49e-03, grad_scale: 16.0 2024-09-19 07:30:15,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=611664.6666666666, ans=15.0 2024-09-19 07:30:16,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=611664.6666666666, ans=0.125 2024-09-19 07:30:18,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=611664.6666666666, ans=0.125 2024-09-19 07:30:51,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=611758.0, ans=0.125 2024-09-19 07:31:00,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.02 vs. 
limit=15.0 2024-09-19 07:31:01,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=611804.6666666666, ans=0.125 2024-09-19 07:31:06,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=611804.6666666666, ans=0.0 2024-09-19 07:31:32,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611851.3333333334, ans=0.1 2024-09-19 07:31:35,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=611898.0, ans=0.125 2024-09-19 07:31:36,512 INFO [train.py:1198] (1/2) Epoch 34, batch 3200, loss[loss=0.1959, simple_loss=0.2543, pruned_loss=0.05074, ctc_loss=0.1089, cr_loss=0.3568, over 34517.00 frames. ], tot_loss[loss=0.2088, simple_loss=0.2643, pruned_loss=0.05664, ctc_loss=0.1207, cr_loss=0.395, over 6762396.61 frames. ], batch size: 94, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:31:36,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=611898.0, ans=0.125 2024-09-19 07:31:43,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=611898.0, ans=0.025 2024-09-19 07:31:56,190 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:32:13,566 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.160e+02 2.519e+02 2.830e+02 3.521e+02 6.059e+02, threshold=5.659e+02, percent-clipped=1.0 2024-09-19 07:32:29,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=612038.0, ans=0.125 2024-09-19 07:32:39,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=612084.6666666666, ans=0.0 2024-09-19 07:32:57,017 INFO [train.py:1198] (1/2) Epoch 34, batch 3250, loss[loss=0.2158, simple_loss=0.2717, pruned_loss=0.05912, ctc_loss=0.1244, cr_loss=0.4189, over 34651.00 frames. ], tot_loss[loss=0.2093, simple_loss=0.2649, pruned_loss=0.05682, ctc_loss=0.121, cr_loss=0.3957, over 6772541.26 frames. ], batch size: 98, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:33:36,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2024-09-19 07:33:44,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=612271.3333333334, ans=0.0 2024-09-19 07:33:45,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=612271.3333333334, ans=0.025 2024-09-19 07:34:04,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=612318.0, ans=0.125 2024-09-19 07:34:17,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=612364.6666666666, ans=0.125 2024-09-19 07:34:18,208 INFO [train.py:1198] (1/2) Epoch 34, batch 3300, loss[loss=0.2099, simple_loss=0.2709, pruned_loss=0.05473, ctc_loss=0.1175, cr_loss=0.3986, over 33043.00 frames. 
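The batch 3000 validation pass earlier printed an `attn_weights_entropy` tensor for one self-attention module, a diagnostic of how diffuse its attention distributions are. A hedged sketch of computing such a statistic from attention weights; the reduction axes used in zipformer.py are an assumption here:

```python
# Entropy of attention weights, one value per head, in the spirit of
# the attn_weights_entropy diagnostic. Averaging axes are assumptions.
import torch

def attn_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, query_len, key_len) with rows summing to 1."""
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # entropy per query row
    return ent.mean(dim=-1)                           # average over queries

weights = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_entropy(weights))  # larger values = more diffuse attention
```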
], tot_loss[loss=0.2078, simple_loss=0.2635, pruned_loss=0.0562, ctc_loss=0.1198, cr_loss=0.3936, over 6770830.90 frames. ], batch size: 130, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:34:44,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=612411.3333333334, ans=0.0 2024-09-19 07:34:55,614 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.989e+02 2.408e+02 2.735e+02 3.455e+02 5.227e+02, threshold=5.470e+02, percent-clipped=0.0 2024-09-19 07:35:31,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=612551.3333333334, ans=0.95 2024-09-19 07:35:40,457 INFO [train.py:1198] (1/2) Epoch 34, batch 3350, loss[loss=0.232, simple_loss=0.288, pruned_loss=0.06546, ctc_loss=0.1381, cr_loss=0.4359, over 33884.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2644, pruned_loss=0.05656, ctc_loss=0.1207, cr_loss=0.3952, over 6744082.41 frames. ], batch size: 122, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:35:42,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=612598.0, ans=0.0 2024-09-19 07:36:27,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=612738.0, ans=0.125 2024-09-19 07:36:34,639 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.14 vs. limit=10.0 2024-09-19 07:36:36,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=612738.0, ans=22.5 2024-09-19 07:37:02,195 INFO [train.py:1198] (1/2) Epoch 34, batch 3400, loss[loss=0.175, simple_loss=0.2292, pruned_loss=0.04443, ctc_loss=0.09558, cr_loss=0.3176, over 34137.00 frames. ], tot_loss[loss=0.2089, simple_loss=0.2645, pruned_loss=0.05669, ctc_loss=0.1208, cr_loss=0.3954, over 6735302.37 frames. ], batch size: 78, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:37:04,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=612831.3333333334, ans=0.125 2024-09-19 07:37:10,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=612831.3333333334, ans=0.125 2024-09-19 07:37:39,341 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.584e+02 2.896e+02 3.516e+02 6.780e+02, threshold=5.791e+02, percent-clipped=5.0 2024-09-19 07:37:41,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=612924.6666666666, ans=0.1 2024-09-19 07:37:46,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=612924.6666666666, ans=0.125 2024-09-19 07:38:05,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=613018.0, ans=10.0 2024-09-19 07:38:22,851 INFO [train.py:1198] (1/2) Epoch 34, batch 3450, loss[loss=0.2058, simple_loss=0.2664, pruned_loss=0.05337, ctc_loss=0.1167, cr_loss=0.3764, over 33077.00 frames. ], tot_loss[loss=0.2091, simple_loss=0.2648, pruned_loss=0.05669, ctc_loss=0.1208, cr_loss=0.3951, over 6747372.29 frames. 
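The `ctc_loss` component in every summary is the standard CTC objective over the encoder's frame posteriors. A generic PyTorch call, assuming blank id 0 and a 500-way vocabulary as configured for this run; the lengths and batch shape below are made up:

```python
# Generic CTC loss over (T, N, C) log-probs; shapes are illustrative.
import torch
import torch.nn.functional as F

T, N, C = 200, 4, 500                    # frames, batch, classes
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)
targets = torch.randint(1, C, (N, 30))   # label ids; 0 is reserved for blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 30, dtype=torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths,
                  blank=0, reduction="mean", zero_infinity=True)
print(loss.item())
```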
], batch size: 130, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:38:39,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=613111.3333333334, ans=0.125 2024-09-19 07:38:51,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=613111.3333333334, ans=0.125 2024-09-19 07:39:43,346 INFO [train.py:1198] (1/2) Epoch 34, batch 3500, loss[loss=0.1811, simple_loss=0.2348, pruned_loss=0.04681, ctc_loss=0.1005, cr_loss=0.3396, over 34489.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2643, pruned_loss=0.0565, ctc_loss=0.1204, cr_loss=0.3943, over 6750122.82 frames. ], batch size: 85, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:40:18,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=613391.3333333334, ans=0.0 2024-09-19 07:40:21,538 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.085e+02 2.432e+02 2.733e+02 3.580e+02 5.823e+02, threshold=5.465e+02, percent-clipped=1.0 2024-09-19 07:40:29,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=613391.3333333334, ans=0.125 2024-09-19 07:40:38,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2024-09-19 07:40:46,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=613438.0, ans=0.0 2024-09-19 07:40:46,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=613438.0, ans=0.2 2024-09-19 07:40:57,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=613484.6666666666, ans=0.0 2024-09-19 07:41:05,022 INFO [train.py:1198] (1/2) Epoch 34, batch 3550, loss[loss=0.2237, simple_loss=0.2815, pruned_loss=0.06176, ctc_loss=0.1283, cr_loss=0.4165, over 34385.00 frames. ], tot_loss[loss=0.2084, simple_loss=0.2641, pruned_loss=0.05641, ctc_loss=0.1201, cr_loss=0.3938, over 6758156.74 frames. ], batch size: 103, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:41:32,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=613578.0, ans=0.05 2024-09-19 07:42:17,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=613718.0, ans=0.125 2024-09-19 07:42:26,567 INFO [train.py:1198] (1/2) Epoch 34, batch 3600, loss[loss=0.2157, simple_loss=0.2681, pruned_loss=0.06081, ctc_loss=0.1255, cr_loss=0.4142, over 34469.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2645, pruned_loss=0.05654, ctc_loss=0.1203, cr_loss=0.3943, over 6767025.27 frames. ], batch size: 90, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:42:34,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=613764.6666666666, ans=0.0 2024-09-19 07:42:40,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.96 vs. 
limit=15.0 2024-09-19 07:42:45,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=613811.3333333334, ans=22.5 2024-09-19 07:42:50,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=613811.3333333334, ans=0.0 2024-09-19 07:43:01,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=613858.0, ans=0.1 2024-09-19 07:43:03,102 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.054e+02 2.651e+02 3.556e+02 4.798e+02 9.131e+02, threshold=7.112e+02, percent-clipped=15.0 2024-09-19 07:43:06,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=613858.0, ans=0.0 2024-09-19 07:43:13,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=613904.6666666666, ans=0.2 2024-09-19 07:43:32,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=613951.3333333334, ans=0.125 2024-09-19 07:43:34,308 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:43:46,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=613998.0, ans=0.0 2024-09-19 07:43:47,764 INFO [train.py:1198] (1/2) Epoch 34, batch 3650, loss[loss=0.22, simple_loss=0.2757, pruned_loss=0.06061, ctc_loss=0.1299, cr_loss=0.4245, over 34426.00 frames. ], tot_loss[loss=0.208, simple_loss=0.2639, pruned_loss=0.05618, ctc_loss=0.1198, cr_loss=0.3932, over 6769597.05 frames. 
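The `cr_loss` component is the consistency-regularization term of CR-CTC: it penalizes disagreement between frame posteriors computed from two differently masked views of the same utterance, which would also explain the essentially zero cr_loss (2.143e-14) at validation, where no masking is applied. A hedged sketch of one such term as a symmetric KL divergence; the recipe's exact formulation may differ:

```python
# Symmetric KL between the frame posteriors of two augmented views,
# a plausible form for the cr_loss component. Illustrative only.
import torch
import torch.nn.functional as F

def cr_loss(log_p1: torch.Tensor, log_p2: torch.Tensor) -> torch.Tensor:
    """log_p1, log_p2: (T, C) log-posteriors from two views of one utterance."""
    kl12 = F.kl_div(log_p1, log_p2, log_target=True, reduction="batchmean")
    kl21 = F.kl_div(log_p2, log_p1, log_target=True, reduction="batchmean")
    return 0.5 * (kl12 + kl21)

view1 = torch.randn(200, 500).log_softmax(-1)
view2 = torch.randn(200, 500).log_softmax(-1)
print(cr_loss(view1, view2).item())  # positive here; 0.0 if the views agree
```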
], batch size: 110, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:43:55,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=613998.0, ans=0.125 2024-09-19 07:43:59,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=613998.0, ans=0.125 2024-09-19 07:44:01,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=613998.0, ans=0.125 2024-09-19 07:44:07,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=614044.6666666666, ans=0.125 2024-09-19 07:44:09,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=614044.6666666666, ans=0.0 2024-09-19 07:44:21,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=614091.3333333334, ans=0.0 2024-09-19 07:44:25,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=614091.3333333334, ans=0.125 2024-09-19 07:44:39,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=614138.0, ans=22.5 2024-09-19 07:44:57,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=614184.6666666666, ans=0.1 2024-09-19 07:45:05,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=614184.6666666666, ans=0.125 2024-09-19 07:45:08,611 INFO [train.py:1198] (1/2) Epoch 34, batch 3700, loss[loss=0.2005, simple_loss=0.2611, pruned_loss=0.05102, ctc_loss=0.1136, cr_loss=0.379, over 34591.00 frames. ], tot_loss[loss=0.208, simple_loss=0.2641, pruned_loss=0.05609, ctc_loss=0.1197, cr_loss=0.3928, over 6785091.02 frames. ], batch size: 102, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:45:10,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=614231.3333333334, ans=0.0 2024-09-19 07:45:34,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=614278.0, ans=0.0 2024-09-19 07:45:45,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.821e+02 2.574e+02 2.964e+02 3.496e+02 7.034e+02, threshold=5.927e+02, percent-clipped=0.0 2024-09-19 07:45:50,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=614324.6666666666, ans=0.2 2024-09-19 07:45:52,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=614324.6666666666, ans=0.125 2024-09-19 07:45:54,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=614324.6666666666, ans=0.125 2024-09-19 07:46:09,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.17 vs. 
limit=15.0 2024-09-19 07:46:16,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=614418.0, ans=0.125 2024-09-19 07:46:21,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=614418.0, ans=0.04949747468305833 2024-09-19 07:46:27,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=614464.6666666666, ans=0.0 2024-09-19 07:46:29,104 INFO [train.py:1198] (1/2) Epoch 34, batch 3750, loss[loss=0.2167, simple_loss=0.2757, pruned_loss=0.05807, ctc_loss=0.1247, cr_loss=0.4149, over 34362.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.2673, pruned_loss=0.05734, ctc_loss=0.122, cr_loss=0.3988, over 6785986.51 frames. ], batch size: 113, lr: 3.49e-03, grad_scale: 32.0 2024-09-19 07:46:29,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=614464.6666666666, ans=0.125 2024-09-19 07:46:40,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=614464.6666666666, ans=0.09899494936611666 2024-09-19 07:46:44,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=614511.3333333334, ans=0.125 2024-09-19 07:46:52,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=614511.3333333334, ans=0.04949747468305833 2024-09-19 07:46:55,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=614511.3333333334, ans=0.2 2024-09-19 07:46:58,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=614511.3333333334, ans=0.02 2024-09-19 07:47:08,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=614558.0, ans=0.1 2024-09-19 07:47:26,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=614604.6666666666, ans=0.0 2024-09-19 07:47:28,764 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2024-09-19 07:47:30,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=614604.6666666666, ans=0.0 2024-09-19 07:47:51,030 INFO [train.py:1198] (1/2) Epoch 34, batch 3800, loss[loss=0.2375, simple_loss=0.2878, pruned_loss=0.06977, ctc_loss=0.147, cr_loss=0.4563, over 30184.00 frames. ], tot_loss[loss=0.2145, simple_loss=0.2702, pruned_loss=0.05881, ctc_loss=0.1248, cr_loss=0.4046, over 6676778.08 frames. ], batch size: 175, lr: 3.48e-03, grad_scale: 32.0 2024-09-19 07:48:05,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.73 vs. 
limit=15.0 2024-09-19 07:48:31,647 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.225e+02 2.364e+02 2.521e+02 2.836e+02 7.176e+02, threshold=5.042e+02, percent-clipped=2.0 2024-09-19 07:48:33,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=614791.3333333334, ans=0.125 2024-09-19 07:48:35,448 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 07:48:35,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=614791.3333333334, ans=0.125 2024-09-19 07:48:42,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=614838.0, ans=0.1 2024-09-19 07:48:49,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.75 vs. limit=6.0 2024-09-19 07:48:56,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.80 vs. limit=15.0 2024-09-19 07:48:57,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.57 vs. limit=15.0 2024-09-19 07:48:58,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=614884.6666666666, ans=0.125 2024-09-19 07:49:10,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=614884.6666666666, ans=0.125 2024-09-19 07:49:14,670 INFO [train.py:1198] (1/2) Epoch 34, batch 3850, loss[loss=0.237, simple_loss=0.2893, pruned_loss=0.06904, ctc_loss=0.1483, cr_loss=0.4263, over 23150.00 frames. ], tot_loss[loss=0.218, simple_loss=0.2724, pruned_loss=0.06075, ctc_loss=0.1288, cr_loss=0.4092, over 6247623.31 frames. ], batch size: 245, lr: 3.48e-03, grad_scale: 16.0 2024-09-19 07:49:33,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=614978.0, ans=0.05 2024-09-19 07:50:41,243 INFO [train.py:1198] (1/2) Epoch 35, batch 0, loss[loss=0.1922, simple_loss=0.2503, pruned_loss=0.04918, ctc_loss=0.1052, cr_loss=0.3655, over 34443.00 frames. ], tot_loss[loss=0.1922, simple_loss=0.2503, pruned_loss=0.04918, ctc_loss=0.1052, cr_loss=0.3655, over 34443.00 frames. ], batch size: 85, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 07:50:41,244 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 07:50:58,232 INFO [train.py:1230] (1/2) Epoch 35, validation: loss=0.1495, simple_loss=0.2445, pruned_loss=0.02328, ctc_loss=0.03966, cr_loss=2.109e-14, over 944034.00 frames. 2024-09-19 07:50:58,233 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 07:51:17,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.77 vs. 
limit=12.0 2024-09-19 07:51:30,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=615146.0, ans=0.125 2024-09-19 07:51:53,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=15.0 2024-09-19 07:52:18,627 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.192e+02 2.611e+02 2.885e+02 3.192e+02 6.862e+02, threshold=5.770e+02, percent-clipped=6.0 2024-09-19 07:52:20,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=615286.0, ans=0.125 2024-09-19 07:52:21,882 INFO [train.py:1198] (1/2) Epoch 35, batch 50, loss[loss=0.1845, simple_loss=0.2374, pruned_loss=0.04804, ctc_loss=0.1063, cr_loss=0.3569, over 34478.00 frames. ], tot_loss[loss=0.2109, simple_loss=0.2663, pruned_loss=0.05761, ctc_loss=0.1223, cr_loss=0.3988, over 1481053.73 frames. ], batch size: 82, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 07:52:52,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=615332.6666666666, ans=0.125 2024-09-19 07:53:00,983 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.17 vs. limit=15.0 2024-09-19 07:53:06,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=12.0 2024-09-19 07:53:08,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.07 vs. limit=10.0 2024-09-19 07:53:11,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=615426.0, ans=0.125 2024-09-19 07:53:39,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=615472.6666666666, ans=0.2 2024-09-19 07:53:40,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=615472.6666666666, ans=0.125 2024-09-19 07:53:45,755 INFO [train.py:1198] (1/2) Epoch 35, batch 100, loss[loss=0.1884, simple_loss=0.2459, pruned_loss=0.04819, ctc_loss=0.1027, cr_loss=0.3478, over 34571.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.267, pruned_loss=0.05751, ctc_loss=0.1223, cr_loss=0.3993, over 2627428.26 frames. ], batch size: 89, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 07:53:47,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=615519.3333333334, ans=0.125 2024-09-19 07:53:55,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=615519.3333333334, ans=0.125 2024-09-19 07:54:00,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=615566.0, ans=0.0 2024-09-19 07:54:06,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.81 vs. limit=15.0 2024-09-19 07:54:09,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.77 vs. 
limit=15.0 2024-09-19 07:54:14,426 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2024-09-19 07:54:30,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=615612.6666666666, ans=0.125 2024-09-19 07:54:45,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.65 vs. limit=15.0 2024-09-19 07:54:48,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=615659.3333333334, ans=0.0 2024-09-19 07:55:05,873 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.544e+02 3.006e+02 3.640e+02 6.858e+02, threshold=6.011e+02, percent-clipped=1.0 2024-09-19 07:55:09,132 INFO [train.py:1198] (1/2) Epoch 35, batch 150, loss[loss=0.1783, simple_loss=0.233, pruned_loss=0.04542, ctc_loss=0.09662, cr_loss=0.3339, over 34472.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2648, pruned_loss=0.05625, ctc_loss=0.12, cr_loss=0.3939, over 3555845.15 frames. ], batch size: 82, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 07:55:17,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=615752.6666666666, ans=0.125 2024-09-19 07:55:19,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=615752.6666666666, ans=0.0 2024-09-19 07:55:21,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.46 vs. limit=15.0 2024-09-19 07:55:24,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=615799.3333333334, ans=0.1 2024-09-19 07:55:31,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=615799.3333333334, ans=0.125 2024-09-19 07:55:47,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=615846.0, ans=0.125 2024-09-19 07:55:59,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.91 vs. limit=22.5 2024-09-19 07:56:19,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=615939.3333333334, ans=0.025 2024-09-19 07:56:30,971 INFO [train.py:1198] (1/2) Epoch 35, batch 200, loss[loss=0.2186, simple_loss=0.2777, pruned_loss=0.05935, ctc_loss=0.1251, cr_loss=0.3928, over 31952.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2638, pruned_loss=0.05603, ctc_loss=0.1196, cr_loss=0.3922, over 4271342.72 frames. ], batch size: 145, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 07:56:35,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.32 vs. 
limit=15.0 2024-09-19 07:57:07,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=616032.6666666666, ans=0.05 2024-09-19 07:57:07,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=616032.6666666666, ans=0.0 2024-09-19 07:57:19,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=616079.3333333334, ans=0.125 2024-09-19 07:57:51,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=616172.6666666666, ans=0.0 2024-09-19 07:57:52,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=616172.6666666666, ans=0.125 2024-09-19 07:57:59,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.009e+02 2.574e+02 3.190e+02 4.577e+02 8.928e+02, threshold=6.379e+02, percent-clipped=8.0 2024-09-19 07:58:02,417 INFO [train.py:1198] (1/2) Epoch 35, batch 250, loss[loss=0.2391, simple_loss=0.2951, pruned_loss=0.06838, ctc_loss=0.1411, cr_loss=0.4541, over 34219.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2638, pruned_loss=0.05588, ctc_loss=0.1191, cr_loss=0.3919, over 4833100.09 frames. ], batch size: 117, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 07:58:18,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=616219.3333333334, ans=0.125 2024-09-19 07:58:27,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=616266.0, ans=0.2 2024-09-19 07:58:31,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=616266.0, ans=0.05 2024-09-19 07:59:15,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=616406.0, ans=0.0 2024-09-19 07:59:26,509 INFO [train.py:1198] (1/2) Epoch 35, batch 300, loss[loss=0.2236, simple_loss=0.2794, pruned_loss=0.06203, ctc_loss=0.1342, cr_loss=0.4226, over 34323.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.2639, pruned_loss=0.05615, ctc_loss=0.1196, cr_loss=0.3936, over 5261456.27 frames. ], batch size: 107, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 07:59:27,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.13 vs. limit=15.0 2024-09-19 07:59:33,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616452.6666666666, ans=0.1 2024-09-19 08:00:16,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=616592.6666666666, ans=0.125 2024-09-19 08:00:34,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=23.35 vs. limit=22.5 2024-09-19 08:00:47,972 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 2.505e+02 2.758e+02 3.448e+02 6.680e+02, threshold=5.517e+02, percent-clipped=2.0 2024-09-19 08:00:51,229 INFO [train.py:1198] (1/2) Epoch 35, batch 350, loss[loss=0.1882, simple_loss=0.2429, pruned_loss=0.04896, ctc_loss=0.1045, cr_loss=0.3654, over 34298.00 frames. 
], tot_loss[loss=0.2085, simple_loss=0.2645, pruned_loss=0.05635, ctc_loss=0.1201, cr_loss=0.3948, over 5597259.45 frames. ], batch size: 83, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 08:00:56,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=616686.0, ans=0.0 2024-09-19 08:00:56,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=616686.0, ans=0.2 2024-09-19 08:01:06,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=616732.6666666666, ans=0.1 2024-09-19 08:01:19,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=616732.6666666666, ans=0.0 2024-09-19 08:02:15,093 INFO [train.py:1198] (1/2) Epoch 35, batch 400, loss[loss=0.2116, simple_loss=0.2684, pruned_loss=0.05724, ctc_loss=0.12, cr_loss=0.4075, over 34416.00 frames. ], tot_loss[loss=0.2083, simple_loss=0.2642, pruned_loss=0.05634, ctc_loss=0.1199, cr_loss=0.3949, over 5865092.67 frames. ], batch size: 95, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 08:02:29,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=616919.3333333334, ans=0.0 2024-09-19 08:02:34,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=616966.0, ans=0.0 2024-09-19 08:02:42,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=616966.0, ans=0.0 2024-09-19 08:02:47,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=617012.6666666666, ans=10.0 2024-09-19 08:03:12,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=617059.3333333334, ans=0.1 2024-09-19 08:03:15,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2024-09-19 08:03:31,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=617106.0, ans=0.04949747468305833 2024-09-19 08:03:34,928 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.121e+02 2.498e+02 2.897e+02 3.640e+02 6.048e+02, threshold=5.795e+02, percent-clipped=1.0 2024-09-19 08:03:36,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=617152.6666666666, ans=0.125 2024-09-19 08:03:36,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=617152.6666666666, ans=0.125 2024-09-19 08:03:38,175 INFO [train.py:1198] (1/2) Epoch 35, batch 450, loss[loss=0.2083, simple_loss=0.2686, pruned_loss=0.05472, ctc_loss=0.1172, cr_loss=0.3776, over 34707.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2644, pruned_loss=0.05654, ctc_loss=0.1204, cr_loss=0.3961, over 6055380.92 frames. ], batch size: 97, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 08:03:47,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.68 vs. 
limit=12.0 2024-09-19 08:03:58,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=617199.3333333334, ans=0.0 2024-09-19 08:04:11,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=617246.0, ans=0.1 2024-09-19 08:04:14,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=617246.0, ans=0.125 2024-09-19 08:04:50,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=617339.3333333334, ans=0.0 2024-09-19 08:05:02,794 INFO [train.py:1198] (1/2) Epoch 35, batch 500, loss[loss=0.221, simple_loss=0.2783, pruned_loss=0.06063, ctc_loss=0.1284, cr_loss=0.4157, over 34412.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2633, pruned_loss=0.05603, ctc_loss=0.1195, cr_loss=0.394, over 6220448.05 frames. ], batch size: 110, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 08:05:04,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=617386.0, ans=0.2 2024-09-19 08:05:04,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617386.0, ans=0.1 2024-09-19 08:05:06,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=617386.0, ans=0.125 2024-09-19 08:05:18,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=617432.6666666666, ans=0.2 2024-09-19 08:05:28,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.84 vs. limit=15.0 2024-09-19 08:05:39,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=617479.3333333334, ans=0.125 2024-09-19 08:06:00,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.35 vs. limit=15.0 2024-09-19 08:06:24,162 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.491e+02 2.919e+02 3.450e+02 6.283e+02, threshold=5.838e+02, percent-clipped=1.0 2024-09-19 08:06:27,442 INFO [train.py:1198] (1/2) Epoch 35, batch 550, loss[loss=0.2038, simple_loss=0.2677, pruned_loss=0.05145, ctc_loss=0.1112, cr_loss=0.3673, over 33728.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2631, pruned_loss=0.05592, ctc_loss=0.1194, cr_loss=0.3939, over 6328930.61 frames. ], batch size: 122, lr: 3.43e-03, grad_scale: 32.0 2024-09-19 08:06:36,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=617619.3333333334, ans=0.0 2024-09-19 08:06:53,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=6.91 vs. 
limit=15.0 2024-09-19 08:07:02,710 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:07:14,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=617712.6666666666, ans=0.0 2024-09-19 08:07:27,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=617759.3333333334, ans=0.0 2024-09-19 08:07:30,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=617759.3333333334, ans=0.125 2024-09-19 08:07:34,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=617806.0, ans=0.125 2024-09-19 08:07:34,540 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2024-09-19 08:07:39,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=617806.0, ans=0.125 2024-09-19 08:07:48,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.89 vs. limit=6.0 2024-09-19 08:07:50,396 INFO [train.py:1198] (1/2) Epoch 35, batch 600, loss[loss=0.2425, simple_loss=0.296, pruned_loss=0.07041, ctc_loss=0.1467, cr_loss=0.4678, over 34183.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2637, pruned_loss=0.05602, ctc_loss=0.1197, cr_loss=0.3944, over 6431494.46 frames. ], batch size: 117, lr: 3.42e-03, grad_scale: 32.0 2024-09-19 08:07:59,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=617852.6666666666, ans=0.0 2024-09-19 08:08:47,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=617992.6666666666, ans=0.125 2024-09-19 08:09:05,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=618039.3333333334, ans=0.125 2024-09-19 08:09:11,918 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.500e+02 2.902e+02 3.671e+02 7.174e+02, threshold=5.804e+02, percent-clipped=4.0 2024-09-19 08:09:15,174 INFO [train.py:1198] (1/2) Epoch 35, batch 650, loss[loss=0.2093, simple_loss=0.2607, pruned_loss=0.05824, ctc_loss=0.1246, cr_loss=0.4146, over 34530.00 frames. ], tot_loss[loss=0.2072, simple_loss=0.2632, pruned_loss=0.05579, ctc_loss=0.1191, cr_loss=0.3934, over 6523746.60 frames. ], batch size: 94, lr: 3.42e-03, grad_scale: 32.0 2024-09-19 08:09:28,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=618086.0, ans=0.125 2024-09-19 08:10:30,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.02 vs. limit=15.0 2024-09-19 08:10:34,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=618272.6666666666, ans=10.0 2024-09-19 08:10:39,432 INFO [train.py:1198] (1/2) Epoch 35, batch 700, loss[loss=0.1973, simple_loss=0.25, pruned_loss=0.05343, ctc_loss=0.1134, cr_loss=0.3742, over 34601.00 frames. 
], tot_loss[loss=0.2076, simple_loss=0.2637, pruned_loss=0.05595, ctc_loss=0.1195, cr_loss=0.3944, over 6580015.58 frames. ], batch size: 89, lr: 3.42e-03, grad_scale: 32.0 2024-09-19 08:10:49,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=618319.3333333334, ans=0.125 2024-09-19 08:11:04,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=618366.0, ans=0.125 2024-09-19 08:11:10,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.33 vs. limit=10.0 2024-09-19 08:11:16,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618412.6666666666, ans=0.1 2024-09-19 08:11:17,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=618412.6666666666, ans=0.125 2024-09-19 08:11:21,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=618412.6666666666, ans=0.025 2024-09-19 08:11:37,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=618459.3333333334, ans=0.125 2024-09-19 08:11:59,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=618506.0, ans=0.125 2024-09-19 08:12:00,218 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.178e+02 2.462e+02 2.810e+02 3.365e+02 7.761e+02, threshold=5.619e+02, percent-clipped=1.0 2024-09-19 08:12:01,942 INFO [train.py:1198] (1/2) Epoch 35, batch 750, loss[loss=0.2144, simple_loss=0.272, pruned_loss=0.0581, ctc_loss=0.1225, cr_loss=0.4023, over 34418.00 frames. ], tot_loss[loss=0.2076, simple_loss=0.2635, pruned_loss=0.056, ctc_loss=0.1194, cr_loss=0.3944, over 6623223.26 frames. ], batch size: 95, lr: 3.42e-03, grad_scale: 16.0 2024-09-19 08:12:03,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=618552.6666666666, ans=0.125 2024-09-19 08:12:06,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.27 vs. limit=15.0 2024-09-19 08:12:15,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=618552.6666666666, ans=0.05 2024-09-19 08:12:22,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=618599.3333333334, ans=0.0 2024-09-19 08:12:23,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=618599.3333333334, ans=0.1 2024-09-19 08:12:28,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.83 vs. limit=15.0 2024-09-19 08:13:27,983 INFO [train.py:1198] (1/2) Epoch 35, batch 800, loss[loss=0.188, simple_loss=0.2451, pruned_loss=0.04781, ctc_loss=0.1065, cr_loss=0.3487, over 34434.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2633, pruned_loss=0.05588, ctc_loss=0.1192, cr_loss=0.394, over 6659656.79 frames. 
], batch size: 85, lr: 3.42e-03, grad_scale: 32.0 2024-09-19 08:13:55,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618832.6666666666, ans=0.1 2024-09-19 08:13:57,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.33 vs. limit=6.0 2024-09-19 08:14:27,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=618926.0, ans=0.0 2024-09-19 08:14:39,352 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:14:48,733 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.918e+02 2.530e+02 2.903e+02 3.739e+02 6.277e+02, threshold=5.806e+02, percent-clipped=2.0 2024-09-19 08:14:49,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=619019.3333333334, ans=0.125 2024-09-19 08:14:50,310 INFO [train.py:1198] (1/2) Epoch 35, batch 850, loss[loss=0.2177, simple_loss=0.2803, pruned_loss=0.05679, ctc_loss=0.1246, cr_loss=0.4129, over 34382.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2625, pruned_loss=0.05561, ctc_loss=0.1187, cr_loss=0.3925, over 6692242.14 frames. ], batch size: 103, lr: 3.42e-03, grad_scale: 32.0 2024-09-19 08:15:31,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=619112.6666666666, ans=0.2 2024-09-19 08:15:39,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=619159.3333333334, ans=0.0 2024-09-19 08:15:44,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=619159.3333333334, ans=0.125 2024-09-19 08:15:50,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-09-19 08:15:54,858 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:16:14,792 INFO [train.py:1198] (1/2) Epoch 35, batch 900, loss[loss=0.1771, simple_loss=0.2363, pruned_loss=0.0427, ctc_loss=0.09563, cr_loss=0.335, over 34436.00 frames. ], tot_loss[loss=0.2068, simple_loss=0.2627, pruned_loss=0.05567, ctc_loss=0.1189, cr_loss=0.3929, over 6699211.68 frames. ], batch size: 85, lr: 3.42e-03, grad_scale: 32.0 2024-09-19 08:16:22,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.49 vs. limit=15.0 2024-09-19 08:16:23,461 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:17:37,676 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.098e+02 2.552e+02 2.985e+02 3.768e+02 7.005e+02, threshold=5.970e+02, percent-clipped=3.0 2024-09-19 08:17:39,267 INFO [train.py:1198] (1/2) Epoch 35, batch 950, loss[loss=0.1872, simple_loss=0.2456, pruned_loss=0.04679, ctc_loss=0.1043, cr_loss=0.3581, over 34704.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.263, pruned_loss=0.05584, ctc_loss=0.1192, cr_loss=0.3942, over 6703376.57 frames. 
], batch size: 87, lr: 3.42e-03, grad_scale: 32.0 2024-09-19 08:17:44,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=619486.0, ans=0.1 2024-09-19 08:17:49,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=619486.0, ans=0.125 2024-09-19 08:17:55,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=619532.6666666666, ans=0.125 2024-09-19 08:18:15,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=619579.3333333334, ans=0.0 2024-09-19 08:18:19,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=619579.3333333334, ans=0.125 2024-09-19 08:18:22,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=619579.3333333334, ans=0.025 2024-09-19 08:18:27,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=619626.0, ans=0.1 2024-09-19 08:18:30,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-09-19 08:18:38,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=619626.0, ans=0.0 2024-09-19 08:18:45,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=619672.6666666666, ans=0.2 2024-09-19 08:19:01,628 INFO [train.py:1198] (1/2) Epoch 35, batch 1000, loss[loss=0.2, simple_loss=0.2539, pruned_loss=0.05403, ctc_loss=0.1151, cr_loss=0.3766, over 34506.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.2638, pruned_loss=0.05614, ctc_loss=0.1197, cr_loss=0.3952, over 6695985.49 frames. ], batch size: 90, lr: 3.42e-03, grad_scale: 32.0 2024-09-19 08:19:08,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.87 vs. limit=22.5 2024-09-19 08:19:22,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=619766.0, ans=0.07 2024-09-19 08:19:27,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=619766.0, ans=0.1 2024-09-19 08:20:06,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=22.5 2024-09-19 08:20:26,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.497e+02 2.984e+02 3.756e+02 7.458e+02, threshold=5.968e+02, percent-clipped=3.0 2024-09-19 08:20:26,609 INFO [train.py:1198] (1/2) Epoch 35, batch 1050, loss[loss=0.2041, simple_loss=0.2654, pruned_loss=0.0522, ctc_loss=0.1147, cr_loss=0.389, over 34567.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2633, pruned_loss=0.05601, ctc_loss=0.1195, cr_loss=0.3942, over 6705541.75 frames. 
], batch size: 99, lr: 3.42e-03, grad_scale: 16.0 2024-09-19 08:20:35,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=619952.6666666666, ans=0.125 2024-09-19 08:20:45,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=619999.3333333334, ans=0.125 2024-09-19 08:21:02,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=620046.0, ans=0.125 2024-09-19 08:21:14,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0 2024-09-19 08:21:33,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=620139.3333333334, ans=0.125 2024-09-19 08:21:37,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=620139.3333333334, ans=0.125 2024-09-19 08:21:43,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=620139.3333333334, ans=0.1 2024-09-19 08:21:51,556 INFO [train.py:1198] (1/2) Epoch 35, batch 1100, loss[loss=0.2103, simple_loss=0.2646, pruned_loss=0.0574, ctc_loss=0.124, cr_loss=0.4087, over 34347.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.263, pruned_loss=0.05584, ctc_loss=0.1192, cr_loss=0.3933, over 6718160.43 frames. ], batch size: 91, lr: 3.42e-03, grad_scale: 16.0 2024-09-19 08:22:16,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=620232.6666666666, ans=0.0 2024-09-19 08:22:20,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.60 vs. limit=22.5 2024-09-19 08:22:22,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=620279.3333333334, ans=0.0 2024-09-19 08:22:58,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=620372.6666666666, ans=0.125 2024-09-19 08:23:11,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=620372.6666666666, ans=0.2 2024-09-19 08:23:14,657 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.512e+02 2.915e+02 3.623e+02 5.255e+02, threshold=5.831e+02, percent-clipped=0.0 2024-09-19 08:23:14,678 INFO [train.py:1198] (1/2) Epoch 35, batch 1150, loss[loss=0.1958, simple_loss=0.2503, pruned_loss=0.05183, ctc_loss=0.1121, cr_loss=0.3813, over 34362.00 frames. ], tot_loss[loss=0.2072, simple_loss=0.2631, pruned_loss=0.05591, ctc_loss=0.1193, cr_loss=0.3934, over 6715783.24 frames. ], batch size: 91, lr: 3.42e-03, grad_scale: 16.0 2024-09-19 08:23:18,724 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.50 vs. limit=22.5
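The recurring Whitening lines from scaling.py:1024 track how close a module's activations are to being white, i.e. decorrelated with roughly equal variance in every direction; each entry compares a measured metric against that module's whitening_limit, and entries appear only occasionally, as sampled monitoring rather than errors. Below is a minimal sketch of one plausible metric, assuming it is the ratio of the mean squared eigenvalue of the per-group feature covariance to the squared mean eigenvalue, which is exactly 1.0 for perfectly white features and grows as a few directions dominate; the function and its arguments are illustrative, and the actual Zipformer implementation may differ in detail.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels); channels are split into num_groups groups.
    # Returns a scalar >= 1 that equals 1.0 when, within each group, the
    # feature covariance is a multiple of the identity ("white" features).
    num_channels = x.shape[-1]
    group_size = num_channels // num_groups
    x = x.reshape(-1, num_groups, group_size).transpose(0, 1)  # (groups, frames, size)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]   # per-group covariance
    mean_eig = cov.diagonal(dim1=1, dim2=2).mean(dim=1)     # trace / size
    mean_sq_eig = (cov * cov).sum(dim=(1, 2)) / group_size  # trace(cov^2) / size
    return (mean_sq_eig / (mean_eig * mean_eig + 1e-20)).mean()

# i.i.d. noise scores close to 1 (up to sampling error), so on this definition
# a reading like "metric=17.50 vs. limit=22.5" is within bounds.
print(whitening_metric(torch.randn(100000, 256), num_groups=1))

On this reading, a module whose metric drifts above its limit would receive a small corrective gradient nudge back toward whiteness, which would explain why the logged metrics mostly hover at or below their limits.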
2024-09-19 08:23:45,375 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:23:48,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=620512.6666666666, ans=0.025 2024-09-19 08:23:58,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=620512.6666666666, ans=0.0 2024-09-19 08:24:21,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=620606.0, ans=0.0 2024-09-19 08:24:37,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=620606.0, ans=22.5 2024-09-19 08:24:41,201 INFO [train.py:1198] (1/2) Epoch 35, batch 1200, loss[loss=0.2213, simple_loss=0.2762, pruned_loss=0.06204, ctc_loss=0.129, cr_loss=0.4168, over 34597.00 frames. ], tot_loss[loss=0.208, simple_loss=0.2639, pruned_loss=0.05615, ctc_loss=0.1198, cr_loss=0.3944, over 6708236.62 frames. ], batch size: 99, lr: 3.42e-03, grad_scale: 32.0 2024-09-19 08:24:58,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=620699.3333333334, ans=0.1 2024-09-19 08:25:01,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=620699.3333333334, ans=0.125 2024-09-19 08:25:24,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=620746.0, ans=0.0 2024-09-19 08:25:25,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=620746.0, ans=0.125 2024-09-19 08:25:46,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=620839.3333333334, ans=0.125 2024-09-19 08:25:56,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=620839.3333333334, ans=0.125 2024-09-19 08:25:56,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=620839.3333333334, ans=0.0 2024-09-19 08:25:56,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=620839.3333333334, ans=0.2 2024-09-19 08:25:57,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=620839.3333333334, ans=0.025 2024-09-19 08:26:03,939 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.047e+02 2.484e+02 2.714e+02 3.340e+02 5.378e+02, threshold=5.429e+02, percent-clipped=0.0 2024-09-19 08:26:03,960 INFO [train.py:1198] (1/2) Epoch 35, batch 1250, loss[loss=0.2265, simple_loss=0.2828, pruned_loss=0.06341, ctc_loss=0.133, cr_loss=0.4196, over 34325.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2648, pruned_loss=0.0564, ctc_loss=0.1201, cr_loss=0.3955, over 6742053.70 frames. ], batch size: 107, lr: 3.42e-03, grad_scale: 32.0
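The dense ScheduledFloat traffic from scaling.py:214 records hyperparameters that are functions of the global batch count rather than constants: dropout probabilities, skip rates, balancer probabilities, bypass scale minima, and even the whitening limits themselves (note the ...self_attn2.whiten.whitening_limit entry above with ans=22.5). A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with clamping outside them; the class name and breakpoints below are illustrative, not values taken from this run.

class PiecewiseLinearSchedule:
    """A float that depends on the global batch count, e.g. a dropout rate
    that decays from 0.3 to 0.1 over the first 20k batches."""

    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs; kept sorted by batch_count.
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        raise AssertionError("unreachable: points are sorted")

# Illustrative usage, queried at a batch count that appears in the log above;
# a long-converged schedule simply reports its final value, e.g. ans=0.1.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(620839.0))  # -> 0.1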
2024-09-19 08:26:10,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=620886.0, ans=0.125 2024-09-19 08:26:17,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=620886.0, ans=0.125 2024-09-19 08:26:48,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=620979.3333333334, ans=0.0 2024-09-19 08:26:51,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.93 vs. limit=15.0 2024-09-19 08:26:59,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=621026.0, ans=0.125 2024-09-19 08:27:00,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=621026.0, ans=0.125 2024-09-19 08:27:09,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=621072.6666666666, ans=0.0 2024-09-19 08:27:11,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0 2024-09-19 08:27:13,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621072.6666666666, ans=0.1 2024-09-19 08:27:26,791 INFO [train.py:1198] (1/2) Epoch 35, batch 1300, loss[loss=0.2125, simple_loss=0.276, pruned_loss=0.05451, ctc_loss=0.121, cr_loss=0.3924, over 33060.00 frames. ], tot_loss[loss=0.2082, simple_loss=0.2643, pruned_loss=0.05622, ctc_loss=0.1198, cr_loss=0.3946, over 6744653.91 frames. ], batch size: 130, lr: 3.42e-03, grad_scale: 32.0 2024-09-19 08:27:31,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.60 vs. limit=15.0 2024-09-19 08:27:39,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=621119.3333333334, ans=0.2 2024-09-19 08:27:44,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=621166.0, ans=0.125 2024-09-19 08:27:46,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=22.5 2024-09-19 08:28:07,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=621212.6666666666, ans=0.0 2024-09-19 08:28:25,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621259.3333333334, ans=0.1 2024-09-19 08:28:33,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=621259.3333333334, ans=0.0 2024-09-19 08:28:48,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0
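Each train.py:1198 entry reports a per-batch loss[...] and a running tot_loss[...], decomposed into simple, pruned, CTC and consistency-regularization (cr) terms. The printed totals are consistent with a fixed weighted sum of roughly 0.5*simple_loss + pruned_loss + 0.1*ctc_loss + 0.02*cr_loss: for the Epoch 35, batch 1300 running totals above, 0.5*0.2643 + 0.05622 + 0.1*0.1198 + 0.02*0.3946 = 0.2082, matching the logged loss. A sketch of that combination, with the weights inferred from the logged numbers rather than read out of the recipe:

def combined_loss(simple_loss: float, pruned_loss: float,
                  ctc_loss: float, cr_loss: float,
                  simple_scale: float = 0.5, ctc_scale: float = 0.1,
                  cr_scale: float = 0.02) -> float:
    # Weighted combination consistent with the tot_loss values in this log:
    # the pruned transducer loss carries full weight, the "simple" loss is
    # down-weighted, and the CTC / consistency-regularization terms act as
    # auxiliary objectives with small scales.
    return (simple_scale * simple_loss + pruned_loss
            + ctc_scale * ctc_loss + cr_scale * cr_loss)

# Check against the "Epoch 35, batch 1300" running totals logged above.
assert abs(combined_loss(0.2643, 0.05622, 0.1198, 0.3946) - 0.2082) < 5e-4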
2024-09-19 08:28:53,244 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.060e+02 2.585e+02 2.962e+02 3.576e+02 6.373e+02, threshold=5.923e+02, percent-clipped=2.0 2024-09-19 08:28:53,266 INFO [train.py:1198] (1/2) Epoch 35, batch 1350, loss[loss=0.2183, simple_loss=0.2765, pruned_loss=0.05921, ctc_loss=0.1244, cr_loss=0.4187, over 34536.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.264, pruned_loss=0.05608, ctc_loss=0.1196, cr_loss=0.3944, over 6764008.14 frames. ], batch size: 94, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:29:22,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=621399.3333333334, ans=0.125 2024-09-19 08:29:51,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=621492.6666666666, ans=0.09899494936611666 2024-09-19 08:30:15,670 INFO [train.py:1198] (1/2) Epoch 35, batch 1400, loss[loss=0.1907, simple_loss=0.2429, pruned_loss=0.05084, ctc_loss=0.1084, cr_loss=0.3798, over 34281.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2635, pruned_loss=0.05588, ctc_loss=0.1193, cr_loss=0.3935, over 6776698.99 frames. ], batch size: 80, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:30:25,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.49 vs. limit=15.0 2024-09-19 08:30:36,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=621632.6666666666, ans=0.0 2024-09-19 08:30:47,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=621679.3333333334, ans=0.125 2024-09-19 08:31:00,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=621679.3333333334, ans=0.125 2024-09-19 08:31:06,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0
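The periodic WARNING lines from optim.py:487 summarize gradient norms over a recent window of batches: the five numbers read as min / 25% / median / 75% / max quartiles, and the clipping threshold tracks Clipping_scale times the median (here 2.0 * 2.962e+02 = 5.924e+02, printed as threshold=5.923e+02 up to rounding), while percent-clipped is the share of recent batches whose gradient norm exceeded that threshold. A minimal self-contained sketch of such a scheme, assuming a simple rolling window; the optimizer used in this run folds these statistics into its own update rather than exposing a helper class like this one.

import collections
import torch

class MedianGradClipper:
    """Clip gradients to clipping_scale * median of recent gradient norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)
        self.num_seen = 0
        self.num_clipped = 0

    def clip_(self, params) -> float:
        # Compute the global gradient norm, record it, and rescale in place
        # if it exceeds clipping_scale times the rolling median.
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        self.num_seen += 1
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:
            self.num_clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        return norm

    def percent_clipped(self) -> float:
        return 100.0 * self.num_clipped / max(1, self.num_seen)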
2024-09-19 08:31:13,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621726.0, ans=0.1 2024-09-19 08:31:17,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=621726.0, ans=0.1 2024-09-19 08:31:21,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=621772.6666666666, ans=0.125 2024-09-19 08:31:23,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=621772.6666666666, ans=0.125 2024-09-19 08:31:29,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=621772.6666666666, ans=0.07 2024-09-19 08:31:30,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=621772.6666666666, ans=0.0 2024-09-19 08:31:39,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=621819.3333333334, ans=0.0 2024-09-19 08:31:40,500 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.581e+02 3.094e+02 3.679e+02 5.443e+02, threshold=6.188e+02, percent-clipped=0.0 2024-09-19 08:31:40,520 INFO [train.py:1198] (1/2) Epoch 35, batch 1450, loss[loss=0.2173, simple_loss=0.2746, pruned_loss=0.05927, ctc_loss=0.1252, cr_loss=0.4074, over 34472.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.2642, pruned_loss=0.05597, ctc_loss=0.1196, cr_loss=0.3948, over 6773568.84 frames. ], batch size: 110, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:31:51,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=15.0 2024-09-19 08:31:52,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=621819.3333333334, ans=0.0 2024-09-19 08:32:33,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=621959.3333333334, ans=0.025 2024-09-19 08:32:36,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=621959.3333333334, ans=0.025 2024-09-19 08:32:50,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622006.0, ans=0.1 2024-09-19 08:32:57,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5 2024-09-19 08:33:04,649 INFO [train.py:1198] (1/2) Epoch 35, batch 1500, loss[loss=0.2098, simple_loss=0.2731, pruned_loss=0.05355, ctc_loss=0.1178, cr_loss=0.3967, over 34462.00 frames. ], tot_loss[loss=0.2084, simple_loss=0.2646, pruned_loss=0.05615, ctc_loss=0.12, cr_loss=0.395, over 6774455.44 frames.
], batch size: 100, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:33:11,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=622052.6666666666, ans=0.125 2024-09-19 08:33:29,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=622099.3333333334, ans=0.0 2024-09-19 08:33:39,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=622146.0, ans=0.1 2024-09-19 08:34:01,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=622192.6666666666, ans=0.125 2024-09-19 08:34:05,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=622192.6666666666, ans=0.1 2024-09-19 08:34:10,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=622239.3333333334, ans=0.125 2024-09-19 08:34:11,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=622239.3333333334, ans=10.0 2024-09-19 08:34:19,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.52 vs. limit=22.5 2024-09-19 08:34:22,340 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.73 vs. limit=10.0 2024-09-19 08:34:24,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=622239.3333333334, ans=0.2 2024-09-19 08:34:27,743 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.133e+02 2.475e+02 2.919e+02 3.525e+02 8.032e+02, threshold=5.838e+02, percent-clipped=5.0 2024-09-19 08:34:27,781 INFO [train.py:1198] (1/2) Epoch 35, batch 1550, loss[loss=0.2235, simple_loss=0.2804, pruned_loss=0.06274, ctc_loss=0.1256, cr_loss=0.4008, over 34418.00 frames. ], tot_loss[loss=0.2083, simple_loss=0.2645, pruned_loss=0.05617, ctc_loss=0.1199, cr_loss=0.3942, over 6745098.46 frames. ], batch size: 105, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:35:35,199 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=12.0 2024-09-19 08:35:48,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.34 vs. limit=15.0 2024-09-19 08:35:54,154 INFO [train.py:1198] (1/2) Epoch 35, batch 1600, loss[loss=0.2231, simple_loss=0.2751, pruned_loss=0.06376, ctc_loss=0.1308, cr_loss=0.4343, over 34560.00 frames. ], tot_loss[loss=0.2083, simple_loss=0.2644, pruned_loss=0.05622, ctc_loss=0.1201, cr_loss=0.3945, over 6723979.81 frames. ], batch size: 99, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:35:54,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=622519.3333333334, ans=0.125 2024-09-19 08:35:56,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.00 vs. 
limit=15.0 2024-09-19 08:36:39,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=622612.6666666666, ans=0.2 2024-09-19 08:36:41,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=622612.6666666666, ans=0.0 2024-09-19 08:36:53,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622659.3333333334, ans=0.1 2024-09-19 08:37:02,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=622706.0, ans=0.0 2024-09-19 08:37:16,926 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.496e+02 2.805e+02 3.253e+02 6.677e+02, threshold=5.611e+02, percent-clipped=2.0 2024-09-19 08:37:16,947 INFO [train.py:1198] (1/2) Epoch 35, batch 1650, loss[loss=0.2129, simple_loss=0.2708, pruned_loss=0.0569, ctc_loss=0.1218, cr_loss=0.4191, over 34367.00 frames. ], tot_loss[loss=0.208, simple_loss=0.264, pruned_loss=0.05611, ctc_loss=0.1199, cr_loss=0.3939, over 6718019.13 frames. ], batch size: 103, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:37:19,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=622752.6666666666, ans=0.125 2024-09-19 08:37:22,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=622752.6666666666, ans=0.5 2024-09-19 08:37:47,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=622799.3333333334, ans=0.125 2024-09-19 08:37:50,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=622846.0, ans=0.125 2024-09-19 08:37:54,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=622846.0, ans=0.0 2024-09-19 08:37:55,443 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.97 vs. limit=15.0 2024-09-19 08:38:11,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=622892.6666666666, ans=0.0 2024-09-19 08:38:25,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.61 vs. limit=22.5 2024-09-19 08:38:31,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.61 vs. limit=12.0 2024-09-19 08:38:39,134 INFO [train.py:1198] (1/2) Epoch 35, batch 1700, loss[loss=0.1672, simple_loss=0.2253, pruned_loss=0.03967, ctc_loss=0.08693, cr_loss=0.3098, over 34348.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2638, pruned_loss=0.056, ctc_loss=0.1196, cr_loss=0.3935, over 6743719.54 frames. 
], batch size: 80, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:39:09,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=623032.6666666666, ans=0.0 2024-09-19 08:39:14,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=623079.3333333334, ans=0.025 2024-09-19 08:39:37,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=623126.0, ans=0.025 2024-09-19 08:39:56,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=623172.6666666666, ans=0.125 2024-09-19 08:39:59,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=623172.6666666666, ans=10.0 2024-09-19 08:39:59,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=623172.6666666666, ans=0.1 2024-09-19 08:40:05,638 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.420e+02 2.784e+02 3.372e+02 7.317e+02, threshold=5.568e+02, percent-clipped=3.0 2024-09-19 08:40:05,659 INFO [train.py:1198] (1/2) Epoch 35, batch 1750, loss[loss=0.1707, simple_loss=0.227, pruned_loss=0.0415, ctc_loss=0.09181, cr_loss=0.324, over 34173.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2634, pruned_loss=0.05578, ctc_loss=0.1192, cr_loss=0.3928, over 6753252.86 frames. ], batch size: 78, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:40:17,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=623219.3333333334, ans=22.5 2024-09-19 08:40:29,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=623266.0, ans=0.025 2024-09-19 08:40:46,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0 2024-09-19 08:41:22,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=623406.0, ans=0.125 2024-09-19 08:41:27,499 INFO [train.py:1198] (1/2) Epoch 35, batch 1800, loss[loss=0.2097, simple_loss=0.2698, pruned_loss=0.0551, ctc_loss=0.1175, cr_loss=0.3962, over 34691.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2636, pruned_loss=0.05581, ctc_loss=0.1193, cr_loss=0.3931, over 6756574.57 frames. ], batch size: 97, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:41:27,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=623452.6666666666, ans=0.025 2024-09-19 08:41:31,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=623452.6666666666, ans=0.0 2024-09-19 08:41:43,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.03 vs. 
limit=6.0 2024-09-19 08:41:49,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623499.3333333334, ans=0.1 2024-09-19 08:42:04,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=623546.0, ans=0.0 2024-09-19 08:42:19,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=623592.6666666666, ans=0.125 2024-09-19 08:42:32,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623639.3333333334, ans=0.1 2024-09-19 08:42:39,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=623639.3333333334, ans=0.125 2024-09-19 08:42:41,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=623639.3333333334, ans=0.125 2024-09-19 08:42:52,756 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.160e+02 2.602e+02 3.342e+02 4.285e+02 7.959e+02, threshold=6.685e+02, percent-clipped=8.0 2024-09-19 08:42:52,782 INFO [train.py:1198] (1/2) Epoch 35, batch 1850, loss[loss=0.2222, simple_loss=0.2849, pruned_loss=0.05926, ctc_loss=0.1267, cr_loss=0.3897, over 34477.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2637, pruned_loss=0.05585, ctc_loss=0.1193, cr_loss=0.393, over 6763349.09 frames. ], batch size: 100, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:42:54,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=623686.0, ans=0.035 2024-09-19 08:42:57,151 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.27 vs. limit=15.0 2024-09-19 08:43:16,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=623732.6666666666, ans=0.0 2024-09-19 08:43:42,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=623826.0, ans=0.1 2024-09-19 08:43:42,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=623826.0, ans=0.07 2024-09-19 08:43:50,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=623826.0, ans=0.0 2024-09-19 08:44:10,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=623872.6666666666, ans=0.125 2024-09-19 08:44:16,506 INFO [train.py:1198] (1/2) Epoch 35, batch 1900, loss[loss=0.2117, simple_loss=0.2731, pruned_loss=0.05562, ctc_loss=0.1166, cr_loss=0.3916, over 34374.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.2644, pruned_loss=0.05606, ctc_loss=0.1197, cr_loss=0.3941, over 6772763.42 frames. ], batch size: 103, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:44:29,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.25 vs. 
limit=15.0 2024-09-19 08:44:56,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624012.6666666666, ans=0.1 2024-09-19 08:45:01,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624012.6666666666, ans=0.1 2024-09-19 08:45:05,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=624059.3333333334, ans=0.125 2024-09-19 08:45:05,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=624059.3333333334, ans=0.1 2024-09-19 08:45:28,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.87 vs. limit=22.5 2024-09-19 08:45:39,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.494e+02 2.860e+02 3.628e+02 5.918e+02, threshold=5.720e+02, percent-clipped=0.0 2024-09-19 08:45:39,029 INFO [train.py:1198] (1/2) Epoch 35, batch 1950, loss[loss=0.2002, simple_loss=0.2521, pruned_loss=0.0547, ctc_loss=0.1166, cr_loss=0.3886, over 34332.00 frames. ], tot_loss[loss=0.2091, simple_loss=0.2655, pruned_loss=0.05644, ctc_loss=0.1203, cr_loss=0.3958, over 6789439.20 frames. ], batch size: 91, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:45:39,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=624152.6666666666, ans=0.125 2024-09-19 08:46:20,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=624246.0, ans=0.0 2024-09-19 08:46:25,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0 2024-09-19 08:46:29,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=624292.6666666666, ans=0.125 2024-09-19 08:47:05,302 INFO [train.py:1198] (1/2) Epoch 35, batch 2000, loss[loss=0.1738, simple_loss=0.2301, pruned_loss=0.04256, ctc_loss=0.09566, cr_loss=0.3311, over 34180.00 frames. ], tot_loss[loss=0.2091, simple_loss=0.2655, pruned_loss=0.05637, ctc_loss=0.1203, cr_loss=0.3953, over 6765414.85 frames. 
], batch size: 78, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:47:05,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=624386.0, ans=0.125 2024-09-19 08:47:50,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=624479.3333333334, ans=0.125 2024-09-19 08:47:55,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=624526.0, ans=0.125 2024-09-19 08:48:20,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=624572.6666666666, ans=0.125 2024-09-19 08:48:28,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.102e+02 2.507e+02 2.819e+02 3.464e+02 7.006e+02, threshold=5.638e+02, percent-clipped=6.0 2024-09-19 08:48:28,349 INFO [train.py:1198] (1/2) Epoch 35, batch 2050, loss[loss=0.1833, simple_loss=0.2419, pruned_loss=0.04535, ctc_loss=0.1014, cr_loss=0.3436, over 34480.00 frames. ], tot_loss[loss=0.2083, simple_loss=0.2646, pruned_loss=0.05614, ctc_loss=0.1197, cr_loss=0.3938, over 6757256.98 frames. ], batch size: 82, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:48:38,634 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:48:43,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=624666.0, ans=0.2 2024-09-19 08:48:49,108 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.70 vs. limit=22.5 2024-09-19 08:49:26,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.07 vs. limit=15.0 2024-09-19 08:49:29,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=624759.3333333334, ans=0.125 2024-09-19 08:49:32,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=624806.0, ans=0.0 2024-09-19 08:49:36,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=624806.0, ans=0.125 2024-09-19 08:49:42,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.20 vs. limit=15.0 2024-09-19 08:49:50,819 INFO [train.py:1198] (1/2) Epoch 35, batch 2100, loss[loss=0.2031, simple_loss=0.2596, pruned_loss=0.05386, ctc_loss=0.1138, cr_loss=0.4046, over 34535.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.2643, pruned_loss=0.0561, ctc_loss=0.1197, cr_loss=0.3941, over 6769515.64 frames. ], batch size: 94, lr: 3.41e-03, grad_scale: 32.0 2024-09-19 08:49:54,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.04 vs. 
limit=15.0 2024-09-19 08:50:04,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=624852.6666666666, ans=0.1 2024-09-19 08:50:19,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=624899.3333333334, ans=0.0 2024-09-19 08:50:36,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=624946.0, ans=0.0 2024-09-19 08:50:39,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=624946.0, ans=0.125 2024-09-19 08:50:58,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=625039.3333333334, ans=0.1 2024-09-19 08:51:16,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.125e+02 2.454e+02 2.708e+02 3.267e+02 6.987e+02, threshold=5.416e+02, percent-clipped=3.0 2024-09-19 08:51:16,389 INFO [train.py:1198] (1/2) Epoch 35, batch 2150, loss[loss=0.1991, simple_loss=0.2552, pruned_loss=0.05257, ctc_loss=0.1125, cr_loss=0.3837, over 34354.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.2632, pruned_loss=0.0557, ctc_loss=0.1189, cr_loss=0.3925, over 6788344.78 frames. ], batch size: 91, lr: 3.40e-03, grad_scale: 32.0 2024-09-19 08:52:00,054 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:52:17,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=625226.0, ans=0.125 2024-09-19 08:52:22,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=625272.6666666666, ans=0.0 2024-09-19 08:52:29,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=625272.6666666666, ans=10.0 2024-09-19 08:52:34,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=625272.6666666666, ans=0.0 2024-09-19 08:52:35,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=625272.6666666666, ans=0.0 2024-09-19 08:52:38,924 INFO [train.py:1198] (1/2) Epoch 35, batch 2200, loss[loss=0.2128, simple_loss=0.275, pruned_loss=0.05575, ctc_loss=0.1186, cr_loss=0.3859, over 34450.00 frames. ], tot_loss[loss=0.2072, simple_loss=0.2635, pruned_loss=0.05572, ctc_loss=0.119, cr_loss=0.3926, over 6782626.52 frames. 
], batch size: 100, lr: 3.40e-03, grad_scale: 32.0 2024-09-19 08:52:42,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625319.3333333334, ans=0.1 2024-09-19 08:52:54,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625366.0, ans=0.1 2024-09-19 08:52:54,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625366.0, ans=0.1 2024-09-19 08:52:56,075 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:52:56,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=625366.0, ans=0.125 2024-09-19 08:53:12,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=625412.6666666666, ans=0.125 2024-09-19 08:53:41,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=625459.3333333334, ans=0.025 2024-09-19 08:53:46,289 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 08:53:46,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.38 vs. limit=22.5 2024-09-19 08:53:51,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=15.0 2024-09-19 08:54:02,052 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.578e+02 3.132e+02 4.073e+02 5.270e+02, threshold=6.265e+02, percent-clipped=0.0 2024-09-19 08:54:02,072 INFO [train.py:1198] (1/2) Epoch 35, batch 2250, loss[loss=0.204, simple_loss=0.2633, pruned_loss=0.05357, ctc_loss=0.1136, cr_loss=0.3715, over 34381.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2632, pruned_loss=0.05561, ctc_loss=0.1188, cr_loss=0.3924, over 6780571.61 frames. ], batch size: 95, lr: 3.40e-03, grad_scale: 32.0 2024-09-19 08:54:17,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=625552.6666666666, ans=0.125 2024-09-19 08:54:26,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=625599.3333333334, ans=0.2 2024-09-19 08:55:25,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=625739.3333333334, ans=0.125 2024-09-19 08:55:28,035 INFO [train.py:1198] (1/2) Epoch 35, batch 2300, loss[loss=0.1931, simple_loss=0.241, pruned_loss=0.05343, ctc_loss=0.1162, cr_loss=0.377, over 34289.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2625, pruned_loss=0.05548, ctc_loss=0.1185, cr_loss=0.3912, over 6764152.89 frames. ], batch size: 83, lr: 3.40e-03, grad_scale: 32.0 2024-09-19 08:56:17,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=625926.0, ans=0.1 2024-09-19 08:56:36,543 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.35 vs. 
limit=22.5 2024-09-19 08:56:45,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.12 vs. limit=15.0 2024-09-19 08:56:49,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=626019.3333333334, ans=0.02 2024-09-19 08:56:50,289 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.520e+02 2.914e+02 3.726e+02 6.197e+02, threshold=5.828e+02, percent-clipped=0.0 2024-09-19 08:56:50,315 INFO [train.py:1198] (1/2) Epoch 35, batch 2350, loss[loss=0.2142, simple_loss=0.2717, pruned_loss=0.05794, ctc_loss=0.1214, cr_loss=0.4163, over 34717.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2628, pruned_loss=0.0557, ctc_loss=0.1189, cr_loss=0.3928, over 6771171.00 frames. ], batch size: 97, lr: 3.40e-03, grad_scale: 32.0 2024-09-19 08:56:52,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=626019.3333333334, ans=0.5 2024-09-19 08:57:10,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626066.0, ans=0.1 2024-09-19 08:57:13,411 INFO [scaling.py:801] (1/2) Caught exception in Balancer backward: CUDA out of memory. Tried to allocate 3.77 GiB. GPU 1 has a total capacity of 79.17 GiB of which 3.66 GiB is free. Process 39810 has 75.51 GiB memory in use. Of the allocated memory 29.31 GiB is allocated by PyTorch, and 43.81 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables), size=[196, 384, 708, 19], will continue. 2024-09-19 08:57:22,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=626112.6666666666, ans=0.1 2024-09-19 08:57:37,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=626112.6666666666, ans=0.125 2024-09-19 08:58:17,183 INFO [train.py:1198] (1/2) Epoch 35, batch 2400, loss[loss=0.1947, simple_loss=0.2506, pruned_loss=0.05063, ctc_loss=0.1119, cr_loss=0.3797, over 34581.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2635, pruned_loss=0.0559, ctc_loss=0.1193, cr_loss=0.3935, over 6775457.15 frames. ], batch size: 89, lr: 3.40e-03, grad_scale: 32.0
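
Note on the scaling.py:801 message above: the Balancer regularizer runs its backward in a try/except, so a CUDA OOM inside that auxiliary computation is logged and skipped ("will continue") instead of killing the run. The message also shows classic allocator fragmentation (29.31 GiB allocated by PyTorch vs. 43.81 GiB reserved but unallocated) and suggests PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True. A hedged sketch of both ideas follows; the env var is a real PyTorch allocator option but must be set before the first CUDA allocation (e.g. exported in the launch script), and the helper name here is illustrative:

    import os

    # Must happen before CUDA is initialized, ideally before "import torch" in the
    # entry point, or as: export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
    os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")

    import torch

    def backward_skip_oom(aux_loss: torch.Tensor) -> None:
        """Best-effort backward for an auxiliary regularizer: on CUDA OOM,
        log and continue, mirroring the Balancer behaviour in the log above."""
        try:
            aux_loss.backward()
        except torch.cuda.OutOfMemoryError as exc:
            print(f"Caught exception in auxiliary backward: {exc}, will continue.")
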
2024-09-19 08:59:04,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=626346.0, ans=0.125 2024-09-19 08:59:05,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=626392.6666666666, ans=0.0 2024-09-19 08:59:07,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=626392.6666666666, ans=10.0 2024-09-19 08:59:10,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=626392.6666666666, ans=0.0 2024-09-19 08:59:14,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=626392.6666666666, ans=0.1 2024-09-19 08:59:14,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.36 vs. limit=22.5 2024-09-19 08:59:40,569 INFO [train.py:1198] (1/2) Epoch 35, batch 2450, loss[loss=0.2031, simple_loss=0.2615, pruned_loss=0.05296, ctc_loss=0.1171, cr_loss=0.3839, over 34424.00 frames. ], tot_loss[loss=0.2083, simple_loss=0.2644, pruned_loss=0.05624, ctc_loss=0.12, cr_loss=0.395, over 6750207.96 frames. ], batch size: 95, lr: 3.40e-03, grad_scale: 16.0 2024-09-19 08:59:40,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=626486.0, ans=0.2 2024-09-19 08:59:42,190 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.565e+02 2.939e+02 3.762e+02 8.760e+02, threshold=5.878e+02, percent-clipped=3.0 2024-09-19 09:00:08,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=626532.6666666666, ans=0.125 2024-09-19 09:00:56,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=626672.6666666666, ans=0.0 2024-09-19 09:01:02,527 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=12.0 2024-09-19 09:01:02,986 INFO [train.py:1198] (1/2) Epoch 35, batch 2500, loss[loss=0.2227, simple_loss=0.2815, pruned_loss=0.06067, ctc_loss=0.1303, cr_loss=0.4111, over 34428.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2645, pruned_loss=0.05642, ctc_loss=0.1204, cr_loss=0.3958, over 6762162.07 frames. ], batch size: 100, lr: 3.40e-03, grad_scale: 16.0 2024-09-19 09:01:04,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=626719.3333333334, ans=0.0 2024-09-19 09:01:07,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2024-09-19 09:01:07,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.90 vs. limit=12.0 2024-09-19 09:01:11,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.37 vs.
limit=12.0 2024-09-19 09:01:13,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=626719.3333333334, ans=0.125 2024-09-19 09:01:41,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=626812.6666666666, ans=15.0 2024-09-19 09:01:44,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.77 vs. limit=15.0 2024-09-19 09:01:45,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=626812.6666666666, ans=0.2 2024-09-19 09:02:19,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=626906.0, ans=0.0 2024-09-19 09:02:21,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=626906.0, ans=0.0 2024-09-19 09:02:24,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=626906.0, ans=0.125 2024-09-19 09:02:29,386 INFO [train.py:1198] (1/2) Epoch 35, batch 2550, loss[loss=0.1853, simple_loss=0.2402, pruned_loss=0.04802, ctc_loss=0.102, cr_loss=0.3483, over 34200.00 frames. ], tot_loss[loss=0.2085, simple_loss=0.2645, pruned_loss=0.05637, ctc_loss=0.1202, cr_loss=0.396, over 6766209.54 frames. ], batch size: 78, lr: 3.40e-03, grad_scale: 16.0 2024-09-19 09:02:30,961 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.525e+02 2.874e+02 3.800e+02 6.648e+02, threshold=5.748e+02, percent-clipped=2.0 2024-09-19 09:02:36,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.05 vs. limit=15.0 2024-09-19 09:02:39,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.32 vs. limit=15.0 2024-09-19 09:02:41,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=626952.6666666666, ans=0.125 2024-09-19 09:02:55,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=626999.3333333334, ans=0.09899494936611666 2024-09-19 09:02:57,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=626999.3333333334, ans=0.2 2024-09-19 09:03:05,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=627046.0, ans=0.1 2024-09-19 09:03:13,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=627046.0, ans=0.07 2024-09-19 09:03:27,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=627092.6666666666, ans=0.1 2024-09-19 09:03:49,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.68 vs. limit=15.0 2024-09-19 09:03:52,179 INFO [train.py:1198] (1/2) Epoch 35, batch 2600, loss[loss=0.2026, simple_loss=0.2561, pruned_loss=0.05497, ctc_loss=0.1169, cr_loss=0.3923, over 34363.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2649, pruned_loss=0.05651, ctc_loss=0.1205, cr_loss=0.3966, over 6762199.82 frames. ], batch size: 91, lr: 3.40e-03, grad_scale: 16.0
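
Note on the [scaling.py:214] ScheduledFloat entries, which dominate this log: they record hyperparameters (dropout_p, skip_rate, balancer probs and limits, scale_min, and so on) that are not constants but functions of batch_count, sampled each time the module uses them. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the example are illustrative, and the real ScheduledFloat in scaling.py carries more machinery:

    def scheduled_float(batch_count: float, points) -> float:
        """Piecewise-linear schedule over batch_count.

        points: sorted list of (batch_count, value) pairs, e.g.
        [(0.0, 0.3), (20000.0, 0.1)] for a dropout_p that decays to 0.1.
        """
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return points[-1][1]

At batch_count around 627000 every such schedule is far past its last breakpoint, which is consistent with the logged ans= values holding steady from entry to entry.
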
2024-09-19 09:04:07,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=627232.6666666666, ans=0.05 2024-09-19 09:04:09,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=627232.6666666666, ans=10.0 2024-09-19 09:04:35,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=627279.3333333334, ans=0.125 2024-09-19 09:04:47,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=627326.0, ans=0.0 2024-09-19 09:04:47,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=627326.0, ans=0.125 2024-09-19 09:05:14,860 INFO [train.py:1198] (1/2) Epoch 35, batch 2650, loss[loss=0.2215, simple_loss=0.2786, pruned_loss=0.06121, ctc_loss=0.1289, cr_loss=0.4053, over 34288.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2648, pruned_loss=0.05628, ctc_loss=0.1201, cr_loss=0.3955, over 6769517.85 frames. ], batch size: 117, lr: 3.40e-03, grad_scale: 16.0 2024-09-19 09:05:16,524 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.164e+02 2.506e+02 2.737e+02 3.545e+02 6.141e+02, threshold=5.473e+02, percent-clipped=2.0 2024-09-19 09:05:30,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.46 vs. limit=10.0 2024-09-19 09:05:36,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=627466.0, ans=0.125 2024-09-19 09:05:38,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=627466.0, ans=0.125 2024-09-19 09:06:29,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.58 vs. limit=15.0 2024-09-19 09:06:33,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=627606.0, ans=0.0 2024-09-19 09:06:41,255 INFO [train.py:1198] (1/2) Epoch 35, batch 2700, loss[loss=0.2156, simple_loss=0.2777, pruned_loss=0.05591, ctc_loss=0.1252, cr_loss=0.4135, over 34614.00 frames. ], tot_loss[loss=0.2088, simple_loss=0.2649, pruned_loss=0.05639, ctc_loss=0.1203, cr_loss=0.3956, over 6765233.17 frames. ], batch size: 102, lr: 3.40e-03, grad_scale: 16.0 2024-09-19 09:06:43,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=627652.6666666666, ans=0.125 2024-09-19 09:07:06,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.63 vs. limit=12.0 2024-09-19 09:07:10,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.04 vs.
limit=12.0 2024-09-19 09:07:13,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=627746.0, ans=0.07 2024-09-19 09:07:29,788 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:08:04,104 INFO [train.py:1198] (1/2) Epoch 35, batch 2750, loss[loss=0.1966, simple_loss=0.2518, pruned_loss=0.05178, ctc_loss=0.1132, cr_loss=0.3808, over 34634.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2637, pruned_loss=0.05598, ctc_loss=0.1196, cr_loss=0.3933, over 6762836.41 frames. ], batch size: 88, lr: 3.40e-03, grad_scale: 16.0 2024-09-19 09:08:05,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.083e+02 2.526e+02 2.992e+02 3.602e+02 5.797e+02, threshold=5.985e+02, percent-clipped=1.0 2024-09-19 09:08:06,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=627886.0, ans=0.125 2024-09-19 09:08:21,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=15.0 2024-09-19 09:08:24,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.56 vs. limit=12.0 2024-09-19 09:08:57,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=628026.0, ans=0.125 2024-09-19 09:09:28,624 INFO [train.py:1198] (1/2) Epoch 35, batch 2800, loss[loss=0.2346, simple_loss=0.2805, pruned_loss=0.07108, ctc_loss=0.1475, cr_loss=0.4244, over 23283.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.264, pruned_loss=0.05623, ctc_loss=0.1201, cr_loss=0.3941, over 6740926.96 frames. ], batch size: 245, lr: 3.40e-03, grad_scale: 32.0 2024-09-19 09:09:30,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628119.3333333334, ans=0.1 2024-09-19 09:09:42,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=628119.3333333334, ans=0.125 2024-09-19 09:09:54,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=628166.0, ans=0.125 2024-09-19 09:09:54,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=628166.0, ans=0.125 2024-09-19 09:10:42,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=628306.0, ans=0.125 2024-09-19 09:10:54,027 INFO [train.py:1198] (1/2) Epoch 35, batch 2850, loss[loss=0.2125, simple_loss=0.267, pruned_loss=0.05869, ctc_loss=0.1243, cr_loss=0.3916, over 34476.00 frames. ], tot_loss[loss=0.2087, simple_loss=0.2645, pruned_loss=0.05647, ctc_loss=0.1205, cr_loss=0.3955, over 6723982.93 frames. 
], batch size: 90, lr: 3.40e-03, grad_scale: 32.0 2024-09-19 09:10:54,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=628352.6666666666, ans=0.0 2024-09-19 09:10:55,694 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.125e+02 2.533e+02 3.011e+02 3.545e+02 6.399e+02, threshold=6.022e+02, percent-clipped=2.0 2024-09-19 09:11:17,704 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:11:24,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=628399.3333333334, ans=0.0 2024-09-19 09:11:32,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=628446.0, ans=0.09899494936611666 2024-09-19 09:11:42,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=628492.6666666666, ans=0.125 2024-09-19 09:11:48,852 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:12:16,768 INFO [train.py:1198] (1/2) Epoch 35, batch 2900, loss[loss=0.2033, simple_loss=0.2611, pruned_loss=0.05341, ctc_loss=0.1148, cr_loss=0.3894, over 34543.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.2657, pruned_loss=0.05686, ctc_loss=0.1214, cr_loss=0.3984, over 6753988.94 frames. ], batch size: 94, lr: 3.40e-03, grad_scale: 32.0 2024-09-19 09:12:17,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=628586.0, ans=0.0 2024-09-19 09:12:44,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=628632.6666666666, ans=0.125 2024-09-19 09:12:57,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=628679.3333333334, ans=0.125 2024-09-19 09:13:43,736 INFO [train.py:1198] (1/2) Epoch 35, batch 2950, loss[loss=0.1997, simple_loss=0.2533, pruned_loss=0.05445, ctc_loss=0.1121, cr_loss=0.3708, over 34655.00 frames. ], tot_loss[loss=0.2085, simple_loss=0.2644, pruned_loss=0.05638, ctc_loss=0.1203, cr_loss=0.3958, over 6747999.60 frames. ], batch size: 88, lr: 3.39e-03, grad_scale: 32.0
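
Note on the two loss groups in each train.py:1198 line: loss[...] is the current batch alone (over its roughly 34k frames), while tot_loss[...] is a decayed, frame-weighted running average; with about 34k frames per batch, the reported ~6.75M-frame count corresponds to an effective horizon of roughly 200 recent batches. A sketch of that bookkeeping, assuming a simple exponentially decayed accumulator; the decay constant 1 - 1/200 is inferred from the logged frame counts, not read from train.py:

    class RunningFrameAverage:
        """Exponentially decayed frame-weighted loss, like tot_loss[... over N frames]."""

        def __init__(self, decay: float = 1.0 - 1.0 / 200):
            self.decay = decay
            self.weighted_loss = 0.0
            self.frames = 0.0

        def update(self, batch_loss_per_frame: float, batch_frames: float) -> None:
            self.weighted_loss = self.weighted_loss * self.decay + batch_loss_per_frame * batch_frames
            self.frames = self.frames * self.decay + batch_frames
            # steady state: frames -> batch_frames / (1 - decay) ~= 34k * 200 ~= 6.8M

        @property
        def value(self) -> float:
            return self.weighted_loss / max(self.frames, 1.0)
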
2024-09-19 09:13:45,374 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.120e+02 2.588e+02 2.913e+02 4.107e+02 7.214e+02, threshold=5.827e+02, percent-clipped=1.0 2024-09-19 09:13:50,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=628819.3333333334, ans=0.0 2024-09-19 09:14:08,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=628866.0, ans=0.125 2024-09-19 09:14:25,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=628912.6666666666, ans=0.0 2024-09-19 09:14:48,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=629006.0, ans=0.5 2024-09-19 09:15:02,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=629006.0, ans=0.0 2024-09-19 09:15:06,865 INFO [train.py:1198] (1/2) Epoch 35, batch 3000, loss[loss=0.2058, simple_loss=0.2632, pruned_loss=0.05459, ctc_loss=0.117, cr_loss=0.3974, over 34529.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.2641, pruned_loss=0.05615, ctc_loss=0.1199, cr_loss=0.395, over 6749017.88 frames. ], batch size: 94, lr: 3.39e-03, grad_scale: 32.0 2024-09-19 09:15:06,865 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 09:15:23,743 INFO [train.py:1230] (1/2) Epoch 35, validation: loss=0.1489, simple_loss=0.2431, pruned_loss=0.0234, ctc_loss=0.03985, cr_loss=2.165e-14, over 944034.00 frames. 2024-09-19 09:15:23,744 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 09:15:36,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=629052.6666666666, ans=0.125 2024-09-19 09:16:21,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=629192.6666666666, ans=0.125 2024-09-19 09:16:47,314 INFO [train.py:1198] (1/2) Epoch 35, batch 3050, loss[loss=0.2029, simple_loss=0.2564, pruned_loss=0.05532, ctc_loss=0.1183, cr_loss=0.3776, over 34585.00 frames. ], tot_loss[loss=0.2088, simple_loss=0.2648, pruned_loss=0.05645, ctc_loss=0.1204, cr_loss=0.3963, over 6741974.69 frames. ], batch size: 89, lr: 3.39e-03, grad_scale: 32.0
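
Note on the validation block above (train.py:1221/1230/1231): at regular intervals (batch 3000 here, and again at the start of the next epoch) the trainer pauses, computes the loss over the entire dev set (hence the constant 944034.00 frames), logs it, and resumes training. cr_loss=2.165e-14 is numerically zero there, consistent with the consistency-regularization term comparing two differently time-masked views of each utterance: with masking disabled at validation the two views coincide and the term vanishes. A minimal sketch of such a pass, assuming a compute_loss(model, batch) helper returning (per-frame loss, num_frames); the names are illustrative:

    import torch

    def compute_validation_loss(model, valid_dl, compute_loss) -> float:
        """Average per-frame loss over the whole dev set (a fixed 944034 frames
        in this run, which is why the logged frame count never changes)."""
        was_training = model.training
        model.eval()
        total, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                loss_per_frame, num_frames = compute_loss(model, batch)
                total += float(loss_per_frame) * num_frames
                frames += num_frames
        if was_training:
            model.train()
        return total / max(frames, 1.0)
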
2024-09-19 09:16:48,851 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.943e+02 2.417e+02 2.615e+02 3.156e+02 6.233e+02, threshold=5.229e+02, percent-clipped=1.0 2024-09-19 09:16:49,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=629286.0, ans=0.0 2024-09-19 09:16:55,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=629286.0, ans=0.07 2024-09-19 09:17:05,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=629332.6666666666, ans=0.125 2024-09-19 09:17:15,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=629332.6666666666, ans=0.0 2024-09-19 09:17:27,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=629379.3333333334, ans=15.0 2024-09-19 09:17:45,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=629426.0, ans=0.0 2024-09-19 09:18:09,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.93 vs. limit=22.5 2024-09-19 09:18:10,242 INFO [train.py:1198] (1/2) Epoch 35, batch 3100, loss[loss=0.2279, simple_loss=0.2817, pruned_loss=0.06495, ctc_loss=0.1356, cr_loss=0.4284, over 34179.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2647, pruned_loss=0.05663, ctc_loss=0.1207, cr_loss=0.3969, over 6742372.25 frames. ], batch size: 117, lr: 3.39e-03, grad_scale: 8.0 2024-09-19 09:18:24,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=629566.0, ans=0.09899494936611666 2024-09-19 09:18:33,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.08 vs. limit=15.0 2024-09-19 09:18:52,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=629612.6666666666, ans=0.05 2024-09-19 09:18:57,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=629659.3333333334, ans=0.2 2024-09-19 09:19:09,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=629659.3333333334, ans=0.125 2024-09-19 09:19:16,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=629706.0, ans=0.0 2024-09-19 09:19:17,277 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=15.0 2024-09-19 09:19:23,492 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:19:31,382 INFO [train.py:1198] (1/2) Epoch 35, batch 3150, loss[loss=0.2189, simple_loss=0.2781, pruned_loss=0.05913, ctc_loss=0.1259, cr_loss=0.4061, over 33894.00 frames. ], tot_loss[loss=0.2088, simple_loss=0.2646, pruned_loss=0.05652, ctc_loss=0.1206, cr_loss=0.3963, over 6747645.13 frames.
], batch size: 122, lr: 3.39e-03, grad_scale: 8.0 2024-09-19 09:19:36,218 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.483e+02 2.945e+02 3.625e+02 6.455e+02, threshold=5.889e+02, percent-clipped=6.0 2024-09-19 09:20:07,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=629846.0, ans=0.025 2024-09-19 09:20:52,487 INFO [train.py:1198] (1/2) Epoch 35, batch 3200, loss[loss=0.207, simple_loss=0.2593, pruned_loss=0.05782, ctc_loss=0.1196, cr_loss=0.3783, over 34542.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2638, pruned_loss=0.05611, ctc_loss=0.1197, cr_loss=0.394, over 6760592.77 frames. ], batch size: 94, lr: 3.39e-03, grad_scale: 16.0 2024-09-19 09:20:59,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=629986.0, ans=0.0 2024-09-19 09:21:52,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=630126.0, ans=0.125 2024-09-19 09:21:57,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=630172.6666666666, ans=0.0 2024-09-19 09:22:13,690 INFO [train.py:1198] (1/2) Epoch 35, batch 3250, loss[loss=0.213, simple_loss=0.2725, pruned_loss=0.05668, ctc_loss=0.1207, cr_loss=0.4005, over 34667.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.2642, pruned_loss=0.05611, ctc_loss=0.1198, cr_loss=0.3947, over 6770612.05 frames. ], batch size: 98, lr: 3.39e-03, grad_scale: 16.0 2024-09-19 09:22:18,625 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.112e+02 2.500e+02 2.942e+02 3.653e+02 5.416e+02, threshold=5.884e+02, percent-clipped=0.0 2024-09-19 09:22:44,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=630266.0, ans=0.125 2024-09-19 09:22:45,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=630312.6666666666, ans=0.0 2024-09-19 09:22:55,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=630312.6666666666, ans=0.125 2024-09-19 09:23:09,610 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.68 vs. limit=10.0 2024-09-19 09:23:27,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=630406.0, ans=0.125 2024-09-19 09:23:35,706 INFO [train.py:1198] (1/2) Epoch 35, batch 3300, loss[loss=0.2146, simple_loss=0.2774, pruned_loss=0.05563, ctc_loss=0.1215, cr_loss=0.4058, over 33151.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2631, pruned_loss=0.05576, ctc_loss=0.119, cr_loss=0.3928, over 6769393.34 frames. 
], batch size: 130, lr: 3.39e-03, grad_scale: 16.0 2024-09-19 09:23:59,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=630499.3333333334, ans=0.0 2024-09-19 09:24:08,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=630546.0, ans=0.125 2024-09-19 09:24:21,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=630546.0, ans=0.125 2024-09-19 09:24:28,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=630592.6666666666, ans=0.125 2024-09-19 09:24:38,039 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:24:41,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=630639.3333333334, ans=0.2 2024-09-19 09:24:41,757 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.10 vs. limit=15.0 2024-09-19 09:24:58,547 INFO [train.py:1198] (1/2) Epoch 35, batch 3350, loss[loss=0.2173, simple_loss=0.2771, pruned_loss=0.05834, ctc_loss=0.1245, cr_loss=0.4009, over 33875.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2636, pruned_loss=0.05605, ctc_loss=0.1196, cr_loss=0.3938, over 6744492.57 frames. ], batch size: 122, lr: 3.39e-03, grad_scale: 16.0 2024-09-19 09:25:03,420 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.133e+02 2.463e+02 2.626e+02 3.260e+02 6.016e+02, threshold=5.253e+02, percent-clipped=1.0 2024-09-19 09:25:05,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=630686.0, ans=0.0 2024-09-19 09:25:13,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=630732.6666666666, ans=0.1 2024-09-19 09:25:47,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=630826.0, ans=0.025 2024-09-19 09:25:50,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=630826.0, ans=0.0 2024-09-19 09:26:19,254 INFO [train.py:1198] (1/2) Epoch 35, batch 3400, loss[loss=0.1782, simple_loss=0.2349, pruned_loss=0.04407, ctc_loss=0.09869, cr_loss=0.3416, over 34132.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.2638, pruned_loss=0.05618, ctc_loss=0.1197, cr_loss=0.3937, over 6733844.88 frames. ], batch size: 78, lr: 3.39e-03, grad_scale: 16.0 2024-09-19 09:26:30,449 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.69 vs. limit=15.0 2024-09-19 09:26:49,463 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.51 vs. 
limit=15.0 2024-09-19 09:26:55,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=631012.6666666666, ans=0.125 2024-09-19 09:27:11,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=631059.3333333334, ans=0.0 2024-09-19 09:27:24,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=631106.0, ans=0.2 2024-09-19 09:27:41,333 INFO [train.py:1198] (1/2) Epoch 35, batch 3450, loss[loss=0.2196, simple_loss=0.2822, pruned_loss=0.05752, ctc_loss=0.1246, cr_loss=0.4285, over 33070.00 frames. ], tot_loss[loss=0.2083, simple_loss=0.2643, pruned_loss=0.05629, ctc_loss=0.1199, cr_loss=0.3944, over 6745672.39 frames. ], batch size: 130, lr: 3.39e-03, grad_scale: 16.0 2024-09-19 09:27:46,117 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.470e+02 2.836e+02 3.495e+02 6.628e+02, threshold=5.673e+02, percent-clipped=2.0 2024-09-19 09:28:10,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=631199.3333333334, ans=0.025 2024-09-19 09:28:31,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=631292.6666666666, ans=0.2 2024-09-19 09:28:41,992 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.23 vs. limit=15.0 2024-09-19 09:28:48,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=631339.3333333334, ans=0.2 2024-09-19 09:28:57,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=631339.3333333334, ans=0.125 2024-09-19 09:29:02,460 INFO [train.py:1198] (1/2) Epoch 35, batch 3500, loss[loss=0.18, simple_loss=0.2431, pruned_loss=0.04279, ctc_loss=0.09278, cr_loss=0.3214, over 34417.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2637, pruned_loss=0.05601, ctc_loss=0.1194, cr_loss=0.3931, over 6747444.40 frames. ], batch size: 85, lr: 3.39e-03, grad_scale: 16.0 2024-09-19 09:29:08,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.87 vs. limit=15.0 2024-09-19 09:29:32,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0 2024-09-19 09:29:33,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=631479.3333333334, ans=0.0 2024-09-19 09:29:42,191 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.28 vs. 
limit=22.5 2024-09-19 09:29:55,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=631526.0, ans=0.125 2024-09-19 09:30:02,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=631526.0, ans=0.07 2024-09-19 09:30:05,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=631572.6666666666, ans=0.07 2024-09-19 09:30:05,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-09-19 09:30:22,784 INFO [train.py:1198] (1/2) Epoch 35, batch 3550, loss[loss=0.2222, simple_loss=0.2845, pruned_loss=0.05873, ctc_loss=0.1269, cr_loss=0.4289, over 34391.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.264, pruned_loss=0.05592, ctc_loss=0.1193, cr_loss=0.3934, over 6756953.94 frames. ], batch size: 103, lr: 3.39e-03, grad_scale: 16.0 2024-09-19 09:30:27,710 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.961e+02 2.647e+02 3.115e+02 4.126e+02 6.686e+02, threshold=6.230e+02, percent-clipped=4.0 2024-09-19 09:30:33,262 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.53 vs. limit=22.5 2024-09-19 09:30:35,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=631619.3333333334, ans=0.0 2024-09-19 09:30:39,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=631666.0, ans=0.125 2024-09-19 09:30:42,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=631666.0, ans=0.1 2024-09-19 09:30:47,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=631666.0, ans=0.1 2024-09-19 09:31:27,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=631806.0, ans=0.1 2024-09-19 09:31:44,443 INFO [train.py:1198] (1/2) Epoch 35, batch 3600, loss[loss=0.1943, simple_loss=0.2524, pruned_loss=0.05012, ctc_loss=0.1068, cr_loss=0.3656, over 34474.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.2642, pruned_loss=0.056, ctc_loss=0.1195, cr_loss=0.3937, over 6766278.00 frames. ], batch size: 90, lr: 3.39e-03, grad_scale: 32.0 2024-09-19 09:31:50,176 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:32:12,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=631899.3333333334, ans=0.0 2024-09-19 09:32:17,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.44 vs. 
limit=15.0 2024-09-19 09:32:20,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=631946.0, ans=0.09899494936611666 2024-09-19 09:32:23,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=631946.0, ans=0.125 2024-09-19 09:32:42,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=631992.6666666666, ans=0.125 2024-09-19 09:32:54,806 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:32:58,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=632039.3333333334, ans=0.125 2024-09-19 09:33:00,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.69 vs. limit=15.0 2024-09-19 09:33:05,991 INFO [train.py:1198] (1/2) Epoch 35, batch 3650, loss[loss=0.2162, simple_loss=0.2694, pruned_loss=0.06022, ctc_loss=0.1281, cr_loss=0.4223, over 34388.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.2632, pruned_loss=0.0557, ctc_loss=0.119, cr_loss=0.3924, over 6769169.79 frames. ], batch size: 110, lr: 3.39e-03, grad_scale: 32.0 2024-09-19 09:33:10,736 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.540e+02 3.286e+02 4.330e+02 7.035e+02, threshold=6.572e+02, percent-clipped=9.0 2024-09-19 09:33:12,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=632086.0, ans=0.125 2024-09-19 09:33:17,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632086.0, ans=0.1 2024-09-19 09:33:57,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=632226.0, ans=0.125 2024-09-19 09:34:26,105 INFO [train.py:1198] (1/2) Epoch 35, batch 3700, loss[loss=0.2061, simple_loss=0.2671, pruned_loss=0.05331, ctc_loss=0.1142, cr_loss=0.3888, over 34616.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2632, pruned_loss=0.05553, ctc_loss=0.1187, cr_loss=0.3918, over 6784012.02 frames. ], batch size: 102, lr: 3.39e-03, grad_scale: 32.0 2024-09-19 09:34:31,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=632319.3333333334, ans=0.125 2024-09-19 09:34:46,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.35 vs. 
limit=22.5 2024-09-19 09:35:01,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=632412.6666666666, ans=0.2 2024-09-19 09:35:12,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632412.6666666666, ans=0.1 2024-09-19 09:35:28,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=632459.3333333334, ans=0.125 2024-09-19 09:35:47,467 INFO [train.py:1198] (1/2) Epoch 35, batch 3750, loss[loss=0.219, simple_loss=0.2769, pruned_loss=0.05923, ctc_loss=0.129, cr_loss=0.4213, over 34270.00 frames. ], tot_loss[loss=0.2102, simple_loss=0.2665, pruned_loss=0.05688, ctc_loss=0.1212, cr_loss=0.3981, over 6785562.52 frames. ], batch size: 113, lr: 3.38e-03, grad_scale: 32.0 2024-09-19 09:35:49,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=632552.6666666666, ans=0.125 2024-09-19 09:35:52,937 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.398e+02 2.703e+02 3.066e+02 5.521e+02, threshold=5.407e+02, percent-clipped=0.0 2024-09-19 09:36:09,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=632599.3333333334, ans=0.125 2024-09-19 09:36:11,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=632599.3333333334, ans=0.125 2024-09-19 09:36:19,771 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.82 vs. limit=22.5 2024-09-19 09:36:21,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.75 vs. limit=15.0 2024-09-19 09:36:27,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=632646.0, ans=15.0 2024-09-19 09:36:40,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=632692.6666666666, ans=0.2 2024-09-19 09:37:08,854 INFO [train.py:1198] (1/2) Epoch 35, batch 3800, loss[loss=0.2278, simple_loss=0.2773, pruned_loss=0.0666, ctc_loss=0.1386, cr_loss=0.4349, over 29907.00 frames. ], tot_loss[loss=0.2134, simple_loss=0.2692, pruned_loss=0.05834, ctc_loss=0.124, cr_loss=0.4039, over 6675869.92 frames. ], batch size: 175, lr: 3.38e-03, grad_scale: 16.0 2024-09-19 09:37:09,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=632786.0, ans=0.0 2024-09-19 09:37:12,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=632786.0, ans=0.125 2024-09-19 09:37:35,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. 
limit=6.0 2024-09-19 09:37:38,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=632832.6666666666, ans=0.125 2024-09-19 09:37:41,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=632879.3333333334, ans=0.125 2024-09-19 09:37:43,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=632879.3333333334, ans=0.5 2024-09-19 09:37:53,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2024-09-19 09:38:00,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=632926.0, ans=0.0 2024-09-19 09:38:33,767 INFO [train.py:1198] (1/2) Epoch 35, batch 3850, loss[loss=0.2344, simple_loss=0.286, pruned_loss=0.06803, ctc_loss=0.1459, cr_loss=0.4365, over 23490.00 frames. ], tot_loss[loss=0.2166, simple_loss=0.2713, pruned_loss=0.06009, ctc_loss=0.1275, cr_loss=0.4076, over 6248354.88 frames. ], batch size: 245, lr: 3.38e-03, grad_scale: 16.0 2024-09-19 09:38:39,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=633019.3333333334, ans=0.125 2024-09-19 09:38:40,321 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.514e+02 2.689e+02 2.853e+02 5.093e+02, threshold=5.379e+02, percent-clipped=0.0 2024-09-19 09:38:42,420 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:39:10,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=633112.6666666666, ans=10.0 2024-09-19 09:39:12,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=633112.6666666666, ans=0.2 2024-09-19 09:39:12,287 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:40:07,182 INFO [train.py:1198] (1/2) Epoch 36, batch 0, loss[loss=0.1873, simple_loss=0.2442, pruned_loss=0.04739, ctc_loss=0.1052, cr_loss=0.363, over 34462.00 frames. ], tot_loss[loss=0.1873, simple_loss=0.2442, pruned_loss=0.04739, ctc_loss=0.1052, cr_loss=0.363, over 34462.00 frames. ], batch size: 85, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:40:07,182 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 09:40:18,659 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.4874, 3.8807, 3.8912, 3.6386, 3.4863, 3.1130, 3.4921, 3.7593], device='cuda:1') 2024-09-19 09:40:24,010 INFO [train.py:1230] (1/2) Epoch 36, validation: loss=0.1485, simple_loss=0.2436, pruned_loss=0.02272, ctc_loss=0.03904, cr_loss=2.114e-14, over 944034.00 frames. 
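
Note on the zipformer.py:1858 line above: before validation the model dumps attn_weights_entropy for a selected self-attention module, presumably one value per attention head (eight values here); low entropy flags heads collapsing onto near-deterministic alignments, while values near log(sequence_length) indicate still-diffuse attention. A sketch of that diagnostic, assuming attention weights of shape (num_heads, tgt_len, src_len) that sum to 1 over the last dimension; the shape and the averaging are assumptions, not zipformer's exact layout:

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
        """Per-head mean entropy of attention distributions.

        attn_weights: (num_heads, tgt_len, src_len), rows normalized over src_len.
        Returns a (num_heads,) tensor like the one logged above.
        """
        entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
        return entropy.mean(dim=-1)  # average over target positions

The logged values ([2.4874, 3.8807, ...]) sit in the expected range: for comparison, uniform attention over ~50 frames would give entropy log(50) ~= 3.9.
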
2024-09-19 09:40:24,010 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 09:40:52,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=633192.0, ans=0.125 2024-09-19 09:41:21,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633285.3333333334, ans=0.1 2024-09-19 09:41:22,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=633285.3333333334, ans=0.125 2024-09-19 09:41:23,209 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.55 vs. limit=10.0 2024-09-19 09:41:37,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=633332.0, ans=0.0 2024-09-19 09:41:42,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=633332.0, ans=0.125 2024-09-19 09:41:49,084 INFO [train.py:1198] (1/2) Epoch 36, batch 50, loss[loss=0.1871, simple_loss=0.2396, pruned_loss=0.04921, ctc_loss=0.1073, cr_loss=0.3707, over 34451.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2647, pruned_loss=0.05668, ctc_loss=0.1203, cr_loss=0.3956, over 1481996.65 frames. ], batch size: 82, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:41:59,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=633378.6666666666, ans=0.125 2024-09-19 09:42:01,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=633378.6666666666, ans=22.5 2024-09-19 09:42:22,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=633472.0, ans=0.0 2024-09-19 09:42:35,573 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.078e+02 2.513e+02 2.781e+02 3.162e+02 7.561e+02, threshold=5.563e+02, percent-clipped=5.0 2024-09-19 09:42:36,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=22.5 2024-09-19 09:42:37,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=633472.0, ans=0.0 2024-09-19 09:42:47,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=633518.6666666666, ans=0.0 2024-09-19 09:42:55,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=633565.3333333334, ans=0.125 2024-09-19 09:43:07,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=633565.3333333334, ans=0.125 2024-09-19 09:43:08,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=633565.3333333334, ans=0.125 2024-09-19 09:43:13,408 INFO [train.py:1198] (1/2) Epoch 36, batch 100, loss[loss=0.2118, simple_loss=0.2629, pruned_loss=0.05996, ctc_loss=0.1244, cr_loss=0.3985, over 34579.00 frames. 
], tot_loss[loss=0.2111, simple_loss=0.2668, pruned_loss=0.05752, ctc_loss=0.1222, cr_loss=0.4, over 2629500.00 frames. ], batch size: 89, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:43:22,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=633612.0, ans=0.125 2024-09-19 09:43:39,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.16 vs. limit=6.0 2024-09-19 09:43:58,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=633705.3333333334, ans=0.125 2024-09-19 09:43:58,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=633705.3333333334, ans=0.125 2024-09-19 09:44:08,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=633752.0, ans=0.125 2024-09-19 09:44:15,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-09-19 09:44:22,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=633798.6666666666, ans=0.125 2024-09-19 09:44:35,489 INFO [train.py:1198] (1/2) Epoch 36, batch 150, loss[loss=0.1724, simple_loss=0.232, pruned_loss=0.04082, ctc_loss=0.09202, cr_loss=0.3193, over 34499.00 frames. ], tot_loss[loss=0.2083, simple_loss=0.2646, pruned_loss=0.05612, ctc_loss=0.1196, cr_loss=0.3945, over 3558162.98 frames. ], batch size: 82, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:45:06,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=633892.0, ans=0.1 2024-09-19 09:45:22,164 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.089e+02 2.591e+02 2.961e+02 3.809e+02 7.349e+02, threshold=5.923e+02, percent-clipped=3.0 2024-09-19 09:45:28,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.44 vs. limit=22.5 2024-09-19 09:45:45,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=634032.0, ans=0.125 2024-09-19 09:46:01,810 INFO [train.py:1198] (1/2) Epoch 36, batch 200, loss[loss=0.2271, simple_loss=0.2806, pruned_loss=0.06425, ctc_loss=0.1394, cr_loss=0.4318, over 31840.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2635, pruned_loss=0.05589, ctc_loss=0.1194, cr_loss=0.3934, over 4272742.51 frames. ], batch size: 145, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:46:17,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=634125.3333333334, ans=0.125 2024-09-19 09:46:52,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.60 vs. 
limit=15.0 2024-09-19 09:47:01,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=634218.6666666666, ans=0.2 2024-09-19 09:47:14,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=634265.3333333334, ans=0.05 2024-09-19 09:47:19,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=634265.3333333334, ans=0.0 2024-09-19 09:47:24,069 INFO [train.py:1198] (1/2) Epoch 36, batch 250, loss[loss=0.2299, simple_loss=0.2879, pruned_loss=0.0634, ctc_loss=0.1372, cr_loss=0.4415, over 34260.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2636, pruned_loss=0.05583, ctc_loss=0.1192, cr_loss=0.3935, over 4834746.34 frames. ], batch size: 117, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:47:44,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=634358.6666666666, ans=0.125 2024-09-19 09:47:45,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=634358.6666666666, ans=0.125 2024-09-19 09:47:55,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=634405.3333333334, ans=0.0 2024-09-19 09:47:58,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=634405.3333333334, ans=0.07 2024-09-19 09:48:05,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=634405.3333333334, ans=0.2 2024-09-19 09:48:08,609 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.620e+02 3.183e+02 4.166e+02 8.381e+02, threshold=6.366e+02, percent-clipped=6.0 2024-09-19 09:48:20,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=634452.0, ans=0.2 2024-09-19 09:48:45,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=634498.6666666666, ans=0.0 2024-09-19 09:48:48,181 INFO [train.py:1198] (1/2) Epoch 36, batch 300, loss[loss=0.2278, simple_loss=0.2806, pruned_loss=0.06447, ctc_loss=0.1399, cr_loss=0.4509, over 34321.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2632, pruned_loss=0.05584, ctc_loss=0.1192, cr_loss=0.3935, over 5262023.36 frames. ], batch size: 107, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:48:51,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=634545.3333333334, ans=0.0 2024-09-19 09:49:25,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=634638.6666666666, ans=0.125 2024-09-19 09:49:47,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=634685.3333333334, ans=0.1 2024-09-19 09:50:14,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=12.0 2024-09-19 09:50:18,815 INFO [train.py:1198] (1/2) Epoch 36, batch 350, loss[loss=0.1871, simple_loss=0.241, pruned_loss=0.04848, ctc_loss=0.1083, cr_loss=0.3632, over 34271.00 frames. 
], tot_loss[loss=0.2075, simple_loss=0.2636, pruned_loss=0.05586, ctc_loss=0.1194, cr_loss=0.3943, over 5597947.37 frames. ], batch size: 83, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:50:27,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=634778.6666666666, ans=0.0 2024-09-19 09:50:53,679 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 09:50:59,290 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.38 vs. limit=15.0 2024-09-19 09:51:03,203 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.116e+02 2.482e+02 2.876e+02 3.614e+02 7.305e+02, threshold=5.751e+02, percent-clipped=1.0 2024-09-19 09:51:08,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=634918.6666666666, ans=0.09899494936611666 2024-09-19 09:51:18,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=634918.6666666666, ans=0.125 2024-09-19 09:51:40,863 INFO [train.py:1198] (1/2) Epoch 36, batch 400, loss[loss=0.2153, simple_loss=0.2703, pruned_loss=0.05973, ctc_loss=0.1242, cr_loss=0.4012, over 34414.00 frames. ], tot_loss[loss=0.2072, simple_loss=0.2633, pruned_loss=0.05571, ctc_loss=0.1191, cr_loss=0.3935, over 5865825.42 frames. ], batch size: 95, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:51:41,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=635012.0, ans=0.2 2024-09-19 09:51:51,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635012.0, ans=0.1 2024-09-19 09:52:22,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635105.3333333334, ans=0.125 2024-09-19 09:52:36,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635152.0, ans=0.1 2024-09-19 09:52:48,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=635198.6666666666, ans=0.125 2024-09-19 09:53:04,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=635245.3333333334, ans=0.015 2024-09-19 09:53:05,638 INFO [train.py:1198] (1/2) Epoch 36, batch 450, loss[loss=0.2165, simple_loss=0.2715, pruned_loss=0.06047, ctc_loss=0.1233, cr_loss=0.3967, over 34706.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2636, pruned_loss=0.05581, ctc_loss=0.1192, cr_loss=0.3941, over 6054125.27 frames. 
], batch size: 97, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:53:12,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=635245.3333333334, ans=0.125 2024-09-19 09:53:52,034 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.124e+02 2.425e+02 2.801e+02 3.474e+02 6.041e+02, threshold=5.602e+02, percent-clipped=1.0 2024-09-19 09:53:57,488 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=635385.3333333334, ans=0.0 2024-09-19 09:54:09,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.04 vs. limit=15.0 2024-09-19 09:54:12,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635432.0, ans=0.1 2024-09-19 09:54:30,413 INFO [train.py:1198] (1/2) Epoch 36, batch 500, loss[loss=0.2257, simple_loss=0.2799, pruned_loss=0.06449, ctc_loss=0.1322, cr_loss=0.4023, over 34450.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.2629, pruned_loss=0.0555, ctc_loss=0.1186, cr_loss=0.3925, over 6220132.14 frames. ], batch size: 110, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 09:54:42,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=635478.6666666666, ans=0.0 2024-09-19 09:54:50,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=635525.3333333334, ans=0.125 2024-09-19 09:54:52,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=635525.3333333334, ans=0.0 2024-09-19 09:55:02,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=635572.0, ans=0.125 2024-09-19 09:55:15,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=635572.0, ans=0.2 2024-09-19 09:55:32,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=635618.6666666666, ans=0.125 2024-09-19 09:55:36,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.56 vs. limit=5.0 2024-09-19 09:55:43,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=635665.3333333334, ans=0.2 2024-09-19 09:55:53,068 INFO [train.py:1198] (1/2) Epoch 36, batch 550, loss[loss=0.2258, simple_loss=0.2806, pruned_loss=0.06335, ctc_loss=0.1338, cr_loss=0.4377, over 33739.00 frames. ], tot_loss[loss=0.207, simple_loss=0.263, pruned_loss=0.0557, ctc_loss=0.119, cr_loss=0.3935, over 6329464.64 frames. ], batch size: 122, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 09:56:16,865 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.42 vs. 
limit=15.0 2024-09-19 09:56:41,397 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.142e+02 2.461e+02 2.901e+02 3.504e+02 6.850e+02, threshold=5.802e+02, percent-clipped=1.0 2024-09-19 09:56:45,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.73 vs. limit=15.0 2024-09-19 09:56:51,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=635852.0, ans=0.0 2024-09-19 09:56:55,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=635852.0, ans=0.125 2024-09-19 09:56:58,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=635852.0, ans=0.125 2024-09-19 09:57:07,074 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.81 vs. limit=22.5 2024-09-19 09:57:17,790 INFO [train.py:1198] (1/2) Epoch 36, batch 600, loss[loss=0.2384, simple_loss=0.2907, pruned_loss=0.06989, ctc_loss=0.1404, cr_loss=0.4564, over 34288.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.2632, pruned_loss=0.0557, ctc_loss=0.119, cr_loss=0.3934, over 6432315.47 frames. ], batch size: 117, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 09:58:07,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=636085.3333333334, ans=0.1 2024-09-19 09:58:08,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=636085.3333333334, ans=0.0 2024-09-19 09:58:12,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=636085.3333333334, ans=0.0 2024-09-19 09:58:20,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=636085.3333333334, ans=0.125 2024-09-19 09:58:33,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=636132.0, ans=0.125 2024-09-19 09:58:41,359 INFO [train.py:1198] (1/2) Epoch 36, batch 650, loss[loss=0.2115, simple_loss=0.2686, pruned_loss=0.05692, ctc_loss=0.1233, cr_loss=0.3991, over 34541.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2626, pruned_loss=0.0553, ctc_loss=0.1182, cr_loss=0.3917, over 6523780.57 frames. ], batch size: 94, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 09:59:06,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636225.3333333334, ans=0.1 2024-09-19 09:59:27,517 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.988e+02 2.434e+02 2.935e+02 3.494e+02 7.247e+02, threshold=5.869e+02, percent-clipped=3.0 2024-09-19 09:59:59,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=636365.3333333334, ans=0.125 2024-09-19 10:00:03,903 INFO [train.py:1198] (1/2) Epoch 36, batch 700, loss[loss=0.2074, simple_loss=0.2561, pruned_loss=0.05909, ctc_loss=0.1225, cr_loss=0.3998, over 34609.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.2633, pruned_loss=0.05567, ctc_loss=0.1188, cr_loss=0.3935, over 6581964.16 frames. 
], batch size: 89, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 10:00:21,213 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:00:23,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.13 vs. limit=10.0 2024-09-19 10:00:29,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.56 vs. limit=15.0 2024-09-19 10:00:34,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.64 vs. limit=15.0 2024-09-19 10:00:37,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636505.3333333334, ans=0.1 2024-09-19 10:00:47,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.93 vs. limit=15.0 2024-09-19 10:00:54,686 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.47 vs. limit=15.0 2024-09-19 10:00:56,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.78 vs. limit=15.0 2024-09-19 10:01:14,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=636598.6666666666, ans=0.025 2024-09-19 10:01:20,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=636598.6666666666, ans=0.2 2024-09-19 10:01:27,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=636598.6666666666, ans=0.125 2024-09-19 10:01:29,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.52 vs. limit=22.5 2024-09-19 10:01:30,359 INFO [train.py:1198] (1/2) Epoch 36, batch 750, loss[loss=0.2115, simple_loss=0.2705, pruned_loss=0.05605, ctc_loss=0.121, cr_loss=0.4041, over 34415.00 frames. ], tot_loss[loss=0.2065, simple_loss=0.2628, pruned_loss=0.05542, ctc_loss=0.1184, cr_loss=0.3921, over 6626195.01 frames. ], batch size: 95, lr: 3.33e-03, grad_scale: 16.0 2024-09-19 10:01:42,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.11 vs. 
limit=10.0 2024-09-19 10:02:13,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=636738.6666666666, ans=0.0 2024-09-19 10:02:16,390 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.105e+02 2.578e+02 2.928e+02 3.713e+02 7.876e+02, threshold=5.857e+02, percent-clipped=3.0 2024-09-19 10:02:16,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=636738.6666666666, ans=0.125 2024-09-19 10:02:21,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=636785.3333333334, ans=0.0 2024-09-19 10:02:46,946 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.92 vs. limit=10.0 2024-09-19 10:02:52,799 INFO [train.py:1198] (1/2) Epoch 36, batch 800, loss[loss=0.1837, simple_loss=0.2428, pruned_loss=0.04511, ctc_loss=0.1023, cr_loss=0.3463, over 34466.00 frames. ], tot_loss[loss=0.2065, simple_loss=0.2627, pruned_loss=0.05547, ctc_loss=0.1184, cr_loss=0.3923, over 6659494.83 frames. ], batch size: 85, lr: 3.33e-03, grad_scale: 32.0 2024-09-19 10:03:01,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.80 vs. limit=22.5 2024-09-19 10:03:14,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=636925.3333333334, ans=0.0 2024-09-19 10:03:25,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636972.0, ans=0.1 2024-09-19 10:03:28,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.55 vs. limit=15.0 2024-09-19 10:03:33,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=15.0 2024-09-19 10:03:43,051 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.97 vs. limit=15.0 2024-09-19 10:03:50,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=637018.6666666666, ans=0.0 2024-09-19 10:04:01,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.38 vs. limit=15.0 2024-09-19 10:04:07,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2024-09-19 10:04:17,160 INFO [train.py:1198] (1/2) Epoch 36, batch 850, loss[loss=0.2137, simple_loss=0.2737, pruned_loss=0.05656, ctc_loss=0.1226, cr_loss=0.4041, over 34397.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.2625, pruned_loss=0.05521, ctc_loss=0.1181, cr_loss=0.3917, over 6694391.21 frames. 
], batch size: 103, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:04:35,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=637158.6666666666, ans=0.125 2024-09-19 10:04:50,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=637205.3333333334, ans=0.0 2024-09-19 10:05:04,873 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.473e+02 2.867e+02 3.606e+02 5.794e+02, threshold=5.734e+02, percent-clipped=0.0 2024-09-19 10:05:12,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=637252.0, ans=0.125 2024-09-19 10:05:12,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=637252.0, ans=0.025 2024-09-19 10:05:29,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=637298.6666666666, ans=0.125 2024-09-19 10:05:29,108 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=637298.6666666666, ans=0.025 2024-09-19 10:05:41,712 INFO [train.py:1198] (1/2) Epoch 36, batch 900, loss[loss=0.1879, simple_loss=0.2433, pruned_loss=0.04869, ctc_loss=0.1047, cr_loss=0.3567, over 34488.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2629, pruned_loss=0.05545, ctc_loss=0.1185, cr_loss=0.3925, over 6700281.69 frames. ], batch size: 85, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:05:50,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=637345.3333333334, ans=0.1 2024-09-19 10:05:56,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=637392.0, ans=0.025 2024-09-19 10:05:57,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.70 vs. limit=15.0 2024-09-19 10:05:58,991 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.06 vs. limit=15.0 2024-09-19 10:06:00,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=637392.0, ans=0.0 2024-09-19 10:06:00,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.68 vs. limit=10.0 2024-09-19 10:06:02,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. 
limit=6.0 2024-09-19 10:06:03,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=637392.0, ans=0.025 2024-09-19 10:06:06,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=637392.0, ans=0.0 2024-09-19 10:06:33,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=637485.3333333334, ans=0.07 2024-09-19 10:06:36,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=637485.3333333334, ans=0.2 2024-09-19 10:06:41,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=637485.3333333334, ans=0.025 2024-09-19 10:06:46,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=637532.0, ans=0.125 2024-09-19 10:06:59,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=637532.0, ans=10.0 2024-09-19 10:07:03,705 INFO [train.py:1198] (1/2) Epoch 36, batch 950, loss[loss=0.1927, simple_loss=0.2489, pruned_loss=0.05018, ctc_loss=0.1079, cr_loss=0.3662, over 34692.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2632, pruned_loss=0.05554, ctc_loss=0.1186, cr_loss=0.3926, over 6702807.39 frames. ], batch size: 87, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:07:36,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=637672.0, ans=0.07 2024-09-19 10:07:40,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=637672.0, ans=0.0 2024-09-19 10:07:46,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=637672.0, ans=0.125 2024-09-19 10:07:49,686 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.088e+02 2.583e+02 2.916e+02 3.566e+02 6.602e+02, threshold=5.832e+02, percent-clipped=2.0 2024-09-19 10:08:03,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=637718.6666666666, ans=0.125 2024-09-19 10:08:05,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=637718.6666666666, ans=0.125 2024-09-19 10:08:10,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=637765.3333333334, ans=0.125 2024-09-19 10:08:28,014 INFO [train.py:1198] (1/2) Epoch 36, batch 1000, loss[loss=0.2064, simple_loss=0.2586, pruned_loss=0.05679, ctc_loss=0.1221, cr_loss=0.4038, over 34488.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2641, pruned_loss=0.05594, ctc_loss=0.1195, cr_loss=0.3943, over 6695679.52 frames. ], batch size: 90, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:08:29,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.55 vs. limit=10.0 2024-09-19 10:08:40,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.51 vs. 
limit=10.0 2024-09-19 10:08:55,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=637858.6666666666, ans=0.125 2024-09-19 10:09:03,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=637905.3333333334, ans=0.125 2024-09-19 10:09:05,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=637905.3333333334, ans=0.2 2024-09-19 10:09:08,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=637905.3333333334, ans=0.125 2024-09-19 10:09:14,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=637905.3333333334, ans=0.125 2024-09-19 10:09:19,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=637952.0, ans=0.125 2024-09-19 10:09:29,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=637952.0, ans=0.125 2024-09-19 10:09:52,456 INFO [train.py:1198] (1/2) Epoch 36, batch 1050, loss[loss=0.2035, simple_loss=0.2649, pruned_loss=0.05185, ctc_loss=0.1149, cr_loss=0.3851, over 34555.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2636, pruned_loss=0.05582, ctc_loss=0.1192, cr_loss=0.3938, over 6704789.28 frames. ], batch size: 99, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:10:04,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.67 vs. limit=22.5 2024-09-19 10:10:20,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=638092.0, ans=0.125 2024-09-19 10:10:38,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.455e+02 2.810e+02 3.365e+02 5.805e+02, threshold=5.621e+02, percent-clipped=0.0 2024-09-19 10:10:54,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.87 vs. limit=10.0 2024-09-19 10:11:10,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=638232.0, ans=0.125 2024-09-19 10:11:14,874 INFO [train.py:1198] (1/2) Epoch 36, batch 1100, loss[loss=0.1995, simple_loss=0.2557, pruned_loss=0.05225, ctc_loss=0.1155, cr_loss=0.3907, over 34370.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2632, pruned_loss=0.05568, ctc_loss=0.119, cr_loss=0.3931, over 6717295.28 frames. 
], batch size: 91, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:11:23,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=638278.6666666666, ans=10.0 2024-09-19 10:11:33,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=638325.3333333334, ans=22.5 2024-09-19 10:11:48,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638372.0, ans=0.1 2024-09-19 10:12:00,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=638372.0, ans=0.125 2024-09-19 10:12:03,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=638372.0, ans=0.0 2024-09-19 10:12:22,434 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=638418.6666666666, ans=0.125 2024-09-19 10:12:31,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=638465.3333333334, ans=0.025 2024-09-19 10:12:41,555 INFO [train.py:1198] (1/2) Epoch 36, batch 1150, loss[loss=0.2016, simple_loss=0.2521, pruned_loss=0.05609, ctc_loss=0.1167, cr_loss=0.389, over 34358.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2632, pruned_loss=0.05567, ctc_loss=0.1189, cr_loss=0.3931, over 6715670.32 frames. ], batch size: 91, lr: 3.32e-03, grad_scale: 16.0 2024-09-19 10:12:46,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=638512.0, ans=0.025 2024-09-19 10:13:01,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=638558.6666666666, ans=0.125 2024-09-19 10:13:29,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.908e+02 2.491e+02 2.756e+02 3.474e+02 6.555e+02, threshold=5.511e+02, percent-clipped=2.0 2024-09-19 10:13:31,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=638652.0, ans=0.0 2024-09-19 10:13:33,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=638652.0, ans=0.125 2024-09-19 10:13:37,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.13 vs. limit=22.5 2024-09-19 10:13:41,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638652.0, ans=0.1 2024-09-19 10:13:42,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.17 vs. 
limit=15.0 2024-09-19 10:13:46,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=638698.6666666666, ans=0.125 2024-09-19 10:13:54,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=638698.6666666666, ans=0.2 2024-09-19 10:13:56,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=638698.6666666666, ans=0.0 2024-09-19 10:14:04,246 INFO [train.py:1198] (1/2) Epoch 36, batch 1200, loss[loss=0.2252, simple_loss=0.2775, pruned_loss=0.06478, ctc_loss=0.1341, cr_loss=0.4128, over 34570.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.264, pruned_loss=0.05594, ctc_loss=0.1195, cr_loss=0.3945, over 6707365.41 frames. ], batch size: 99, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:14:09,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=638745.3333333334, ans=0.1 2024-09-19 10:14:12,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=638745.3333333334, ans=0.1 2024-09-19 10:14:37,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=638838.6666666666, ans=0.125 2024-09-19 10:14:43,200 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.09 vs. limit=15.0 2024-09-19 10:14:57,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=638885.3333333334, ans=0.0 2024-09-19 10:15:26,967 INFO [train.py:1198] (1/2) Epoch 36, batch 1250, loss[loss=0.2443, simple_loss=0.2926, pruned_loss=0.07315, ctc_loss=0.151, cr_loss=0.4876, over 34334.00 frames. ], tot_loss[loss=0.2085, simple_loss=0.2646, pruned_loss=0.05624, ctc_loss=0.12, cr_loss=0.3957, over 6741540.01 frames. ], batch size: 107, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:15:32,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=638978.6666666666, ans=0.125 2024-09-19 10:15:34,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.22 vs. 
limit=15.0 2024-09-19 10:15:51,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=639025.3333333334, ans=0.125 2024-09-19 10:15:51,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=639025.3333333334, ans=0.125 2024-09-19 10:15:52,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=639025.3333333334, ans=10.0 2024-09-19 10:16:04,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=639072.0, ans=0.0 2024-09-19 10:16:09,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=639072.0, ans=0.0 2024-09-19 10:16:18,703 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.181e+02 2.548e+02 2.986e+02 3.716e+02 6.439e+02, threshold=5.972e+02, percent-clipped=3.0 2024-09-19 10:16:37,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=639165.3333333334, ans=0.2 2024-09-19 10:16:53,393 INFO [train.py:1198] (1/2) Epoch 36, batch 1300, loss[loss=0.2061, simple_loss=0.272, pruned_loss=0.05141, ctc_loss=0.1118, cr_loss=0.3742, over 33149.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2638, pruned_loss=0.05576, ctc_loss=0.119, cr_loss=0.3933, over 6745102.13 frames. ], batch size: 130, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:16:55,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=639212.0, ans=0.0 2024-09-19 10:16:56,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=639212.0, ans=0.0 2024-09-19 10:17:00,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=639212.0, ans=10.0 2024-09-19 10:17:03,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=639212.0, ans=0.125 2024-09-19 10:17:11,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639258.6666666666, ans=0.1 2024-09-19 10:17:24,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2024-09-19 10:17:26,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=639305.3333333334, ans=0.07 2024-09-19 10:17:34,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.42 vs. 
limit=12.0 2024-09-19 10:18:04,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=639398.6666666666, ans=0.125 2024-09-19 10:18:05,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=639398.6666666666, ans=0.0 2024-09-19 10:18:13,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=639398.6666666666, ans=0.05 2024-09-19 10:18:16,129 INFO [train.py:1198] (1/2) Epoch 36, batch 1350, loss[loss=0.2157, simple_loss=0.2674, pruned_loss=0.06037, ctc_loss=0.1295, cr_loss=0.4323, over 34547.00 frames. ], tot_loss[loss=0.2072, simple_loss=0.2635, pruned_loss=0.05568, ctc_loss=0.1189, cr_loss=0.3933, over 6765828.40 frames. ], batch size: 94, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:18:49,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.15 vs. limit=15.0 2024-09-19 10:18:50,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=639538.6666666666, ans=0.0 2024-09-19 10:18:50,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=639538.6666666666, ans=0.2 2024-09-19 10:19:03,411 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.642e+02 3.112e+02 3.814e+02 7.073e+02, threshold=6.223e+02, percent-clipped=1.0 2024-09-19 10:19:05,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=639585.3333333334, ans=0.2 2024-09-19 10:19:34,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=639632.0, ans=0.125 2024-09-19 10:19:40,494 INFO [train.py:1198] (1/2) Epoch 36, batch 1400, loss[loss=0.1769, simple_loss=0.2325, pruned_loss=0.04419, ctc_loss=0.09554, cr_loss=0.3469, over 34301.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2632, pruned_loss=0.05552, ctc_loss=0.1188, cr_loss=0.3935, over 6777628.85 frames. ], batch size: 80, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:20:10,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=639725.3333333334, ans=0.125 2024-09-19 10:20:43,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639818.6666666666, ans=0.1 2024-09-19 10:20:59,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=639865.3333333334, ans=0.95 2024-09-19 10:21:04,267 INFO [train.py:1198] (1/2) Epoch 36, batch 1450, loss[loss=0.2258, simple_loss=0.2801, pruned_loss=0.06414, ctc_loss=0.1331, cr_loss=0.4148, over 34439.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.2635, pruned_loss=0.05554, ctc_loss=0.1189, cr_loss=0.3935, over 6773986.57 frames. 
], batch size: 110, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:21:09,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=639912.0, ans=0.1 2024-09-19 10:21:12,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=639912.0, ans=0.125 2024-09-19 10:21:29,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=639958.6666666666, ans=0.125 2024-09-19 10:21:41,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.06 vs. limit=15.0 2024-09-19 10:21:48,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=640005.3333333334, ans=0.125 2024-09-19 10:21:51,623 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.124e+02 2.506e+02 2.900e+02 3.376e+02 5.710e+02, threshold=5.800e+02, percent-clipped=0.0 2024-09-19 10:22:08,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=640098.6666666666, ans=0.0 2024-09-19 10:22:19,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=640098.6666666666, ans=0.1 2024-09-19 10:22:26,159 INFO [train.py:1198] (1/2) Epoch 36, batch 1500, loss[loss=0.2082, simple_loss=0.2648, pruned_loss=0.05601, ctc_loss=0.1174, cr_loss=0.4015, over 34459.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2634, pruned_loss=0.0555, ctc_loss=0.1187, cr_loss=0.3935, over 6775157.31 frames. ], batch size: 100, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:22:34,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=640145.3333333334, ans=0.125 2024-09-19 10:22:49,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=640192.0, ans=0.125 2024-09-19 10:22:58,229 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:23:17,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=640285.3333333334, ans=0.125 2024-09-19 10:23:38,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=640332.0, ans=0.0 2024-09-19 10:23:41,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.82 vs. limit=12.0 2024-09-19 10:23:52,981 INFO [train.py:1198] (1/2) Epoch 36, batch 1550, loss[loss=0.2247, simple_loss=0.2818, pruned_loss=0.06226, ctc_loss=0.1308, cr_loss=0.4243, over 34409.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2639, pruned_loss=0.05591, ctc_loss=0.1194, cr_loss=0.3944, over 6745332.88 frames. 
], batch size: 105, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:24:01,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=640378.6666666666, ans=0.125 2024-09-19 10:24:40,349 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.049e+02 2.528e+02 2.861e+02 3.585e+02 7.397e+02, threshold=5.721e+02, percent-clipped=5.0 2024-09-19 10:24:54,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=640518.6666666666, ans=0.125 2024-09-19 10:25:15,116 INFO [train.py:1198] (1/2) Epoch 36, batch 1600, loss[loss=0.2175, simple_loss=0.2776, pruned_loss=0.05801, ctc_loss=0.1251, cr_loss=0.4078, over 34572.00 frames. ], tot_loss[loss=0.2076, simple_loss=0.2637, pruned_loss=0.0559, ctc_loss=0.1194, cr_loss=0.3939, over 6726168.96 frames. ], batch size: 99, lr: 3.32e-03, grad_scale: 32.0 2024-09-19 10:25:33,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=640658.6666666666, ans=0.0 2024-09-19 10:25:34,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.98 vs. limit=12.0 2024-09-19 10:26:13,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=640752.0, ans=0.025 2024-09-19 10:26:20,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=640798.6666666666, ans=0.125 2024-09-19 10:26:33,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=640798.6666666666, ans=0.025 2024-09-19 10:26:37,787 INFO [train.py:1198] (1/2) Epoch 36, batch 1650, loss[loss=0.2204, simple_loss=0.2789, pruned_loss=0.05944, ctc_loss=0.1273, cr_loss=0.4375, over 34370.00 frames. ], tot_loss[loss=0.2072, simple_loss=0.2634, pruned_loss=0.05568, ctc_loss=0.1191, cr_loss=0.3932, over 6719357.54 frames. ], batch size: 103, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:26:39,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=640845.3333333334, ans=0.125 2024-09-19 10:26:41,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=640845.3333333334, ans=0.125 2024-09-19 10:26:48,246 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:26:58,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=640892.0, ans=0.0 2024-09-19 10:27:03,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=640892.0, ans=0.125 2024-09-19 10:27:25,100 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.31 vs. limit=15.0 2024-09-19 10:27:29,198 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.511e+02 2.869e+02 3.530e+02 6.607e+02, threshold=5.738e+02, percent-clipped=4.0 2024-09-19 10:28:03,650 INFO [train.py:1198] (1/2) Epoch 36, batch 1700, loss[loss=0.1869, simple_loss=0.2401, pruned_loss=0.04878, ctc_loss=0.1058, cr_loss=0.3754, over 34305.00 frames. 
], tot_loss[loss=0.2072, simple_loss=0.2634, pruned_loss=0.05573, ctc_loss=0.1192, cr_loss=0.3935, over 6745577.26 frames. ], batch size: 80, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:28:08,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=641078.6666666666, ans=0.0 2024-09-19 10:28:10,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=641078.6666666666, ans=0.1 2024-09-19 10:28:17,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=641078.6666666666, ans=0.125 2024-09-19 10:28:17,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=641078.6666666666, ans=0.025 2024-09-19 10:28:30,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=641125.3333333334, ans=0.0 2024-09-19 10:28:37,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=15.0 2024-09-19 10:28:50,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=641172.0, ans=0.0 2024-09-19 10:28:50,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=641172.0, ans=0.125 2024-09-19 10:29:08,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=641265.3333333334, ans=0.125 2024-09-19 10:29:17,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.54 vs. limit=15.0 2024-09-19 10:29:21,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=641265.3333333334, ans=0.015 2024-09-19 10:29:23,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=641265.3333333334, ans=0.125 2024-09-19 10:29:26,419 INFO [train.py:1198] (1/2) Epoch 36, batch 1750, loss[loss=0.1823, simple_loss=0.237, pruned_loss=0.04661, ctc_loss=0.103, cr_loss=0.3466, over 34133.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2632, pruned_loss=0.05564, ctc_loss=0.119, cr_loss=0.3933, over 6753880.78 frames. ], batch size: 78, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:29:59,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=641405.3333333334, ans=0.125 2024-09-19 10:30:08,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=641405.3333333334, ans=0.125 2024-09-19 10:30:14,059 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.857e+02 2.484e+02 2.752e+02 3.342e+02 8.616e+02, threshold=5.503e+02, percent-clipped=1.0 2024-09-19 10:30:30,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=641498.6666666666, ans=10.0 2024-09-19 10:30:34,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. 
limit=6.0 2024-09-19 10:30:38,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=641498.6666666666, ans=0.125 2024-09-19 10:30:52,152 INFO [train.py:1198] (1/2) Epoch 36, batch 1800, loss[loss=0.2217, simple_loss=0.2836, pruned_loss=0.05956, ctc_loss=0.1233, cr_loss=0.3968, over 34695.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2635, pruned_loss=0.0558, ctc_loss=0.1192, cr_loss=0.3939, over 6756026.21 frames. ], batch size: 97, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:31:25,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=641638.6666666666, ans=0.2 2024-09-19 10:31:31,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.53 vs. limit=15.0 2024-09-19 10:31:43,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641685.3333333334, ans=0.1 2024-09-19 10:31:45,952 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.54 vs. limit=15.0 2024-09-19 10:31:53,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.96 vs. limit=15.0 2024-09-19 10:31:56,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=641732.0, ans=0.0 2024-09-19 10:32:03,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=641732.0, ans=0.07 2024-09-19 10:32:14,844 INFO [train.py:1198] (1/2) Epoch 36, batch 1850, loss[loss=0.2201, simple_loss=0.2769, pruned_loss=0.06066, ctc_loss=0.1279, cr_loss=0.4116, over 34469.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2635, pruned_loss=0.05583, ctc_loss=0.1193, cr_loss=0.3942, over 6763225.36 frames. ], batch size: 100, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:32:15,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=641778.6666666666, ans=0.2 2024-09-19 10:32:34,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=641825.3333333334, ans=0.025 2024-09-19 10:33:02,682 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.605e+02 3.061e+02 4.155e+02 7.846e+02, threshold=6.123e+02, percent-clipped=7.0 2024-09-19 10:33:11,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=641918.6666666666, ans=0.0 2024-09-19 10:33:14,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641918.6666666666, ans=0.1 2024-09-19 10:33:24,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=641965.3333333334, ans=0.125 2024-09-19 10:33:36,923 INFO [train.py:1198] (1/2) Epoch 36, batch 1900, loss[loss=0.2112, simple_loss=0.2715, pruned_loss=0.05561, ctc_loss=0.1208, cr_loss=0.389, over 34375.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2641, pruned_loss=0.05592, ctc_loss=0.1195, cr_loss=0.3951, over 6772698.50 frames. 
], batch size: 103, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:33:37,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=642012.0, ans=0.125 2024-09-19 10:33:57,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=642058.6666666666, ans=0.125 2024-09-19 10:34:02,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=642058.6666666666, ans=0.125 2024-09-19 10:34:03,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642058.6666666666, ans=0.1 2024-09-19 10:34:09,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=22.5 2024-09-19 10:34:25,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=642152.0, ans=0.05 2024-09-19 10:34:43,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.81 vs. limit=15.0 2024-09-19 10:34:46,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=642198.6666666666, ans=0.125 2024-09-19 10:35:00,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=642198.6666666666, ans=0.125 2024-09-19 10:35:03,746 INFO [train.py:1198] (1/2) Epoch 36, batch 1950, loss[loss=0.206, simple_loss=0.2574, pruned_loss=0.05737, ctc_loss=0.1216, cr_loss=0.3874, over 34390.00 frames. ], tot_loss[loss=0.209, simple_loss=0.2653, pruned_loss=0.05634, ctc_loss=0.1203, cr_loss=0.3972, over 6790051.01 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:35:04,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=642245.3333333334, ans=0.0 2024-09-19 10:35:09,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=642245.3333333334, ans=0.125 2024-09-19 10:35:18,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=642292.0, ans=0.025 2024-09-19 10:35:20,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=642292.0, ans=0.2 2024-09-19 10:35:46,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=642338.6666666666, ans=0.125 2024-09-19 10:35:51,175 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.155e+02 2.527e+02 2.894e+02 3.393e+02 5.948e+02, threshold=5.788e+02, percent-clipped=0.0 2024-09-19 10:35:58,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.81 vs. 
limit=22.5 2024-09-19 10:36:11,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=642432.0, ans=0.125 2024-09-19 10:36:20,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.60 vs. limit=15.0 2024-09-19 10:36:25,655 INFO [train.py:1198] (1/2) Epoch 36, batch 2000, loss[loss=0.1899, simple_loss=0.2463, pruned_loss=0.04896, ctc_loss=0.1051, cr_loss=0.3658, over 34182.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2656, pruned_loss=0.0564, ctc_loss=0.1205, cr_loss=0.3976, over 6766683.71 frames. ], batch size: 78, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:36:27,672 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:36:36,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=642478.6666666666, ans=0.1 2024-09-19 10:36:42,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=642525.3333333334, ans=0.125 2024-09-19 10:36:43,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.72 vs. limit=15.0 2024-09-19 10:37:02,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642572.0, ans=0.1 2024-09-19 10:37:23,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=642618.6666666666, ans=0.125 2024-09-19 10:37:47,348 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.10 vs. limit=22.5 2024-09-19 10:37:47,896 INFO [train.py:1198] (1/2) Epoch 36, batch 2050, loss[loss=0.1804, simple_loss=0.2333, pruned_loss=0.04655, ctc_loss=0.103, cr_loss=0.3425, over 34464.00 frames. ], tot_loss[loss=0.208, simple_loss=0.2643, pruned_loss=0.05594, ctc_loss=0.1196, cr_loss=0.3949, over 6758038.41 frames. ], batch size: 82, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:37:56,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=642712.0, ans=0.0 2024-09-19 10:38:24,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.56 vs. 
limit=15.0 2024-09-19 10:38:33,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=642805.3333333334, ans=0.125 2024-09-19 10:38:33,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=642805.3333333334, ans=0.0 2024-09-19 10:38:33,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=642805.3333333334, ans=0.0 2024-09-19 10:38:34,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=642805.3333333334, ans=0.125 2024-09-19 10:38:39,485 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.112e+02 2.484e+02 2.895e+02 3.629e+02 1.113e+03, threshold=5.789e+02, percent-clipped=5.0 2024-09-19 10:38:49,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.04 vs. limit=15.0 2024-09-19 10:38:51,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=12.0 2024-09-19 10:38:56,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=642898.6666666666, ans=0.125 2024-09-19 10:38:56,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.36 vs. limit=15.0 2024-09-19 10:38:59,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=642898.6666666666, ans=0.0 2024-09-19 10:38:59,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.97 vs. limit=15.0 2024-09-19 10:39:03,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.20 vs. limit=12.0 2024-09-19 10:39:14,133 INFO [train.py:1198] (1/2) Epoch 36, batch 2100, loss[loss=0.2116, simple_loss=0.2651, pruned_loss=0.05841, ctc_loss=0.1235, cr_loss=0.417, over 34524.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2639, pruned_loss=0.05577, ctc_loss=0.1193, cr_loss=0.3942, over 6772478.35 frames. ], batch size: 94, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:39:24,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=642945.3333333334, ans=0.125 2024-09-19 10:39:32,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=642992.0, ans=0.0 2024-09-19 10:39:57,744 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.85 vs. limit=22.5 2024-09-19 10:39:58,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=643038.6666666666, ans=0.125 2024-09-19 10:40:00,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=643038.6666666666, ans=0.0 2024-09-19 10:40:04,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.26 vs. 
limit=12.0 2024-09-19 10:40:10,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=643085.3333333334, ans=0.0 2024-09-19 10:40:13,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=643085.3333333334, ans=0.125 2024-09-19 10:40:36,383 INFO [train.py:1198] (1/2) Epoch 36, batch 2150, loss[loss=0.2015, simple_loss=0.2534, pruned_loss=0.05505, ctc_loss=0.1175, cr_loss=0.3982, over 34361.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2633, pruned_loss=0.05556, ctc_loss=0.1189, cr_loss=0.3934, over 6790680.95 frames. ], batch size: 91, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:41:15,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=643272.0, ans=0.125 2024-09-19 10:41:20,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=643272.0, ans=0.0 2024-09-19 10:41:24,528 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.759e+02 3.445e+02 4.458e+02 8.279e+02, threshold=6.891e+02, percent-clipped=6.0 2024-09-19 10:41:37,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=643318.6666666666, ans=22.5 2024-09-19 10:41:41,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=643365.3333333334, ans=0.125 2024-09-19 10:41:48,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=643365.3333333334, ans=0.125 2024-09-19 10:42:01,018 INFO [train.py:1198] (1/2) Epoch 36, batch 2200, loss[loss=0.1971, simple_loss=0.2636, pruned_loss=0.04737, ctc_loss=0.1067, cr_loss=0.3644, over 34464.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2633, pruned_loss=0.05562, ctc_loss=0.119, cr_loss=0.3934, over 6783967.31 frames. ], batch size: 100, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:42:02,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. 
limit=6.0 2024-09-19 10:42:04,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=643412.0, ans=0.05 2024-09-19 10:42:12,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=643412.0, ans=0.125 2024-09-19 10:42:19,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=643458.6666666666, ans=0.0 2024-09-19 10:42:20,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=643458.6666666666, ans=0.125 2024-09-19 10:42:20,179 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:42:34,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=643505.3333333334, ans=0.0 2024-09-19 10:43:15,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=643598.6666666666, ans=0.125 2024-09-19 10:43:22,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=643598.6666666666, ans=0.02 2024-09-19 10:43:25,608 INFO [train.py:1198] (1/2) Epoch 36, batch 2250, loss[loss=0.2079, simple_loss=0.2639, pruned_loss=0.05675, ctc_loss=0.1161, cr_loss=0.3783, over 34421.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.263, pruned_loss=0.05544, ctc_loss=0.1187, cr_loss=0.3926, over 6781659.34 frames. ], batch size: 95, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:43:26,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.71 vs. limit=15.0 2024-09-19 10:43:31,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.74 vs. limit=12.0 2024-09-19 10:44:03,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=643738.6666666666, ans=0.125 2024-09-19 10:44:04,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.06 vs. limit=10.0 2024-09-19 10:44:08,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=643738.6666666666, ans=0.125 2024-09-19 10:44:08,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.30 vs. 
limit=22.5 2024-09-19 10:44:12,887 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.511e+02 3.021e+02 3.965e+02 7.015e+02, threshold=6.041e+02, percent-clipped=1.0 2024-09-19 10:44:16,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=643785.3333333334, ans=0.0 2024-09-19 10:44:23,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=643785.3333333334, ans=0.0 2024-09-19 10:44:24,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=643785.3333333334, ans=0.95 2024-09-19 10:44:29,799 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:44:32,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=643832.0, ans=10.0 2024-09-19 10:44:35,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.04 vs. limit=10.0 2024-09-19 10:44:38,943 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.46 vs. limit=5.0 2024-09-19 10:44:39,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=643832.0, ans=0.125 2024-09-19 10:44:47,513 INFO [train.py:1198] (1/2) Epoch 36, batch 2300, loss[loss=0.1741, simple_loss=0.2327, pruned_loss=0.04171, ctc_loss=0.09419, cr_loss=0.3323, over 34320.00 frames. ], tot_loss[loss=0.2055, simple_loss=0.2619, pruned_loss=0.05501, ctc_loss=0.1178, cr_loss=0.3903, over 6766399.59 frames. ], batch size: 83, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:44:48,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.18 vs. limit=15.0 2024-09-19 10:45:31,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.96 vs. limit=15.0 2024-09-19 10:45:32,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=643972.0, ans=0.2 2024-09-19 10:45:37,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=644018.6666666666, ans=0.07 2024-09-19 10:45:47,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=644018.6666666666, ans=0.025 2024-09-19 10:46:11,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=644065.3333333334, ans=0.2 2024-09-19 10:46:12,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=644112.0, ans=0.025 2024-09-19 10:46:13,866 INFO [train.py:1198] (1/2) Epoch 36, batch 2350, loss[loss=0.2151, simple_loss=0.2759, pruned_loss=0.05666, ctc_loss=0.1232, cr_loss=0.408, over 34705.00 frames. ], tot_loss[loss=0.2062, simple_loss=0.2626, pruned_loss=0.05531, ctc_loss=0.1182, cr_loss=0.3916, over 6772073.46 frames. 
], batch size: 97, lr: 3.31e-03, grad_scale: 16.0 2024-09-19 10:46:22,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=644112.0, ans=0.125 2024-09-19 10:46:27,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=644112.0, ans=0.1 2024-09-19 10:46:37,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=644158.6666666666, ans=0.125 2024-09-19 10:46:50,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644205.3333333334, ans=0.1 2024-09-19 10:47:02,901 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.465e+02 2.863e+02 3.614e+02 6.976e+02, threshold=5.727e+02, percent-clipped=1.0 2024-09-19 10:47:08,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=644252.0, ans=0.125 2024-09-19 10:47:13,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=644252.0, ans=0.0 2024-09-19 10:47:26,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=644298.6666666666, ans=0.0 2024-09-19 10:47:35,553 INFO [train.py:1198] (1/2) Epoch 36, batch 2400, loss[loss=0.1999, simple_loss=0.2531, pruned_loss=0.05361, ctc_loss=0.1163, cr_loss=0.4054, over 34592.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2628, pruned_loss=0.05544, ctc_loss=0.1184, cr_loss=0.3924, over 6775998.23 frames. ], batch size: 89, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:47:37,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=644345.3333333334, ans=0.05 2024-09-19 10:47:55,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=644392.0, ans=0.0 2024-09-19 10:48:05,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=644392.0, ans=0.125 2024-09-19 10:48:17,081 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:48:33,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=644485.3333333334, ans=0.0 2024-09-19 10:48:38,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=644485.3333333334, ans=0.0 2024-09-19 10:48:57,962 INFO [train.py:1198] (1/2) Epoch 36, batch 2450, loss[loss=0.2094, simple_loss=0.2711, pruned_loss=0.05452, ctc_loss=0.1169, cr_loss=0.3823, over 34440.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2638, pruned_loss=0.0558, ctc_loss=0.1192, cr_loss=0.3941, over 6749941.49 frames. 
], batch size: 95, lr: 3.31e-03, grad_scale: 32.0 2024-09-19 10:49:26,451 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 10:49:46,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=644672.0, ans=0.025 2024-09-19 10:49:51,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.248e+02 2.613e+02 2.956e+02 3.590e+02 5.770e+02, threshold=5.913e+02, percent-clipped=1.0 2024-09-19 10:49:56,462 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=644718.6666666666, ans=0.025 2024-09-19 10:49:59,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=644718.6666666666, ans=0.125 2024-09-19 10:50:24,155 INFO [train.py:1198] (1/2) Epoch 36, batch 2500, loss[loss=0.2112, simple_loss=0.273, pruned_loss=0.05497, ctc_loss=0.1187, cr_loss=0.3937, over 34449.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.2643, pruned_loss=0.05606, ctc_loss=0.1197, cr_loss=0.3949, over 6761373.06 frames. ], batch size: 100, lr: 3.30e-03, grad_scale: 32.0 2024-09-19 10:50:36,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=644812.0, ans=0.0 2024-09-19 10:50:58,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2024-09-19 10:51:19,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=644952.0, ans=0.0 2024-09-19 10:51:40,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.32 vs. limit=15.0 2024-09-19 10:51:44,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=644998.6666666666, ans=0.125 2024-09-19 10:51:47,922 INFO [train.py:1198] (1/2) Epoch 36, batch 2550, loss[loss=0.1806, simple_loss=0.2382, pruned_loss=0.04509, ctc_loss=0.09885, cr_loss=0.3249, over 34138.00 frames. ], tot_loss[loss=0.2081, simple_loss=0.2641, pruned_loss=0.05613, ctc_loss=0.1198, cr_loss=0.3951, over 6764735.27 frames. ], batch size: 78, lr: 3.30e-03, grad_scale: 32.0 2024-09-19 10:51:48,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=645045.3333333334, ans=0.2 2024-09-19 10:51:53,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=645045.3333333334, ans=0.0 2024-09-19 10:51:55,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=645045.3333333334, ans=0.0 2024-09-19 10:51:57,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.07 vs. 
limit=12.0 2024-09-19 10:52:14,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=645092.0, ans=0.1 2024-09-19 10:52:22,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=645138.6666666666, ans=0.125 2024-09-19 10:52:37,345 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.202e+02 2.545e+02 2.922e+02 3.885e+02 6.664e+02, threshold=5.844e+02, percent-clipped=4.0 2024-09-19 10:52:40,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.96 vs. limit=10.0 2024-09-19 10:53:12,627 INFO [train.py:1198] (1/2) Epoch 36, batch 2600, loss[loss=0.204, simple_loss=0.2546, pruned_loss=0.05667, ctc_loss=0.1182, cr_loss=0.4088, over 34372.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2646, pruned_loss=0.05633, ctc_loss=0.1202, cr_loss=0.3958, over 6760834.46 frames. ], batch size: 91, lr: 3.30e-03, grad_scale: 32.0 2024-09-19 10:53:13,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=645278.6666666666, ans=0.125 2024-09-19 10:53:26,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=645278.6666666666, ans=0.125 2024-09-19 10:53:29,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.72 vs. limit=15.0 2024-09-19 10:53:53,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.92 vs. limit=15.0 2024-09-19 10:54:12,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=645418.6666666666, ans=0.125 2024-09-19 10:54:36,547 INFO [train.py:1198] (1/2) Epoch 36, batch 2650, loss[loss=0.2185, simple_loss=0.2803, pruned_loss=0.05767, ctc_loss=0.1242, cr_loss=0.4148, over 34240.00 frames. ], tot_loss[loss=0.2084, simple_loss=0.2647, pruned_loss=0.0562, ctc_loss=0.12, cr_loss=0.3956, over 6768308.59 frames. 
], batch size: 117, lr: 3.30e-03, grad_scale: 32.0 2024-09-19 10:54:36,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=645512.0, ans=0.2 2024-09-19 10:54:38,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=645512.0, ans=0.05 2024-09-19 10:54:43,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=645512.0, ans=0.1 2024-09-19 10:55:04,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=645558.6666666666, ans=0.1 2024-09-19 10:55:11,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=645605.3333333334, ans=0.125 2024-09-19 10:55:14,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=645605.3333333334, ans=0.125 2024-09-19 10:55:24,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.45 vs. limit=15.0 2024-09-19 10:55:25,273 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.415e+02 2.744e+02 3.455e+02 5.724e+02, threshold=5.489e+02, percent-clipped=0.0 2024-09-19 10:55:40,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=645698.6666666666, ans=0.1 2024-09-19 10:55:58,390 INFO [train.py:1198] (1/2) Epoch 36, batch 2700, loss[loss=0.2014, simple_loss=0.2642, pruned_loss=0.05109, ctc_loss=0.109, cr_loss=0.3655, over 34621.00 frames. ], tot_loss[loss=0.2082, simple_loss=0.2646, pruned_loss=0.05607, ctc_loss=0.1198, cr_loss=0.3949, over 6762310.77 frames. ], batch size: 102, lr: 3.30e-03, grad_scale: 32.0 2024-09-19 10:56:12,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.41 vs. limit=22.5 2024-09-19 10:56:27,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.49 vs. limit=12.0 2024-09-19 10:56:31,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=645838.6666666666, ans=0.125 2024-09-19 10:57:20,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=645932.0, ans=0.0 2024-09-19 10:57:25,774 INFO [train.py:1198] (1/2) Epoch 36, batch 2750, loss[loss=0.2097, simple_loss=0.2601, pruned_loss=0.0594, ctc_loss=0.123, cr_loss=0.4008, over 34626.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2636, pruned_loss=0.05572, ctc_loss=0.1191, cr_loss=0.393, over 6759448.12 frames. 
], batch size: 88, lr: 3.30e-03, grad_scale: 32.0 2024-09-19 10:57:55,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=646025.3333333334, ans=0.025 2024-09-19 10:58:15,040 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.510e+02 2.921e+02 3.593e+02 6.977e+02, threshold=5.842e+02, percent-clipped=3.0 2024-09-19 10:58:21,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=646118.6666666666, ans=0.95 2024-09-19 10:58:23,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646118.6666666666, ans=0.1 2024-09-19 10:58:37,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.63 vs. limit=6.0 2024-09-19 10:58:48,310 INFO [train.py:1198] (1/2) Epoch 36, batch 2800, loss[loss=0.2378, simple_loss=0.2858, pruned_loss=0.07079, ctc_loss=0.1525, cr_loss=0.4406, over 23057.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2636, pruned_loss=0.0559, ctc_loss=0.1195, cr_loss=0.3934, over 6737582.94 frames. ], batch size: 244, lr: 3.30e-03, grad_scale: 32.0 2024-09-19 10:59:57,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=646398.6666666666, ans=0.125 2024-09-19 11:00:10,372 INFO [train.py:1198] (1/2) Epoch 36, batch 2850, loss[loss=0.2145, simple_loss=0.2705, pruned_loss=0.05837, ctc_loss=0.126, cr_loss=0.4132, over 34476.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.2641, pruned_loss=0.05602, ctc_loss=0.1197, cr_loss=0.3935, over 6722470.83 frames. ], batch size: 90, lr: 3.30e-03, grad_scale: 16.0 2024-09-19 11:00:41,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=12.0 2024-09-19 11:00:45,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.58 vs. limit=12.0 2024-09-19 11:00:51,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646538.6666666666, ans=0.1 2024-09-19 11:00:51,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=646538.6666666666, ans=0.0 2024-09-19 11:00:59,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=646585.3333333334, ans=0.0 2024-09-19 11:01:02,707 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.538e+02 2.909e+02 3.698e+02 7.204e+02, threshold=5.817e+02, percent-clipped=1.0 2024-09-19 11:01:10,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=646585.3333333334, ans=0.125 2024-09-19 11:01:14,130 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.54 vs. limit=15.0 2024-09-19 11:01:36,010 INFO [train.py:1198] (1/2) Epoch 36, batch 2900, loss[loss=0.1981, simple_loss=0.2538, pruned_loss=0.05212, ctc_loss=0.1122, cr_loss=0.3924, over 34551.00 frames. ], tot_loss[loss=0.2088, simple_loss=0.2652, pruned_loss=0.0563, ctc_loss=0.1202, cr_loss=0.3952, over 6753327.26 frames. 
], batch size: 94, lr: 3.30e-03, grad_scale: 16.0 2024-09-19 11:01:47,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=646678.6666666666, ans=0.125 2024-09-19 11:01:51,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=646725.3333333334, ans=0.0 2024-09-19 11:01:56,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=646725.3333333334, ans=0.05 2024-09-19 11:01:56,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=646725.3333333334, ans=0.2 2024-09-19 11:01:59,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=646725.3333333334, ans=0.125 2024-09-19 11:02:11,106 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.963e-01 2024-09-19 11:02:12,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=646772.0, ans=0.125 2024-09-19 11:02:31,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=646818.6666666666, ans=0.0 2024-09-19 11:02:44,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=646865.3333333334, ans=0.07 2024-09-19 11:02:58,545 INFO [train.py:1198] (1/2) Epoch 36, batch 2950, loss[loss=0.2008, simple_loss=0.255, pruned_loss=0.05442, ctc_loss=0.1115, cr_loss=0.3864, over 34640.00 frames. ], tot_loss[loss=0.2077, simple_loss=0.2639, pruned_loss=0.05593, ctc_loss=0.1194, cr_loss=0.3931, over 6748235.39 frames. ], batch size: 88, lr: 3.30e-03, grad_scale: 16.0 2024-09-19 11:03:01,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.03 vs. limit=10.0 2024-09-19 11:03:04,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.86 vs. limit=15.0 2024-09-19 11:03:05,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=646912.0, ans=0.125 2024-09-19 11:03:12,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2024-09-19 11:03:26,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=646958.6666666666, ans=0.05 2024-09-19 11:03:40,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=647005.3333333334, ans=0.0 2024-09-19 11:03:49,716 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.532e+02 2.977e+02 4.139e+02 6.894e+02, threshold=5.954e+02, percent-clipped=2.0 2024-09-19 11:04:05,006 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:04:12,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.93 vs. 
limit=15.0 2024-09-19 11:04:23,179 INFO [train.py:1198] (1/2) Epoch 36, batch 3000, loss[loss=0.2103, simple_loss=0.2644, pruned_loss=0.05767, ctc_loss=0.1217, cr_loss=0.4147, over 34537.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2637, pruned_loss=0.05584, ctc_loss=0.1193, cr_loss=0.3934, over 6748829.03 frames. ], batch size: 94, lr: 3.30e-03, grad_scale: 16.0 2024-09-19 11:04:23,179 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 11:04:39,934 INFO [train.py:1230] (1/2) Epoch 36, validation: loss=0.1487, simple_loss=0.2428, pruned_loss=0.02335, ctc_loss=0.03974, cr_loss=2.127e-14, over 944034.00 frames. 2024-09-19 11:04:39,935 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 11:04:54,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=647145.3333333334, ans=0.0 2024-09-19 11:05:15,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=647238.6666666666, ans=0.1 2024-09-19 11:05:59,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=647332.0, ans=0.125 2024-09-19 11:06:03,200 INFO [train.py:1198] (1/2) Epoch 36, batch 3050, loss[loss=0.2009, simple_loss=0.2551, pruned_loss=0.05464, ctc_loss=0.1132, cr_loss=0.3686, over 34603.00 frames. ], tot_loss[loss=0.2082, simple_loss=0.2644, pruned_loss=0.05615, ctc_loss=0.1198, cr_loss=0.3945, over 6740974.36 frames. ], batch size: 89, lr: 3.30e-03, grad_scale: 16.0 2024-09-19 11:06:19,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.02 vs. limit=8.0 2024-09-19 11:06:22,225 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.31 vs. limit=10.0 2024-09-19 11:06:40,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.05 vs. limit=15.0 2024-09-19 11:06:50,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=647518.6666666666, ans=0.125 2024-09-19 11:06:53,197 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.500e+02 2.730e+02 3.231e+02 4.767e+02, threshold=5.460e+02, percent-clipped=0.0 2024-09-19 11:06:55,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=647518.6666666666, ans=0.0 2024-09-19 11:06:58,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=647518.6666666666, ans=0.0 2024-09-19 11:07:09,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=647565.3333333334, ans=0.125 2024-09-19 11:07:11,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=647565.3333333334, ans=0.125 2024-09-19 11:07:23,776 INFO [train.py:1198] (1/2) Epoch 36, batch 3100, loss[loss=0.2327, simple_loss=0.2927, pruned_loss=0.06407, ctc_loss=0.137, cr_loss=0.4283, over 34181.00 frames. ], tot_loss[loss=0.2083, simple_loss=0.2644, pruned_loss=0.05619, ctc_loss=0.12, cr_loss=0.3945, over 6740085.41 frames. 
], batch size: 117, lr: 3.30e-03, grad_scale: 16.0 2024-09-19 11:07:40,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=647658.6666666666, ans=0.125 2024-09-19 11:07:43,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=647658.6666666666, ans=0.2 2024-09-19 11:08:00,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=647705.3333333334, ans=0.2 2024-09-19 11:08:04,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.61 vs. limit=22.5 2024-09-19 11:08:17,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=647752.0, ans=0.0 2024-09-19 11:08:30,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=647798.6666666666, ans=0.2 2024-09-19 11:08:30,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=647798.6666666666, ans=0.0 2024-09-19 11:08:45,010 INFO [train.py:1198] (1/2) Epoch 36, batch 3150, loss[loss=0.2329, simple_loss=0.2854, pruned_loss=0.06734, ctc_loss=0.1406, cr_loss=0.4389, over 33834.00 frames. ], tot_loss[loss=0.2082, simple_loss=0.2642, pruned_loss=0.05617, ctc_loss=0.12, cr_loss=0.3947, over 6746917.09 frames. ], batch size: 122, lr: 3.30e-03, grad_scale: 16.0 2024-09-19 11:08:47,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=647845.3333333334, ans=0.0 2024-09-19 11:08:50,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.73 vs. limit=15.0 2024-09-19 11:08:59,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=647892.0, ans=0.0 2024-09-19 11:09:11,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.31 vs. limit=10.0 2024-09-19 11:09:28,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.min_abs, batch_count=647938.6666666666, ans=0.5 2024-09-19 11:09:28,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=647938.6666666666, ans=0.07 2024-09-19 11:09:36,472 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.197e+02 2.615e+02 3.045e+02 3.882e+02 7.604e+02, threshold=6.090e+02, percent-clipped=5.0 2024-09-19 11:09:56,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=648032.0, ans=0.125 2024-09-19 11:09:56,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=648032.0, ans=0.125 2024-09-19 11:10:07,011 INFO [train.py:1198] (1/2) Epoch 36, batch 3200, loss[loss=0.2149, simple_loss=0.2708, pruned_loss=0.05875, ctc_loss=0.1256, cr_loss=0.4064, over 34530.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2632, pruned_loss=0.05564, ctc_loss=0.1191, cr_loss=0.3929, over 6760591.27 frames. 
], batch size: 94, lr: 3.30e-03, grad_scale: 32.0 2024-09-19 11:10:10,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=648078.6666666666, ans=0.0 2024-09-19 11:10:18,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=648078.6666666666, ans=0.125 2024-09-19 11:10:31,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=648125.3333333334, ans=0.125 2024-09-19 11:10:49,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2024-09-19 11:10:52,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=648172.0, ans=0.125 2024-09-19 11:10:56,181 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:11:29,790 INFO [train.py:1198] (1/2) Epoch 36, batch 3250, loss[loss=0.211, simple_loss=0.2678, pruned_loss=0.05682, ctc_loss=0.1215, cr_loss=0.4029, over 34647.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2638, pruned_loss=0.05577, ctc_loss=0.1193, cr_loss=0.3936, over 6769612.35 frames. ], batch size: 98, lr: 3.30e-03, grad_scale: 32.0 2024-09-19 11:12:00,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=648405.3333333334, ans=0.0 2024-09-19 11:12:02,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=648405.3333333334, ans=0.125 2024-09-19 11:12:19,707 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.154e+02 2.616e+02 3.017e+02 3.841e+02 5.793e+02, threshold=6.034e+02, percent-clipped=0.0 2024-09-19 11:12:26,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=648452.0, ans=0.1 2024-09-19 11:12:45,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=648498.6666666666, ans=0.125 2024-09-19 11:12:50,303 INFO [train.py:1198] (1/2) Epoch 36, batch 3300, loss[loss=0.2353, simple_loss=0.292, pruned_loss=0.06618, ctc_loss=0.1411, cr_loss=0.4479, over 33391.00 frames. ], tot_loss[loss=0.2062, simple_loss=0.2625, pruned_loss=0.05531, ctc_loss=0.1184, cr_loss=0.3916, over 6768213.89 frames. ], batch size: 131, lr: 3.30e-03, grad_scale: 32.0 2024-09-19 11:13:50,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=648685.3333333334, ans=0.0 2024-09-19 11:13:53,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=648732.0, ans=0.025 2024-09-19 11:13:53,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=648732.0, ans=0.2 2024-09-19 11:14:09,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=648778.6666666666, ans=0.0 2024-09-19 11:14:10,930 INFO [train.py:1198] (1/2) Epoch 36, batch 3350, loss[loss=0.216, simple_loss=0.2744, pruned_loss=0.05853, ctc_loss=0.1223, cr_loss=0.4026, over 33813.00 frames. 
], tot_loss[loss=0.2072, simple_loss=0.2634, pruned_loss=0.05569, ctc_loss=0.1192, cr_loss=0.3934, over 6742814.24 frames. ], batch size: 122, lr: 3.29e-03, grad_scale: 32.0 2024-09-19 11:14:12,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=648778.6666666666, ans=0.125 2024-09-19 11:14:53,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=648872.0, ans=0.1 2024-09-19 11:15:02,247 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.489e+02 2.724e+02 3.160e+02 4.521e+02, threshold=5.448e+02, percent-clipped=0.0 2024-09-19 11:15:07,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=648918.6666666666, ans=0.025 2024-09-19 11:15:20,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=648965.3333333334, ans=0.125 2024-09-19 11:15:26,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=648965.3333333334, ans=0.1 2024-09-19 11:15:32,977 INFO [train.py:1198] (1/2) Epoch 36, batch 3400, loss[loss=0.1808, simple_loss=0.229, pruned_loss=0.04876, ctc_loss=0.1036, cr_loss=0.358, over 34134.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2635, pruned_loss=0.05585, ctc_loss=0.1195, cr_loss=0.3939, over 6734150.75 frames. ], batch size: 78, lr: 3.29e-03, grad_scale: 32.0 2024-09-19 11:15:34,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=649012.0, ans=0.0 2024-09-19 11:15:39,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=649012.0, ans=0.0 2024-09-19 11:15:51,541 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2024-09-19 11:15:54,297 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=649058.6666666666, ans=0.2 2024-09-19 11:16:00,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=649058.6666666666, ans=0.0 2024-09-19 11:16:06,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=649105.3333333334, ans=0.2 2024-09-19 11:16:08,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=649105.3333333334, ans=0.0 2024-09-19 11:16:10,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=649105.3333333334, ans=0.5 2024-09-19 11:16:21,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=649152.0, ans=0.125 2024-09-19 11:16:31,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.56 vs. limit=15.0 2024-09-19 11:16:34,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.76 vs. 
limit=22.5 2024-09-19 11:16:54,701 INFO [train.py:1198] (1/2) Epoch 36, batch 3450, loss[loss=0.2234, simple_loss=0.2817, pruned_loss=0.06077, ctc_loss=0.1312, cr_loss=0.4329, over 33105.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2636, pruned_loss=0.05588, ctc_loss=0.1194, cr_loss=0.3939, over 6746666.13 frames. ], batch size: 130, lr: 3.29e-03, grad_scale: 32.0 2024-09-19 11:17:16,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=18.35 vs. limit=22.5 2024-09-19 11:17:44,298 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.193e+02 2.476e+02 2.822e+02 3.549e+02 5.585e+02, threshold=5.643e+02, percent-clipped=1.0 2024-09-19 11:17:48,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=22.5 2024-09-19 11:18:07,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=649432.0, ans=0.125 2024-09-19 11:18:14,883 INFO [train.py:1198] (1/2) Epoch 36, batch 3500, loss[loss=0.1817, simple_loss=0.2403, pruned_loss=0.04437, ctc_loss=0.1014, cr_loss=0.3524, over 34492.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2633, pruned_loss=0.05583, ctc_loss=0.1193, cr_loss=0.3938, over 6749037.64 frames. ], batch size: 85, lr: 3.29e-03, grad_scale: 32.0 2024-09-19 11:18:31,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649525.3333333334, ans=0.1 2024-09-19 11:19:10,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.82 vs. limit=22.5 2024-09-19 11:19:24,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=22.5 2024-09-19 11:19:36,291 INFO [train.py:1198] (1/2) Epoch 36, batch 3550, loss[loss=0.2051, simple_loss=0.2659, pruned_loss=0.05297, ctc_loss=0.1143, cr_loss=0.3856, over 34389.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2632, pruned_loss=0.05563, ctc_loss=0.119, cr_loss=0.3936, over 6759388.45 frames. ], batch size: 103, lr: 3.29e-03, grad_scale: 32.0 2024-09-19 11:19:52,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=649758.6666666666, ans=0.05 2024-09-19 11:20:05,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.00 vs. limit=22.5 2024-09-19 11:20:23,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=649852.0, ans=0.1 2024-09-19 11:20:26,806 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.559e+02 2.987e+02 3.655e+02 1.202e+03, threshold=5.973e+02, percent-clipped=2.0 2024-09-19 11:20:57,375 INFO [train.py:1198] (1/2) Epoch 36, batch 3600, loss[loss=0.1935, simple_loss=0.2485, pruned_loss=0.05009, ctc_loss=0.1126, cr_loss=0.3948, over 34516.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.2634, pruned_loss=0.05563, ctc_loss=0.119, cr_loss=0.3937, over 6767758.88 frames. 
], batch size: 90, lr: 3.29e-03, grad_scale: 32.0 2024-09-19 11:21:06,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2024-09-19 11:21:15,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2024-09-19 11:21:16,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=649992.0, ans=0.1 2024-09-19 11:21:37,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=650038.6666666666, ans=0.0 2024-09-19 11:21:47,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=650085.3333333334, ans=15.0 2024-09-19 11:22:07,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=650132.0, ans=0.125 2024-09-19 11:22:17,066 INFO [train.py:1198] (1/2) Epoch 36, batch 3650, loss[loss=0.2251, simple_loss=0.2818, pruned_loss=0.06323, ctc_loss=0.1285, cr_loss=0.4062, over 34417.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.2631, pruned_loss=0.05544, ctc_loss=0.1186, cr_loss=0.3927, over 6770325.77 frames. ], batch size: 110, lr: 3.29e-03, grad_scale: 16.0 2024-09-19 11:22:25,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=650178.6666666666, ans=0.125 2024-09-19 11:22:33,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=650225.3333333334, ans=0.0 2024-09-19 11:22:40,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=650225.3333333334, ans=0.125 2024-09-19 11:22:44,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.08 vs. limit=6.0 2024-09-19 11:22:53,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=650272.0, ans=0.125 2024-09-19 11:22:55,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.78 vs. limit=15.0 2024-09-19 11:23:06,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=650318.6666666666, ans=0.125 2024-09-19 11:23:08,998 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.525e+02 2.884e+02 3.774e+02 7.363e+02, threshold=5.769e+02, percent-clipped=4.0 2024-09-19 11:23:17,723 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2024-09-19 11:23:28,421 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:23:37,635 INFO [train.py:1198] (1/2) Epoch 36, batch 3700, loss[loss=0.2106, simple_loss=0.272, pruned_loss=0.05497, ctc_loss=0.1171, cr_loss=0.3952, over 34658.00 frames. 
2024-09-19 11:23:52,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=650458.6666666666, ans=0.125
2024-09-19 11:23:58,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=650458.6666666666, ans=0.2
2024-09-19 11:24:15,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=650505.3333333334, ans=0.125
2024-09-19 11:24:20,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=650505.3333333334, ans=0.125
2024-09-19 11:24:30,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=650552.0, ans=0.0
2024-09-19 11:24:32,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=650552.0, ans=0.125
2024-09-19 11:24:32,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=650552.0, ans=0.125
2024-09-19 11:24:33,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=650552.0, ans=0.125
2024-09-19 11:24:40,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=650552.0, ans=0.125
2024-09-19 11:24:40,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650552.0, ans=0.1
2024-09-19 11:24:58,489 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 11:24:59,610 INFO [train.py:1198] (1/2) Epoch 36, batch 3750, loss[loss=0.2168, simple_loss=0.2736, pruned_loss=0.0597, ctc_loss=0.1237, cr_loss=0.3969, over 34405.00 frames. ], tot_loss[loss=0.2097, simple_loss=0.2663, pruned_loss=0.05653, ctc_loss=0.1206, cr_loss=0.3969, over 6787688.39 frames. ], batch size: 113, lr: 3.29e-03, grad_scale: 16.0
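ScheduledFloat lines are by far the most common entries here: each reports the current value (ans=...) of some regularization hyperparameter (a dropout rate, balancer probability, skip rate, etc.) that is annealed as a function of the global batch_count. Conceptually this is piecewise-linear interpolation between (batch_count, value) breakpoints; the class below is a minimal sketch of that idea, not the actual scaling.py implementation.

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count, e.g.
        ScheduledFloat((0.0, 0.3), (20000.0, 0.1)) decays 0.3 -> 0.1
        over the first 20k batches and stays at 0.1 afterwards."""
        def __init__(self, *points):
            self.points = sorted(points)   # (batch_count, value) pairs
            self.batch_count = 0.0         # advanced once per training step
        def __float__(self):
            pts = self.points
            if self.batch_count <= pts[0][0]:
                return float(pts[0][1])
            if self.batch_count >= pts[-1][0]:
                return float(pts[-1][1])
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= self.batch_count <= x1:
                    w = (self.batch_count - x0) / (x1 - x0)
                    return float(y0 + w * (y1 - y0))

A module would read the value with float(self.dropout_p) each forward pass, which is presumably the ans=... figure that gets logged.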
2024-09-19 11:25:03,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=650645.3333333334, ans=0.2
2024-09-19 11:25:04,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=650645.3333333334, ans=0.125
2024-09-19 11:25:24,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=650692.0, ans=0.0
2024-09-19 11:25:51,652 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.140e+02 2.390e+02 2.531e+02 2.779e+02 5.390e+02, threshold=5.063e+02, percent-clipped=0.0
2024-09-19 11:25:53,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=650785.3333333334, ans=0.0
2024-09-19 11:26:00,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=650785.3333333334, ans=0.125
2024-09-19 11:26:14,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=650832.0, ans=0.1
2024-09-19 11:26:14,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=650832.0, ans=0.0
2024-09-19 11:26:20,888 INFO [train.py:1198] (1/2) Epoch 36, batch 3800, loss[loss=0.2318, simple_loss=0.2798, pruned_loss=0.06896, ctc_loss=0.1414, cr_loss=0.4409, over 29990.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2687, pruned_loss=0.05785, ctc_loss=0.1231, cr_loss=0.4025, over 6677183.64 frames. ], batch size: 175, lr: 3.29e-03, grad_scale: 16.0
2024-09-19 11:26:22,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=650878.6666666666, ans=0.125
2024-09-19 11:26:25,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=650878.6666666666, ans=0.125
2024-09-19 11:26:27,221 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=3.963e-02
2024-09-19 11:26:45,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=650925.3333333334, ans=0.125
2024-09-19 11:26:45,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=650925.3333333334, ans=0.0
2024-09-19 11:26:48,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.92 vs. limit=15.0
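The WithLoss lines (scaling.py:1120) report the summed value of an auxiliary penalty attached to a module's attention weights; loss-sum=0.000e+00 means the penalty is currently inactive, while the 3.963e-02 above shows it occasionally switching on. How the penalty itself is defined is internal to icefall; the wrapper below is only a hypothetical picture of the plumbing, passing its input through untouched while exposing an extra loss term, and both the class name and the illustrative penalty are assumptions.

    import torch
    import torch.nn as nn

    class WithAuxLoss(nn.Module):
        """Hypothetical wrapper: forwards attn_weights unchanged, but
        records an auxiliary penalty for the training loop to collect."""
        def __init__(self, name: str, scale: float = 1e-4):
            super().__init__()
            self.name, self.scale = name, scale
            self.aux_loss = None
        def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
            # Illustrative penalty: discourage very peaky attention rows.
            self.aux_loss = self.scale * (attn_weights ** 2).sum()
            return attn_weights

The training loop would then add the collected per-module sums to the main loss and log them occasionally, which is presumably what the loss-sum field shows.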
2024-09-19 11:26:54,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=650972.0, ans=0.2
2024-09-19 11:26:54,117 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 11:27:03,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=650972.0, ans=10.0
2024-09-19 11:27:10,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=651018.6666666666, ans=0.125
2024-09-19 11:27:12,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=651018.6666666666, ans=0.025
2024-09-19 11:27:17,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.44 vs. limit=22.5
2024-09-19 11:27:30,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=651065.3333333334, ans=0.0
2024-09-19 11:27:34,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651065.3333333334, ans=0.1
2024-09-19 11:27:45,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.82 vs. limit=22.5
2024-09-19 11:27:46,010 INFO [train.py:1198] (1/2) Epoch 36, batch 3850, loss[loss=0.2326, simple_loss=0.2791, pruned_loss=0.0692, ctc_loss=0.1498, cr_loss=0.4411, over 23427.00 frames. ], tot_loss[loss=0.2155, simple_loss=0.2706, pruned_loss=0.05945, ctc_loss=0.1266, cr_loss=0.4062, over 6249232.47 frames. ], batch size: 245, lr: 3.29e-03, grad_scale: 16.0
2024-09-19 11:27:57,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=651112.0, ans=0.025
2024-09-19 11:27:59,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=651112.0, ans=0.125
2024-09-19 11:28:03,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=8.0
2024-09-19 11:28:11,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=651158.6666666666, ans=0.125
2024-09-19 11:29:13,418 INFO [train.py:1198] (1/2) Epoch 37, batch 0, loss[loss=0.1836, simple_loss=0.2413, pruned_loss=0.04607, ctc_loss=0.09856, cr_loss=0.3503, over 34480.00 frames. ], tot_loss[loss=0.1836, simple_loss=0.2413, pruned_loss=0.04607, ctc_loss=0.09856, cr_loss=0.3503, over 34480.00 frames. ], batch size: 85, lr: 3.24e-03, grad_scale: 32.0
2024-09-19 11:29:13,418 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 11:29:30,179 INFO [train.py:1230] (1/2) Epoch 37, validation: loss=0.1485, simple_loss=0.2435, pruned_loss=0.02289, ctc_loss=0.03869, cr_loss=2.15e-14, over 944034.00 frames.
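Right at the epoch boundary above, train.py pauses training to compute the validation loss over the full dev set (the "over 944034.00 frames" figure) and, in the line that follows, prints the peak CUDA memory since startup. The memory number comes straight from PyTorch's allocator statistics via torch.cuda.max_memory_allocated; the rest of this sketch is schematic, with compute_loss standing in for the recipe's actual loss function.

    import torch

    def compute_validation_loss(model, valid_dl, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_dl:
                # compute_loss: user-supplied; returns (per-frame avg loss, num_frames)
                loss, num_frames = compute_loss(model, batch, device)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        max_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"validation: loss={tot_loss / tot_frames:.4f}, over {tot_frames:.2f} frames.")
        print(f"Maximum memory allocated so far is {max_mb}MB")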
2024-09-19 11:29:30,179 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-19 11:29:33,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=651233.3333333334, ans=0.125
2024-09-19 11:29:39,896 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.148e+02 2.611e+02 2.775e+02 3.048e+02 5.194e+02, threshold=5.551e+02, percent-clipped=1.0
2024-09-19 11:29:45,349 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=651233.3333333334, ans=0.2
2024-09-19 11:29:55,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=651280.0, ans=0.125
2024-09-19 11:30:00,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=651280.0, ans=0.125
2024-09-19 11:30:11,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.42 vs. limit=15.0
2024-09-19 11:30:34,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=651373.3333333334, ans=0.05
2024-09-19 11:30:37,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=651420.0, ans=0.025
2024-09-19 11:30:41,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=651420.0, ans=0.125
2024-09-19 11:30:54,421 INFO [train.py:1198] (1/2) Epoch 37, batch 50, loss[loss=0.1845, simple_loss=0.2385, pruned_loss=0.04789, ctc_loss=0.1042, cr_loss=0.3448, over 34450.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.264, pruned_loss=0.05601, ctc_loss=0.1193, cr_loss=0.3978, over 1480832.24 frames. ], batch size: 82, lr: 3.24e-03, grad_scale: 32.0
2024-09-19 11:31:07,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=651466.6666666666, ans=0.1
2024-09-19 11:31:08,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=651466.6666666666, ans=0.125
2024-09-19 11:31:16,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=651513.3333333334, ans=15.0
2024-09-19 11:31:23,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.96 vs. limit=10.0
2024-09-19 11:32:00,996 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=15.0
2024-09-19 11:32:02,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=651653.3333333334, ans=0.0
2024-09-19 11:32:16,640 INFO [train.py:1198] (1/2) Epoch 37, batch 100, loss[loss=0.1891, simple_loss=0.2469, pruned_loss=0.0479, ctc_loss=0.1052, cr_loss=0.3606, over 34564.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2664, pruned_loss=0.05675, ctc_loss=0.121, cr_loss=0.4003, over 2628999.79 frames. ], batch size: 89, lr: 3.24e-03, grad_scale: 32.0
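In each train.py:1198 line, loss[...] is the current batch alone, while tot_loss[...] is a frame-weighted running aggregate of the same five components (loss, simple_loss, pruned_loss, ctc_loss, cr_loss); the fractional, slowly growing "over ... frames" counts (1480832.24 after 50 batches, 2628999.79 after 100) suggest older batches are gradually decayed out rather than summed forever. A small accumulator in that spirit, where the decay constant is purely an assumption:

    class LossTracker:
        """Frame-weighted running average of loss components; the 0.999
        decay is an assumption, not icefall's actual constant."""
        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.sums = {}      # component -> decayed weighted sum
            self.frames = 0.0   # decayed frame count
        def update(self, batch_losses: dict, num_frames: float):
            self.frames = self.decay * self.frames + num_frames
            for k, v in batch_losses.items():
                self.sums[k] = self.decay * self.sums.get(k, 0.0) + v * num_frames
        def averages(self) -> dict:
            return {k: s / self.frames for k, s in self.sums.items()}

For example, tracker.update({"loss": 0.1845, "ctc_loss": 0.1042}, 34450.0) after each batch, then tracker.averages() for the tot_loss[...] figures.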
2024-09-19 11:32:28,748 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.159e+02 2.507e+02 2.815e+02 3.556e+02 5.797e+02, threshold=5.629e+02, percent-clipped=1.0
2024-09-19 11:32:47,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=651746.6666666666, ans=0.125
2024-09-19 11:33:32,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=651886.6666666666, ans=0.0
2024-09-19 11:33:42,080 INFO [train.py:1198] (1/2) Epoch 37, batch 150, loss[loss=0.1736, simple_loss=0.2338, pruned_loss=0.04107, ctc_loss=0.09043, cr_loss=0.3266, over 34486.00 frames. ], tot_loss[loss=0.2079, simple_loss=0.2644, pruned_loss=0.05584, ctc_loss=0.1194, cr_loss=0.3956, over 3557579.79 frames. ], batch size: 82, lr: 3.24e-03, grad_scale: 32.0
2024-09-19 11:33:55,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=651933.3333333334, ans=0.125
2024-09-19 11:34:25,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=652026.6666666666, ans=0.0
2024-09-19 11:34:30,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=652073.3333333334, ans=0.0
2024-09-19 11:34:33,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=652073.3333333334, ans=0.0
2024-09-19 11:34:34,370 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.61 vs. limit=15.0
2024-09-19 11:35:04,208 INFO [train.py:1198] (1/2) Epoch 37, batch 200, loss[loss=0.2118, simple_loss=0.2703, pruned_loss=0.05677, ctc_loss=0.1223, cr_loss=0.3851, over 31981.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2632, pruned_loss=0.05532, ctc_loss=0.1185, cr_loss=0.3937, over 4272895.47 frames. ], batch size: 145, lr: 3.24e-03, grad_scale: 32.0
2024-09-19 11:35:11,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=652166.6666666666, ans=0.125
2024-09-19 11:35:14,033 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.668e+02 3.174e+02 4.671e+02 7.201e+02, threshold=6.348e+02, percent-clipped=8.0
2024-09-19 11:35:24,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=652213.3333333334, ans=0.125
2024-09-19 11:35:37,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=652260.0, ans=0.125
2024-09-19 11:35:39,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=652260.0, ans=0.0
2024-09-19 11:35:55,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=652306.6666666666, ans=0.125
2024-09-19 11:36:00,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.40 vs.
limit=6.0 2024-09-19 11:36:26,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=652353.3333333334, ans=0.125 2024-09-19 11:36:29,334 INFO [train.py:1198] (1/2) Epoch 37, batch 250, loss[loss=0.2207, simple_loss=0.2794, pruned_loss=0.05998, ctc_loss=0.1257, cr_loss=0.4219, over 34268.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2633, pruned_loss=0.05529, ctc_loss=0.1185, cr_loss=0.3935, over 4836053.18 frames. ], batch size: 117, lr: 3.24e-03, grad_scale: 16.0 2024-09-19 11:36:36,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=652400.0, ans=0.05 2024-09-19 11:37:01,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=652493.3333333334, ans=0.125 2024-09-19 11:37:04,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=652493.3333333334, ans=0.125 2024-09-19 11:37:07,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=652493.3333333334, ans=0.2 2024-09-19 11:37:12,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=652493.3333333334, ans=0.95 2024-09-19 11:37:31,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=652540.0, ans=0.125 2024-09-19 11:37:42,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=652586.6666666666, ans=0.0 2024-09-19 11:37:53,416 INFO [train.py:1198] (1/2) Epoch 37, batch 300, loss[loss=0.2312, simple_loss=0.2861, pruned_loss=0.06569, ctc_loss=0.1362, cr_loss=0.4406, over 34373.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2629, pruned_loss=0.05519, ctc_loss=0.1182, cr_loss=0.3929, over 5263400.14 frames. ], batch size: 107, lr: 3.24e-03, grad_scale: 16.0 2024-09-19 11:38:03,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=652633.3333333334, ans=0.125 2024-09-19 11:38:05,085 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.069e+02 2.475e+02 2.914e+02 3.679e+02 7.761e+02, threshold=5.827e+02, percent-clipped=4.0 2024-09-19 11:38:18,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=652680.0, ans=0.125 2024-09-19 11:38:21,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=652680.0, ans=0.125 2024-09-19 11:38:39,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=652726.6666666666, ans=0.09899494936611666 2024-09-19 11:38:39,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=652726.6666666666, ans=0.0 2024-09-19 11:39:15,889 INFO [train.py:1198] (1/2) Epoch 37, batch 350, loss[loss=0.1875, simple_loss=0.2452, pruned_loss=0.0475, ctc_loss=0.1043, cr_loss=0.3496, over 34313.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2631, pruned_loss=0.05534, ctc_loss=0.1185, cr_loss=0.3929, over 5598154.89 frames. 
], batch size: 83, lr: 3.24e-03, grad_scale: 16.0 2024-09-19 11:39:21,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=652866.6666666666, ans=0.125 2024-09-19 11:39:29,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2024-09-19 11:39:31,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=652913.3333333334, ans=0.0 2024-09-19 11:39:40,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=652913.3333333334, ans=0.0 2024-09-19 11:39:50,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=652960.0, ans=0.0 2024-09-19 11:39:53,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. limit=6.0 2024-09-19 11:40:01,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2024-09-19 11:40:09,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=653006.6666666666, ans=0.0 2024-09-19 11:40:12,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=653006.6666666666, ans=0.07 2024-09-19 11:40:40,133 INFO [train.py:1198] (1/2) Epoch 37, batch 400, loss[loss=0.2173, simple_loss=0.2738, pruned_loss=0.05962, ctc_loss=0.1238, cr_loss=0.4186, over 34433.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2626, pruned_loss=0.05509, ctc_loss=0.1179, cr_loss=0.3921, over 5865795.10 frames. ], batch size: 95, lr: 3.24e-03, grad_scale: 32.0 2024-09-19 11:40:45,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=12.0 2024-09-19 11:40:52,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=653100.0, ans=0.125 2024-09-19 11:40:53,589 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.500e+02 2.995e+02 3.548e+02 6.095e+02, threshold=5.991e+02, percent-clipped=1.0 2024-09-19 11:41:11,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.67 vs. limit=15.0 2024-09-19 11:41:13,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=653193.3333333334, ans=0.0 2024-09-19 11:41:40,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=653240.0, ans=0.025 2024-09-19 11:41:54,007 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 11:42:11,598 INFO [train.py:1198] (1/2) Epoch 37, batch 450, loss[loss=0.215, simple_loss=0.2728, pruned_loss=0.05845, ctc_loss=0.1232, cr_loss=0.394, over 34731.00 frames. ], tot_loss[loss=0.2062, simple_loss=0.2628, pruned_loss=0.05515, ctc_loss=0.1181, cr_loss=0.3926, over 6054209.16 frames. 
], batch size: 97, lr: 3.24e-03, grad_scale: 32.0 2024-09-19 11:42:26,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=653380.0, ans=0.0 2024-09-19 11:42:28,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=653380.0, ans=0.2 2024-09-19 11:42:40,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.83 vs. limit=15.0 2024-09-19 11:42:51,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0 2024-09-19 11:43:11,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=653473.3333333334, ans=0.125 2024-09-19 11:43:12,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=653473.3333333334, ans=0.0 2024-09-19 11:43:33,797 INFO [train.py:1198] (1/2) Epoch 37, batch 500, loss[loss=0.2315, simple_loss=0.2865, pruned_loss=0.06608, ctc_loss=0.1363, cr_loss=0.4274, over 34451.00 frames. ], tot_loss[loss=0.2058, simple_loss=0.2621, pruned_loss=0.0551, ctc_loss=0.1179, cr_loss=0.3918, over 6221824.80 frames. ], batch size: 110, lr: 3.24e-03, grad_scale: 32.0 2024-09-19 11:43:39,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=653566.6666666666, ans=0.2 2024-09-19 11:43:45,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 2.432e+02 2.765e+02 3.614e+02 5.958e+02, threshold=5.530e+02, percent-clipped=0.0 2024-09-19 11:43:52,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=653613.3333333334, ans=0.1 2024-09-19 11:43:57,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.54 vs. limit=15.0 2024-09-19 11:44:20,167 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.55 vs. limit=15.0 2024-09-19 11:44:22,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=653660.0, ans=0.125 2024-09-19 11:44:27,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=653706.6666666666, ans=0.0 2024-09-19 11:44:34,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=653706.6666666666, ans=0.025 2024-09-19 11:44:39,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=653706.6666666666, ans=0.125 2024-09-19 11:44:50,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=653753.3333333334, ans=0.1 2024-09-19 11:44:57,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=653753.3333333334, ans=0.2 2024-09-19 11:45:00,481 INFO [train.py:1198] (1/2) Epoch 37, batch 550, loss[loss=0.2187, simple_loss=0.2771, pruned_loss=0.05914, ctc_loss=0.1272, cr_loss=0.4118, over 33885.00 frames. 
], tot_loss[loss=0.2061, simple_loss=0.2625, pruned_loss=0.0552, ctc_loss=0.118, cr_loss=0.3921, over 6331758.33 frames. ], batch size: 122, lr: 3.24e-03, grad_scale: 32.0 2024-09-19 11:45:02,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=653800.0, ans=0.125 2024-09-19 11:45:32,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=653893.3333333334, ans=0.0 2024-09-19 11:45:43,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=653893.3333333334, ans=0.1 2024-09-19 11:45:47,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.35 vs. limit=10.0 2024-09-19 11:46:15,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=653986.6666666666, ans=0.07 2024-09-19 11:46:23,033 INFO [train.py:1198] (1/2) Epoch 37, batch 600, loss[loss=0.207, simple_loss=0.2674, pruned_loss=0.05363, ctc_loss=0.117, cr_loss=0.3978, over 34206.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2628, pruned_loss=0.05528, ctc_loss=0.1181, cr_loss=0.3924, over 6434641.18 frames. ], batch size: 117, lr: 3.24e-03, grad_scale: 32.0 2024-09-19 11:46:25,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=654033.3333333334, ans=0.125 2024-09-19 11:46:34,690 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.618e+02 3.055e+02 4.014e+02 9.649e+02, threshold=6.111e+02, percent-clipped=2.0 2024-09-19 11:47:05,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=654126.6666666666, ans=0.125 2024-09-19 11:47:13,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=15.0 2024-09-19 11:47:45,335 INFO [train.py:1198] (1/2) Epoch 37, batch 650, loss[loss=0.2011, simple_loss=0.2608, pruned_loss=0.05187, ctc_loss=0.1134, cr_loss=0.3722, over 34560.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.262, pruned_loss=0.05483, ctc_loss=0.1174, cr_loss=0.3908, over 6525841.28 frames. ], batch size: 94, lr: 3.24e-03, grad_scale: 32.0 2024-09-19 11:48:31,129 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2024-09-19 11:48:32,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=654360.0, ans=0.125 2024-09-19 11:48:56,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=654453.3333333334, ans=0.1 2024-09-19 11:49:00,933 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.86 vs. 
limit=15.0 2024-09-19 11:49:08,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=654453.3333333334, ans=0.1 2024-09-19 11:49:11,439 INFO [train.py:1198] (1/2) Epoch 37, batch 700, loss[loss=0.1928, simple_loss=0.2465, pruned_loss=0.05116, ctc_loss=0.1109, cr_loss=0.3665, over 34596.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2626, pruned_loss=0.05509, ctc_loss=0.1179, cr_loss=0.3916, over 6581139.20 frames. ], batch size: 89, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 11:49:22,962 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.109e+02 2.451e+02 2.915e+02 3.978e+02 9.151e+02, threshold=5.830e+02, percent-clipped=5.0 2024-09-19 11:49:34,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=654546.6666666666, ans=0.125 2024-09-19 11:49:49,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=654593.3333333334, ans=0.0 2024-09-19 11:49:56,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=654593.3333333334, ans=0.0 2024-09-19 11:50:09,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=654640.0, ans=0.0 2024-09-19 11:50:09,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=654640.0, ans=0.1 2024-09-19 11:50:33,565 INFO [train.py:1198] (1/2) Epoch 37, batch 750, loss[loss=0.2153, simple_loss=0.2708, pruned_loss=0.05909, ctc_loss=0.1254, cr_loss=0.4114, over 34428.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2622, pruned_loss=0.05489, ctc_loss=0.1176, cr_loss=0.3908, over 6624818.80 frames. ], batch size: 95, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 11:50:37,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=654733.3333333334, ans=0.125 2024-09-19 11:50:55,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654780.0, ans=0.1 2024-09-19 11:51:03,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=654780.0, ans=0.125 2024-09-19 11:51:58,652 INFO [train.py:1198] (1/2) Epoch 37, batch 800, loss[loss=0.1827, simple_loss=0.2399, pruned_loss=0.04518, ctc_loss=0.1063, cr_loss=0.346, over 34511.00 frames. ], tot_loss[loss=0.2058, simple_loss=0.2624, pruned_loss=0.05494, ctc_loss=0.1178, cr_loss=0.3912, over 6660819.30 frames. ], batch size: 85, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 11:52:09,984 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.539e+02 2.942e+02 3.912e+02 6.382e+02, threshold=5.884e+02, percent-clipped=1.0 2024-09-19 11:53:11,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=655153.3333333334, ans=0.1 2024-09-19 11:53:22,436 INFO [train.py:1198] (1/2) Epoch 37, batch 850, loss[loss=0.2121, simple_loss=0.2737, pruned_loss=0.05528, ctc_loss=0.118, cr_loss=0.4087, over 34374.00 frames. ], tot_loss[loss=0.2055, simple_loss=0.2623, pruned_loss=0.05483, ctc_loss=0.1174, cr_loss=0.3907, over 6693357.79 frames. 
], batch size: 103, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 11:53:42,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=655246.6666666666, ans=0.125 2024-09-19 11:54:10,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=655340.0, ans=0.125 2024-09-19 11:54:18,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=655340.0, ans=0.125 2024-09-19 11:54:22,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=655340.0, ans=0.125 2024-09-19 11:54:39,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=655386.6666666666, ans=0.0 2024-09-19 11:54:45,393 INFO [train.py:1198] (1/2) Epoch 37, batch 900, loss[loss=0.1894, simple_loss=0.2439, pruned_loss=0.04916, ctc_loss=0.1065, cr_loss=0.3827, over 34493.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2626, pruned_loss=0.055, ctc_loss=0.1179, cr_loss=0.3918, over 6699268.49 frames. ], batch size: 85, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 11:54:45,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=655433.3333333334, ans=0.125 2024-09-19 11:54:54,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=22.5 2024-09-19 11:54:56,792 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.501e+02 2.949e+02 4.151e+02 5.929e+02, threshold=5.898e+02, percent-clipped=1.0 2024-09-19 11:55:00,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=655480.0, ans=0.125 2024-09-19 11:55:08,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=655480.0, ans=0.025 2024-09-19 11:55:15,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=655480.0, ans=0.125 2024-09-19 11:55:20,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=655526.6666666666, ans=0.2 2024-09-19 11:55:55,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=655620.0, ans=0.125 2024-09-19 11:56:00,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=655620.0, ans=0.0 2024-09-19 11:56:09,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=655620.0, ans=0.2 2024-09-19 11:56:11,888 INFO [train.py:1198] (1/2) Epoch 37, batch 950, loss[loss=0.1889, simple_loss=0.25, pruned_loss=0.04631, ctc_loss=0.1036, cr_loss=0.3612, over 34723.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2628, pruned_loss=0.0549, ctc_loss=0.1177, cr_loss=0.391, over 6702565.88 frames. 
], batch size: 87, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 11:56:36,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=655713.3333333334, ans=0.125 2024-09-19 11:56:36,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=655713.3333333334, ans=10.0 2024-09-19 11:56:58,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=655760.0, ans=0.025 2024-09-19 11:57:00,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=655806.6666666666, ans=15.0 2024-09-19 11:57:03,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=655806.6666666666, ans=0.125 2024-09-19 11:57:03,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=655806.6666666666, ans=0.025 2024-09-19 11:57:06,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=655806.6666666666, ans=0.125 2024-09-19 11:57:14,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=655806.6666666666, ans=0.0 2024-09-19 11:57:33,738 INFO [train.py:1198] (1/2) Epoch 37, batch 1000, loss[loss=0.1969, simple_loss=0.2542, pruned_loss=0.05111, ctc_loss=0.1105, cr_loss=0.383, over 34511.00 frames. ], tot_loss[loss=0.2065, simple_loss=0.2632, pruned_loss=0.05523, ctc_loss=0.1183, cr_loss=0.3924, over 6695425.97 frames. ], batch size: 90, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 11:57:45,275 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.667e+02 3.247e+02 3.940e+02 1.025e+03, threshold=6.493e+02, percent-clipped=3.0 2024-09-19 11:58:00,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=655946.6666666666, ans=0.05 2024-09-19 11:58:18,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=655993.3333333334, ans=0.125 2024-09-19 11:58:21,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=656040.0, ans=0.2 2024-09-19 11:58:25,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=656040.0, ans=0.1 2024-09-19 11:58:53,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2024-09-19 11:58:56,545 INFO [train.py:1198] (1/2) Epoch 37, batch 1050, loss[loss=0.2036, simple_loss=0.2685, pruned_loss=0.05119, ctc_loss=0.1093, cr_loss=0.3634, over 34569.00 frames. ], tot_loss[loss=0.2058, simple_loss=0.2624, pruned_loss=0.05502, ctc_loss=0.1178, cr_loss=0.391, over 6705293.88 frames. ], batch size: 99, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 11:59:12,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.37 vs. 
limit=10.0 2024-09-19 11:59:23,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=15.0 2024-09-19 11:59:26,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=656180.0, ans=0.0 2024-09-19 11:59:43,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=656226.6666666666, ans=0.2 2024-09-19 11:59:50,867 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.45 vs. limit=22.5 2024-09-19 12:00:10,092 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.74 vs. limit=15.0 2024-09-19 12:00:16,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=656320.0, ans=0.025 2024-09-19 12:00:17,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656320.0, ans=0.1 2024-09-19 12:00:22,521 INFO [train.py:1198] (1/2) Epoch 37, batch 1100, loss[loss=0.1972, simple_loss=0.2541, pruned_loss=0.05137, ctc_loss=0.1127, cr_loss=0.3766, over 34390.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2621, pruned_loss=0.0549, ctc_loss=0.1177, cr_loss=0.3909, over 6717961.58 frames. ], batch size: 91, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 12:00:26,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=656366.6666666666, ans=0.0 2024-09-19 12:00:34,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.132e+02 2.453e+02 2.942e+02 3.551e+02 5.191e+02, threshold=5.883e+02, percent-clipped=0.0 2024-09-19 12:00:49,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=656413.3333333334, ans=0.0 2024-09-19 12:01:04,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=656460.0, ans=0.125 2024-09-19 12:01:14,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=656506.6666666666, ans=0.02 2024-09-19 12:01:44,902 INFO [train.py:1198] (1/2) Epoch 37, batch 1150, loss[loss=0.2077, simple_loss=0.2628, pruned_loss=0.05602, ctc_loss=0.1195, cr_loss=0.4132, over 34364.00 frames. ], tot_loss[loss=0.2053, simple_loss=0.2618, pruned_loss=0.05484, ctc_loss=0.1175, cr_loss=0.3904, over 6715285.69 frames. ], batch size: 91, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 12:01:49,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.22 vs. limit=15.0 2024-09-19 12:01:51,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=656600.0, ans=0.2 2024-09-19 12:02:01,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=656646.6666666666, ans=0.125 2024-09-19 12:02:20,760 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.95 vs. 
limit=15.0 2024-09-19 12:02:31,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=656693.3333333334, ans=0.125 2024-09-19 12:02:35,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.66 vs. limit=15.0 2024-09-19 12:02:36,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=656740.0, ans=0.0 2024-09-19 12:02:38,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=656740.0, ans=10.0 2024-09-19 12:02:48,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=656740.0, ans=0.0 2024-09-19 12:03:09,902 INFO [train.py:1198] (1/2) Epoch 37, batch 1200, loss[loss=0.2098, simple_loss=0.2647, pruned_loss=0.05731, ctc_loss=0.1222, cr_loss=0.3976, over 34543.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2627, pruned_loss=0.05505, ctc_loss=0.118, cr_loss=0.3915, over 6708091.45 frames. ], batch size: 99, lr: 3.23e-03, grad_scale: 32.0 2024-09-19 12:03:23,394 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.932e+02 2.462e+02 2.830e+02 3.397e+02 5.242e+02, threshold=5.659e+02, percent-clipped=0.0 2024-09-19 12:03:43,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=656926.6666666666, ans=0.0 2024-09-19 12:03:51,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=656926.6666666666, ans=0.07 2024-09-19 12:03:54,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=15.0 2024-09-19 12:04:16,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657020.0, ans=0.1 2024-09-19 12:04:30,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.35 vs. limit=22.5 2024-09-19 12:04:34,569 INFO [train.py:1198] (1/2) Epoch 37, batch 1250, loss[loss=0.2183, simple_loss=0.2724, pruned_loss=0.06104, ctc_loss=0.126, cr_loss=0.4191, over 34366.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2636, pruned_loss=0.05548, ctc_loss=0.1188, cr_loss=0.3939, over 6741852.20 frames. ], batch size: 107, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 12:04:40,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.23 vs. 
limit=15.0 2024-09-19 12:04:48,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=657066.6666666666, ans=0.2 2024-09-19 12:05:03,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=657113.3333333334, ans=0.07 2024-09-19 12:05:50,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=657253.3333333334, ans=0.025 2024-09-19 12:05:55,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=657300.0, ans=0.125 2024-09-19 12:05:57,132 INFO [train.py:1198] (1/2) Epoch 37, batch 1300, loss[loss=0.2055, simple_loss=0.2679, pruned_loss=0.05239, ctc_loss=0.1162, cr_loss=0.3751, over 33080.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2629, pruned_loss=0.05521, ctc_loss=0.1181, cr_loss=0.3917, over 6745665.15 frames. ], batch size: 130, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 12:06:10,086 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.074e+02 2.557e+02 2.827e+02 3.560e+02 8.810e+02, threshold=5.653e+02, percent-clipped=3.0 2024-09-19 12:06:37,203 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.74 vs. limit=15.0 2024-09-19 12:06:39,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=657393.3333333334, ans=0.0 2024-09-19 12:06:56,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=657440.0, ans=0.125 2024-09-19 12:07:08,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=657486.6666666666, ans=0.0 2024-09-19 12:07:12,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=657486.6666666666, ans=0.025 2024-09-19 12:07:23,050 INFO [train.py:1198] (1/2) Epoch 37, batch 1350, loss[loss=0.1951, simple_loss=0.2528, pruned_loss=0.0502, ctc_loss=0.1097, cr_loss=0.3771, over 34518.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.2628, pruned_loss=0.05511, ctc_loss=0.1179, cr_loss=0.3915, over 6766767.10 frames. ], batch size: 94, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 12:07:29,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=657533.3333333334, ans=0.125 2024-09-19 12:07:34,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_na.min_abs, batch_count=657533.3333333334, ans=0.02 2024-09-19 12:07:37,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=657580.0, ans=10.0 2024-09-19 12:07:54,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=657626.6666666666, ans=0.0 2024-09-19 12:08:01,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.99 vs. 
limit=10.0 2024-09-19 12:08:20,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=657673.3333333334, ans=0.125 2024-09-19 12:08:25,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=657673.3333333334, ans=0.015 2024-09-19 12:08:26,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=657720.0, ans=0.015 2024-09-19 12:08:44,898 INFO [train.py:1198] (1/2) Epoch 37, batch 1400, loss[loss=0.186, simple_loss=0.2415, pruned_loss=0.04737, ctc_loss=0.1046, cr_loss=0.3703, over 34273.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.2628, pruned_loss=0.05511, ctc_loss=0.1179, cr_loss=0.3915, over 6777885.74 frames. ], batch size: 80, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 12:08:48,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=657766.6666666666, ans=0.1 2024-09-19 12:08:57,957 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.219e+02 2.569e+02 3.045e+02 3.519e+02 6.478e+02, threshold=6.090e+02, percent-clipped=1.0 2024-09-19 12:09:42,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=657906.6666666666, ans=0.0 2024-09-19 12:09:43,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.19 vs. limit=10.0 2024-09-19 12:09:48,133 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.96 vs. limit=15.0 2024-09-19 12:10:07,484 INFO [train.py:1198] (1/2) Epoch 37, batch 1450, loss[loss=0.224, simple_loss=0.276, pruned_loss=0.06383, ctc_loss=0.1341, cr_loss=0.4379, over 34449.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2633, pruned_loss=0.05526, ctc_loss=0.1182, cr_loss=0.3923, over 6773268.90 frames. ], batch size: 110, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 12:10:16,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=658000.0, ans=0.125 2024-09-19 12:10:39,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=658093.3333333334, ans=0.05 2024-09-19 12:11:06,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2024-09-19 12:11:10,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=658140.0, ans=0.0 2024-09-19 12:11:33,573 INFO [train.py:1198] (1/2) Epoch 37, batch 1500, loss[loss=0.2204, simple_loss=0.2803, pruned_loss=0.05933, ctc_loss=0.1263, cr_loss=0.4166, over 34455.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2637, pruned_loss=0.05542, ctc_loss=0.1186, cr_loss=0.3936, over 6773946.31 frames. ], batch size: 100, lr: 3.23e-03, grad_scale: 16.0 2024-09-19 12:11:46,615 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.076e+02 2.487e+02 2.809e+02 3.078e+02 5.126e+02, threshold=5.618e+02, percent-clipped=0.0 2024-09-19 12:11:54,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. 
limit=8.0
2024-09-19 12:12:08,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=658326.6666666666, ans=0.0
2024-09-19 12:12:33,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=658373.3333333334, ans=0.125
2024-09-19 12:12:38,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=658420.0, ans=0.1
2024-09-19 12:12:39,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=658420.0, ans=0.125
2024-09-19 12:12:50,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=22.5
2024-09-19 12:12:56,028 INFO [train.py:1198] (1/2) Epoch 37, batch 1550, loss[loss=0.2223, simple_loss=0.2779, pruned_loss=0.06173, ctc_loss=0.1294, cr_loss=0.4349, over 34427.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2634, pruned_loss=0.05544, ctc_loss=0.1186, cr_loss=0.3939, over 6745767.04 frames. ], batch size: 105, lr: 3.23e-03, grad_scale: 16.0
2024-09-19 12:13:04,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=658466.6666666666, ans=0.0
2024-09-19 12:13:40,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=658560.0, ans=0.0
2024-09-19 12:13:42,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=658560.0, ans=0.125
2024-09-19 12:13:58,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0
2024-09-19 12:14:17,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=658700.0, ans=0.0
2024-09-19 12:14:18,443 INFO [train.py:1198] (1/2) Epoch 37, batch 1600, loss[loss=0.1979, simple_loss=0.2599, pruned_loss=0.04955, ctc_loss=0.1098, cr_loss=0.3706, over 34573.00 frames. ], tot_loss[loss=0.2072, simple_loss=0.2636, pruned_loss=0.05566, ctc_loss=0.119, cr_loss=0.3944, over 6725452.87 frames. ], batch size: 99, lr: 3.22e-03, grad_scale: 32.0
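The lr field in the tot_loss lines decays smoothly within the epoch (3.23e-03 earlier, 3.22e-03 from batch 1600 above) and steps down at every epoch boundary (3.29e-03 through epoch 36, 3.24e-03 at the start of epoch 37). This is the characteristic shape of icefall's Eden scheduler, which, roughly, multiplies the base learning rate by power-law factors in both the batch index and the epoch; treat the formula below as a paraphrase of that behaviour rather than the authoritative implementation, with the lr_batches/lr_epochs defaults taken from this recipe's configuration.

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Approximate Eden schedule: smooth power-law decay in both
        the global batch index and the (fractional) epoch."""
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

Both factors are close to 1 early on and decay like an inverse square root once batch and epoch pass lr_batches and lr_epochs respectively, which matches the slow within-epoch drift and the per-epoch steps seen in the log.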
2024-09-19 12:14:21,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=658700.0, ans=0.0
2024-09-19 12:14:25,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658700.0, ans=0.1
2024-09-19 12:14:28,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=658700.0, ans=0.2
2024-09-19 12:14:35,312 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.187e+02 2.520e+02 2.884e+02 3.569e+02 7.461e+02, threshold=5.769e+02, percent-clipped=6.0
2024-09-19 12:14:37,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=658746.6666666666, ans=0.05
2024-09-19 12:14:40,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=658746.6666666666, ans=0.125
2024-09-19 12:14:45,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=658746.6666666666, ans=0.1
2024-09-19 12:14:59,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=658793.3333333334, ans=0.125
2024-09-19 12:15:45,043 INFO [train.py:1198] (1/2) Epoch 37, batch 1650, loss[loss=0.2093, simple_loss=0.2748, pruned_loss=0.05274, ctc_loss=0.1138, cr_loss=0.3886, over 34391.00 frames. ], tot_loss[loss=0.2071, simple_loss=0.2635, pruned_loss=0.0556, ctc_loss=0.1189, cr_loss=0.3938, over 6719167.52 frames. ], batch size: 103, lr: 3.22e-03, grad_scale: 32.0
2024-09-19 12:16:23,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=659026.6666666666, ans=0.125
2024-09-19 12:16:23,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=659026.6666666666, ans=0.0
2024-09-19 12:16:54,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.59 vs. limit=22.5
2024-09-19 12:17:06,813 INFO [train.py:1198] (1/2) Epoch 37, batch 1700, loss[loss=0.1795, simple_loss=0.2326, pruned_loss=0.04623, ctc_loss=0.1011, cr_loss=0.3458, over 34290.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2631, pruned_loss=0.05537, ctc_loss=0.1185, cr_loss=0.3929, over 6745208.80 frames. ], batch size: 80, lr: 3.22e-03, grad_scale: 32.0
2024-09-19 12:17:19,960 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.151e+02 2.591e+02 3.048e+02 3.612e+02 7.891e+02, threshold=6.096e+02, percent-clipped=4.0
2024-09-19 12:18:13,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=659353.3333333334, ans=0.125
2024-09-19 12:18:28,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=659353.3333333334, ans=0.0
2024-09-19 12:18:33,191 INFO [train.py:1198] (1/2) Epoch 37, batch 1750, loss[loss=0.1784, simple_loss=0.2342, pruned_loss=0.04462, ctc_loss=0.09744, cr_loss=0.3454, over 34168.00 frames. ], tot_loss[loss=0.2065, simple_loss=0.263, pruned_loss=0.0553, ctc_loss=0.1182, cr_loss=0.3926, over 6753950.38 frames.
], batch size: 78, lr: 3.22e-03, grad_scale: 32.0 2024-09-19 12:18:34,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.41 vs. limit=22.5 2024-09-19 12:19:24,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=659540.0, ans=0.0 2024-09-19 12:19:33,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=659540.0, ans=0.1 2024-09-19 12:19:55,065 INFO [train.py:1198] (1/2) Epoch 37, batch 1800, loss[loss=0.2104, simple_loss=0.2675, pruned_loss=0.057, ctc_loss=0.119, cr_loss=0.3883, over 34712.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2629, pruned_loss=0.05524, ctc_loss=0.118, cr_loss=0.3923, over 6757112.59 frames. ], batch size: 97, lr: 3.22e-03, grad_scale: 32.0 2024-09-19 12:19:55,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=659633.3333333334, ans=0.05 2024-09-19 12:20:00,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659633.3333333334, ans=0.1 2024-09-19 12:20:02,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=659633.3333333334, ans=0.2 2024-09-19 12:20:08,251 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.913e+02 2.536e+02 2.959e+02 3.948e+02 6.538e+02, threshold=5.918e+02, percent-clipped=2.0 2024-09-19 12:20:15,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.56 vs. limit=10.0 2024-09-19 12:20:18,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=659680.0, ans=0.2 2024-09-19 12:20:37,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=659726.6666666666, ans=0.0 2024-09-19 12:20:39,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659726.6666666666, ans=0.1 2024-09-19 12:21:02,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=659820.0, ans=0.0 2024-09-19 12:21:15,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=12.41 vs. limit=15.0 2024-09-19 12:21:16,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=659866.6666666666, ans=0.1 2024-09-19 12:21:18,023 INFO [train.py:1198] (1/2) Epoch 37, batch 1850, loss[loss=0.2095, simple_loss=0.2719, pruned_loss=0.05422, ctc_loss=0.1166, cr_loss=0.3825, over 34464.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2629, pruned_loss=0.05522, ctc_loss=0.1181, cr_loss=0.3925, over 6763881.68 frames. 
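
The ScheduledFloat records report per-module hyperparameters (dropout probabilities, skip rates, scale floors) whose values are scheduled against batch_count. A minimal sketch of a piecewise-linear schedule under that reading; the breakpoints below are invented for illustration, and icefall's scaling.py defines its own schedule class and per-module breakpoints.

```python
# Piecewise-linear schedule over batch_count, sketching the behavior the
# ScheduledFloat lines appear to report. Breakpoints are illustrative only.
def scheduled_float(batch_count: float, points=((0.0, 0.3), (20000.0, 0.1))):
    (x0, y0), (x1, y1) = points
    if batch_count <= x0:
        return y0
    if batch_count >= x1:
        return y1
    t = (batch_count - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

# This far into training (batch_count ~ 660000, well past the last
# breakpoint) the schedule is flat, which is why the same ans=... values
# repeat in the records above.
print(scheduled_float(659866.0))  # -> 0.1
```
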
2024-09-19 12:21:19,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=659866.6666666666, ans=0.025
2024-09-19 12:21:20,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=659866.6666666666, ans=0.2
2024-09-19 12:21:21,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=659866.6666666666, ans=0.125
2024-09-19 12:21:24,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=659866.6666666666, ans=0.125
2024-09-19 12:21:34,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=659913.3333333334, ans=0.125
2024-09-19 12:21:50,511 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0
2024-09-19 12:21:51,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=659960.0, ans=0.125
2024-09-19 12:22:20,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660006.6666666666, ans=0.1
2024-09-19 12:22:28,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=660053.3333333334, ans=0.2
2024-09-19 12:22:41,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.59 vs. limit=10.0
2024-09-19 12:22:44,310 INFO [train.py:1198] (1/2) Epoch 37, batch 1900, loss[loss=0.2092, simple_loss=0.2716, pruned_loss=0.05387, ctc_loss=0.1162, cr_loss=0.3964, over 34385.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2632, pruned_loss=0.05512, ctc_loss=0.1181, cr_loss=0.3926, over 6772236.41 frames. ], batch size: 103, lr: 3.22e-03, grad_scale: 32.0
2024-09-19 12:22:44,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=660100.0, ans=0.125
2024-09-19 12:22:52,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=660100.0, ans=0.2
2024-09-19 12:22:57,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.105e+02 2.522e+02 2.899e+02 3.862e+02 7.067e+02, threshold=5.799e+02, percent-clipped=2.0
2024-09-19 12:22:59,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=660146.6666666666, ans=0.125
2024-09-19 12:23:09,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=22.5
2024-09-19 12:23:12,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=660146.6666666666, ans=0.2
2024-09-19 12:23:25,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=660193.3333333334, ans=0.125
2024-09-19 12:23:28,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=660193.3333333334, ans=0.025
2024-09-19 12:23:41,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660240.0, ans=0.1
2024-09-19 12:23:41,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=660240.0, ans=0.125
2024-09-19 12:23:48,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=660286.6666666666, ans=0.125
2024-09-19 12:23:56,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=660286.6666666666, ans=0.0
2024-09-19 12:24:04,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=660333.3333333334, ans=0.125
2024-09-19 12:24:05,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0
2024-09-19 12:24:06,201 INFO [train.py:1198] (1/2) Epoch 37, batch 1950, loss[loss=0.2039, simple_loss=0.2584, pruned_loss=0.05481, ctc_loss=0.1194, cr_loss=0.3969, over 34347.00 frames. ], tot_loss[loss=0.2076, simple_loss=0.2644, pruned_loss=0.05558, ctc_loss=0.1189, cr_loss=0.395, over 6789177.11 frames. ], batch size: 91, lr: 3.22e-03, grad_scale: 32.0
2024-09-19 12:24:21,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660380.0, ans=0.1
2024-09-19 12:24:43,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=660426.6666666666, ans=0.2
2024-09-19 12:25:00,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=660473.3333333334, ans=0.025
2024-09-19 12:25:14,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=660520.0, ans=0.125
2024-09-19 12:25:19,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=660520.0, ans=0.1
2024-09-19 12:25:23,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0
2024-09-19 12:25:28,662 INFO [train.py:1198] (1/2) Epoch 37, batch 2000, loss[loss=0.1771, simple_loss=0.2311, pruned_loss=0.04468, ctc_loss=0.09901, cr_loss=0.3485, over 34186.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2646, pruned_loss=0.05569, ctc_loss=0.1192, cr_loss=0.3956, over 6765151.37 frames. ], batch size: 78, lr: 3.22e-03, grad_scale: 32.0
2024-09-19 12:25:44,013 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 2.503e+02 2.785e+02 3.380e+02 6.583e+02, threshold=5.570e+02, percent-clipped=1.0
2024-09-19 12:26:07,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=660660.0, ans=0.125
2024-09-19 12:26:13,298 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.06 vs. limit=15.0
2024-09-19 12:26:22,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=660706.6666666666, ans=0.125
2024-09-19 12:26:26,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=660706.6666666666, ans=0.0
2024-09-19 12:26:29,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=660706.6666666666, ans=0.025
2024-09-19 12:26:44,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=660753.3333333334, ans=0.09899494936611666
2024-09-19 12:26:52,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=660753.3333333334, ans=0.07
2024-09-19 12:26:55,826 INFO [train.py:1198] (1/2) Epoch 37, batch 2050, loss[loss=0.1925, simple_loss=0.2461, pruned_loss=0.05116, ctc_loss=0.1084, cr_loss=0.3734, over 34483.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2635, pruned_loss=0.05542, ctc_loss=0.1186, cr_loss=0.3939, over 6756686.00 frames. ], batch size: 82, lr: 3.22e-03, grad_scale: 32.0
2024-09-19 12:27:00,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.45 vs. limit=8.0
2024-09-19 12:27:02,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=660800.0, ans=0.0
2024-09-19 12:27:06,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=660800.0, ans=0.2
2024-09-19 12:27:19,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=660846.6666666666, ans=0.125
2024-09-19 12:27:28,178 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=15.0
2024-09-19 12:27:39,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=660893.3333333334, ans=0.125
2024-09-19 12:27:45,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=660940.0, ans=0.125
2024-09-19 12:27:48,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0
2024-09-19 12:27:50,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660940.0, ans=0.1
2024-09-19 12:27:59,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.63 vs. limit=10.0
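
In the optim.py WARNINGs above, the five numbers after "grad-norm quartiles" read as the min, 25%, median, 75% and max of recent gradient norms, and in each warning in this stretch the printed threshold equals 2.0 times the median (e.g. 5.570e+02 = 2.0 x 2.785e+02), consistent with Clipping_scale=2.0. A small sketch of that relationship, under the assumption that the threshold really is the scale times the median; the actual bookkeeping in icefall's optim.py is more involved.

```python
import torch

def clipping_summary(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quartiles in the order the WARNING prints them: min, 25%, median, 75%, max.
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # assumption: scale times the median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean()
    return q.tolist(), threshold.item(), percent_clipped.item()
```
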
2024-09-19 12:28:12,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660986.6666666666, ans=0.1
2024-09-19 12:28:18,425 INFO [train.py:1198] (1/2) Epoch 37, batch 2100, loss[loss=0.1958, simple_loss=0.255, pruned_loss=0.04998, ctc_loss=0.1081, cr_loss=0.3734, over 34532.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2635, pruned_loss=0.05542, ctc_loss=0.1186, cr_loss=0.3938, over 6769244.20 frames. ], batch size: 94, lr: 3.22e-03, grad_scale: 32.0
2024-09-19 12:28:31,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.461e+02 2.918e+02 3.618e+02 6.172e+02, threshold=5.836e+02, percent-clipped=3.0
2024-09-19 12:29:14,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=661173.3333333334, ans=0.04949747468305833
2024-09-19 12:29:42,119 INFO [train.py:1198] (1/2) Epoch 37, batch 2150, loss[loss=0.2103, simple_loss=0.2639, pruned_loss=0.05844, ctc_loss=0.1205, cr_loss=0.3915, over 34354.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2629, pruned_loss=0.05516, ctc_loss=0.1181, cr_loss=0.3927, over 6788083.02 frames. ], batch size: 91, lr: 3.22e-03, grad_scale: 32.0
2024-09-19 12:29:44,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=661266.6666666666, ans=0.0
2024-09-19 12:29:57,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=661266.6666666666, ans=0.1
2024-09-19 12:30:13,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.57 vs. limit=15.0
2024-09-19 12:30:30,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=661360.0, ans=0.1
2024-09-19 12:30:51,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=661453.3333333334, ans=0.0
2024-09-19 12:31:04,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=661500.0, ans=0.0
2024-09-19 12:31:05,984 INFO [train.py:1198] (1/2) Epoch 37, batch 2200, loss[loss=0.1957, simple_loss=0.2636, pruned_loss=0.04619, ctc_loss=0.1051, cr_loss=0.3636, over 34460.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2627, pruned_loss=0.05504, ctc_loss=0.1179, cr_loss=0.3915, over 6782706.44 frames. ], batch size: 100, lr: 3.22e-03, grad_scale: 16.0
2024-09-19 12:31:20,694 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.629e+02 3.056e+02 3.983e+02 7.235e+02, threshold=6.113e+02, percent-clipped=4.0
2024-09-19 12:31:24,365 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 12:31:39,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=661593.3333333334, ans=0.95
2024-09-19 12:31:59,179 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.70 vs. limit=10.0
2024-09-19 12:32:08,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=661640.0, ans=0.125
2024-09-19 12:32:23,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=661686.6666666666, ans=10.0
2024-09-19 12:32:25,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=661686.6666666666, ans=0.025
2024-09-19 12:32:28,364 INFO [train.py:1198] (1/2) Epoch 37, batch 2250, loss[loss=0.2126, simple_loss=0.2698, pruned_loss=0.0575, ctc_loss=0.1215, cr_loss=0.4031, over 34422.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2627, pruned_loss=0.05495, ctc_loss=0.1177, cr_loss=0.3907, over 6781070.67 frames. ], batch size: 95, lr: 3.22e-03, grad_scale: 16.0
2024-09-19 12:33:09,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=661826.6666666666, ans=0.0
2024-09-19 12:33:21,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=661873.3333333334, ans=0.125
2024-09-19 12:33:42,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=661920.0, ans=0.125
2024-09-19 12:33:54,016 INFO [train.py:1198] (1/2) Epoch 37, batch 2300, loss[loss=0.184, simple_loss=0.2419, pruned_loss=0.04558, ctc_loss=0.1016, cr_loss=0.3642, over 34265.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2614, pruned_loss=0.05449, ctc_loss=0.1168, cr_loss=0.3885, over 6766064.05 frames. ], batch size: 83, lr: 3.22e-03, grad_scale: 16.0
2024-09-19 12:34:00,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=661966.6666666666, ans=0.0
2024-09-19 12:34:04,118 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 12:34:06,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.93 vs. limit=15.0
2024-09-19 12:34:08,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.113e+02 2.733e+02 3.063e+02 3.576e+02 5.618e+02, threshold=6.126e+02, percent-clipped=0.0
2024-09-19 12:34:17,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=662013.3333333334, ans=0.125
2024-09-19 12:34:23,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=662013.3333333334, ans=0.05
2024-09-19 12:34:53,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=662106.6666666666, ans=0.0
2024-09-19 12:34:58,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=662153.3333333334, ans=0.125
2024-09-19 12:35:05,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=22.5
2024-09-19 12:35:05,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.86 vs. limit=22.5
2024-09-19 12:35:16,248 INFO [train.py:1198] (1/2) Epoch 37, batch 2350, loss[loss=0.2142, simple_loss=0.2717, pruned_loss=0.05771, ctc_loss=0.1232, cr_loss=0.4189, over 34700.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2619, pruned_loss=0.05492, ctc_loss=0.1175, cr_loss=0.3904, over 6772033.51 frames. ], batch size: 97, lr: 3.22e-03, grad_scale: 16.0
2024-09-19 12:35:23,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662200.0, ans=0.1
2024-09-19 12:35:47,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=662293.3333333334, ans=0.0
2024-09-19 12:36:11,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=662340.0, ans=0.0
2024-09-19 12:36:25,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=662386.6666666666, ans=0.125
2024-09-19 12:36:29,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=662386.6666666666, ans=0.125
2024-09-19 12:36:38,833 INFO [train.py:1198] (1/2) Epoch 37, batch 2400, loss[loss=0.2009, simple_loss=0.2555, pruned_loss=0.0538, ctc_loss=0.1136, cr_loss=0.3984, over 34578.00 frames. ], tot_loss[loss=0.2057, simple_loss=0.2622, pruned_loss=0.05497, ctc_loss=0.1176, cr_loss=0.3911, over 6776251.15 frames. ], batch size: 89, lr: 3.22e-03, grad_scale: 32.0
2024-09-19 12:36:56,822 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.039e+02 2.471e+02 2.992e+02 4.011e+02 5.423e+02, threshold=5.984e+02, percent-clipped=0.0
2024-09-19 12:37:04,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=662480.0, ans=0.0
2024-09-19 12:37:08,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0
2024-09-19 12:37:15,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=662526.6666666666, ans=0.07
2024-09-19 12:37:31,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=662573.3333333334, ans=0.0
2024-09-19 12:37:32,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=662573.3333333334, ans=0.025
2024-09-19 12:37:47,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=662620.0, ans=0.07
2024-09-19 12:38:05,645 INFO [train.py:1198] (1/2) Epoch 37, batch 2450, loss[loss=0.2221, simple_loss=0.2765, pruned_loss=0.06224, ctc_loss=0.1294, cr_loss=0.4347, over 34427.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2629, pruned_loss=0.05517, ctc_loss=0.1181, cr_loss=0.3923, over 6750795.44 frames. ], batch size: 95, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 12:38:38,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=662760.0, ans=0.125
2024-09-19 12:38:38,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=662760.0, ans=0.0
2024-09-19 12:38:43,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=662760.0, ans=0.125
2024-09-19 12:38:56,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=662806.6666666666, ans=0.125
2024-09-19 12:39:15,742 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.66 vs. limit=15.0
2024-09-19 12:39:27,833 INFO [train.py:1198] (1/2) Epoch 37, batch 2500, loss[loss=0.2129, simple_loss=0.2704, pruned_loss=0.05735, ctc_loss=0.1213, cr_loss=0.4078, over 34450.00 frames. ], tot_loss[loss=0.2065, simple_loss=0.263, pruned_loss=0.05527, ctc_loss=0.1183, cr_loss=0.3928, over 6761856.71 frames. ], batch size: 100, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 12:39:30,307 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=12.0
2024-09-19 12:39:33,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=662900.0, ans=0.125
2024-09-19 12:39:38,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=662900.0, ans=0.125
2024-09-19 12:39:43,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=662946.6666666666, ans=0.125
2024-09-19 12:39:44,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.458e+02 2.944e+02 3.652e+02 5.619e+02, threshold=5.889e+02, percent-clipped=0.0
2024-09-19 12:39:52,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=662946.6666666666, ans=0.125
2024-09-19 12:40:36,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=663086.6666666666, ans=0.125
2024-09-19 12:40:41,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=663086.6666666666, ans=0.125
2024-09-19 12:40:44,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=663086.6666666666, ans=0.125
2024-09-19 12:40:52,307 INFO [train.py:1198] (1/2) Epoch 37, batch 2550, loss[loss=0.1746, simple_loss=0.2314, pruned_loss=0.04263, ctc_loss=0.09535, cr_loss=0.3378, over 34192.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2626, pruned_loss=0.05499, ctc_loss=0.1178, cr_loss=0.3922, over 6765214.20 frames. ], batch size: 78, lr: 3.21e-03, grad_scale: 8.0
2024-09-19 12:40:56,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=663133.3333333334, ans=0.125
2024-09-19 12:41:10,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=663180.0, ans=0.2
2024-09-19 12:41:24,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=663180.0, ans=0.0
2024-09-19 12:41:29,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.10 vs. limit=12.0
2024-09-19 12:41:48,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=663273.3333333334, ans=0.125
2024-09-19 12:41:57,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=663273.3333333334, ans=0.0
2024-09-19 12:42:08,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=663320.0, ans=0.0
2024-09-19 12:42:10,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.13 vs. limit=22.5
2024-09-19 12:42:11,973 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 12:42:16,682 INFO [train.py:1198] (1/2) Epoch 37, batch 2600, loss[loss=0.2089, simple_loss=0.2705, pruned_loss=0.05405, ctc_loss=0.1175, cr_loss=0.3938, over 34364.00 frames. ], tot_loss[loss=0.2065, simple_loss=0.2632, pruned_loss=0.05519, ctc_loss=0.1181, cr_loss=0.393, over 6761614.43 frames. ], batch size: 91, lr: 3.21e-03, grad_scale: 8.0
2024-09-19 12:42:28,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=663366.6666666666, ans=0.125
2024-09-19 12:42:28,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0
2024-09-19 12:42:31,767 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 12:42:34,601 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.529e+02 2.825e+02 3.677e+02 8.971e+02, threshold=5.650e+02, percent-clipped=1.0
2024-09-19 12:42:36,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=663413.3333333334, ans=0.0
2024-09-19 12:43:34,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=663553.3333333334, ans=0.025
2024-09-19 12:43:39,330 INFO [train.py:1198] (1/2) Epoch 37, batch 2650, loss[loss=0.2036, simple_loss=0.2678, pruned_loss=0.05127, ctc_loss=0.1112, cr_loss=0.3676, over 34241.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2631, pruned_loss=0.05506, ctc_loss=0.1179, cr_loss=0.3926, over 6768760.15 frames. ], batch size: 117, lr: 3.21e-03, grad_scale: 8.0
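
The grad_scale field in the batch records has been drifting: 32.0 around batch 2000, 16.0 by batch 2200, 8.0 at batch 2550, then back to 16.0 by batch 2800. That trajectory is what a dynamic AMP loss scaler produces: it halves after a step whose scaled gradients contain inf/nan and grows again after a run of clean steps. A self-contained sketch of the stock torch.cuda.amp loop, assuming train.py follows the standard GradScaler semantics; the toy model and data are purely illustrative.

```python
import torch

# Requires a CUDA device, like the training run in this log.
model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler(init_scale=1.0)

for step in range(100):
    x = torch.randn(8, 10, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skips the step if grads overflowed
    scaler.update()                # halves the scale on overflow, grows it
                                   # periodically otherwise
    # scaler.get_scale() is the value train.py logs as grad_scale
```
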
2024-09-19 12:44:02,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=663646.6666666666, ans=0.025
2024-09-19 12:44:06,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.36 vs. limit=15.0
2024-09-19 12:44:07,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=663646.6666666666, ans=0.125
2024-09-19 12:44:44,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=663740.0, ans=0.07
2024-09-19 12:45:00,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=663786.6666666666, ans=0.125
2024-09-19 12:45:02,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=663786.6666666666, ans=0.0
2024-09-19 12:45:05,399 INFO [train.py:1198] (1/2) Epoch 37, batch 2700, loss[loss=0.2126, simple_loss=0.2731, pruned_loss=0.05598, ctc_loss=0.1205, cr_loss=0.3981, over 34620.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.2635, pruned_loss=0.05526, ctc_loss=0.1182, cr_loss=0.3931, over 6763714.66 frames. ], batch size: 102, lr: 3.21e-03, grad_scale: 8.0
2024-09-19 12:45:15,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=663833.3333333334, ans=0.2
2024-09-19 12:45:23,533 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.531e+02 2.871e+02 3.328e+02 6.190e+02, threshold=5.741e+02, percent-clipped=1.0
2024-09-19 12:45:35,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=663880.0, ans=0.125
2024-09-19 12:45:35,490 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0
2024-09-19 12:45:40,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=663926.6666666666, ans=0.0
2024-09-19 12:45:41,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=663926.6666666666, ans=0.05
2024-09-19 12:46:00,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=663973.3333333334, ans=0.07
2024-09-19 12:46:02,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=663973.3333333334, ans=0.1
2024-09-19 12:46:06,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=663973.3333333334, ans=0.125
2024-09-19 12:46:11,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=664020.0, ans=0.125
2024-09-19 12:46:27,960 INFO [train.py:1198] (1/2) Epoch 37, batch 2750, loss[loss=0.186, simple_loss=0.2443, pruned_loss=0.04638, ctc_loss=0.1036, cr_loss=0.3551, over 34635.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2623, pruned_loss=0.05486, ctc_loss=0.1175, cr_loss=0.3914, over 6760390.11 frames. ], batch size: 88, lr: 3.21e-03, grad_scale: 8.0
2024-09-19 12:46:31,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=664066.6666666666, ans=0.125
2024-09-19 12:46:38,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=664066.6666666666, ans=0.1
2024-09-19 12:46:50,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664113.3333333334, ans=0.1
2024-09-19 12:47:06,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=664160.0, ans=0.07
2024-09-19 12:47:21,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=664206.6666666666, ans=0.2
2024-09-19 12:47:30,116 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. limit=6.0
2024-09-19 12:47:36,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=664253.3333333334, ans=0.1
2024-09-19 12:47:52,630 INFO [train.py:1198] (1/2) Epoch 37, batch 2800, loss[loss=0.2246, simple_loss=0.2764, pruned_loss=0.06437, ctc_loss=0.1362, cr_loss=0.4222, over 24070.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2625, pruned_loss=0.05504, ctc_loss=0.1179, cr_loss=0.3915, over 6739599.00 frames. ], batch size: 244, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 12:48:10,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.029e+02 2.634e+02 3.176e+02 4.013e+02 8.141e+02, threshold=6.353e+02, percent-clipped=4.0
2024-09-19 12:48:16,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2024-09-19 12:48:45,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.24 vs. limit=12.0
2024-09-19 12:48:46,450 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=22.5
2024-09-19 12:48:51,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=664440.0, ans=0.5
2024-09-19 12:49:06,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=664486.6666666666, ans=0.07
2024-09-19 12:49:06,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=664486.6666666666, ans=0.125
2024-09-19 12:49:12,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=664486.6666666666, ans=0.125
2024-09-19 12:49:17,446 INFO [train.py:1198] (1/2) Epoch 37, batch 2850, loss[loss=0.2003, simple_loss=0.2567, pruned_loss=0.0537, ctc_loss=0.1117, cr_loss=0.3508, over 34498.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.2628, pruned_loss=0.0551, ctc_loss=0.118, cr_loss=0.3919, over 6724468.99 frames. ], batch size: 90, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 12:49:22,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=664533.3333333334, ans=0.05
2024-09-19 12:49:41,371 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.85 vs. limit=15.0
2024-09-19 12:49:48,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.58 vs. limit=12.0
2024-09-19 12:50:00,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=664626.6666666666, ans=0.125
2024-09-19 12:50:40,004 INFO [train.py:1198] (1/2) Epoch 37, batch 2900, loss[loss=0.2141, simple_loss=0.2699, pruned_loss=0.05901, ctc_loss=0.1227, cr_loss=0.3945, over 34531.00 frames. ], tot_loss[loss=0.2074, simple_loss=0.2641, pruned_loss=0.05556, ctc_loss=0.1188, cr_loss=0.3942, over 6755036.66 frames. ], batch size: 94, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 12:50:40,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=664766.6666666666, ans=0.0
2024-09-19 12:50:58,000 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.534e+02 3.182e+02 4.377e+02 7.171e+02, threshold=6.363e+02, percent-clipped=2.0
2024-09-19 12:51:01,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=664813.3333333334, ans=0.125
2024-09-19 12:51:14,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=664860.0, ans=0.0
2024-09-19 12:51:23,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=664860.0, ans=15.0
2024-09-19 12:51:26,273 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 12:51:37,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=664906.6666666666, ans=0.125
2024-09-19 12:51:57,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=664953.3333333334, ans=0.2
2024-09-19 12:52:03,858 INFO [train.py:1198] (1/2) Epoch 37, batch 2950, loss[loss=0.2053, simple_loss=0.2571, pruned_loss=0.05653, ctc_loss=0.1217, cr_loss=0.4007, over 34626.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2626, pruned_loss=0.05498, ctc_loss=0.1177, cr_loss=0.3915, over 6749580.19 frames. ], batch size: 88, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 12:52:31,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.69 vs. limit=15.0
2024-09-19 12:52:42,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=665093.3333333334, ans=0.2
2024-09-19 12:52:57,557 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0
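
The Whitening lines compare a per-module metric against that module's whitening limit; as long as the metric stays below the limit, as in most records here, the module is within bounds. A small parser for flagging modules whose metric exceeds its limit; the pattern is inferred from the layout of these lines rather than any documented format.

```python
import re

# Field layout copied from the Whitening log lines above.
WHITEN = re.compile(
    r"Whitening: name=(?P<name>\S+), num_groups=\d+, num_channels=\d+, "
    r"metric=(?P<metric>[\d.]+) vs\. limit=(?P<limit>[\d.]+)"
)

def over_limit(lines):
    # Yield (module, metric, limit) where the whitening metric exceeded
    # its limit, e.g. to spot persistently misbehaving modules.
    for line in lines:
        m = WHITEN.search(line)
        if m and float(m.group("metric")) > float(m.group("limit")):
            yield m.group("name"), float(m.group("metric")), float(m.group("limit"))
```
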
2024-09-19 12:52:58,980 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 12:53:08,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=665140.0, ans=0.0
2024-09-19 12:53:19,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.36 vs. limit=15.0
2024-09-19 12:53:23,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.58 vs. limit=15.0
2024-09-19 12:53:28,293 INFO [train.py:1198] (1/2) Epoch 37, batch 3000, loss[loss=0.2078, simple_loss=0.2649, pruned_loss=0.05585, ctc_loss=0.1175, cr_loss=0.3902, over 34532.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.2628, pruned_loss=0.05506, ctc_loss=0.1178, cr_loss=0.3918, over 6750242.38 frames. ], batch size: 94, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 12:53:28,294 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 12:53:45,239 INFO [train.py:1230] (1/2) Epoch 37, validation: loss=0.1482, simple_loss=0.2423, pruned_loss=0.0231, ctc_loss=0.039, cr_loss=2.12e-14, over 944034.00 frames.
2024-09-19 12:53:45,240 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-19 12:53:55,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=665233.3333333334, ans=0.0
2024-09-19 12:53:55,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=665233.3333333334, ans=0.05
2024-09-19 12:54:03,483 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.157e+02 2.442e+02 2.727e+02 3.360e+02 6.849e+02, threshold=5.454e+02, percent-clipped=1.0
2024-09-19 12:54:31,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=665326.6666666666, ans=0.125
2024-09-19 12:54:49,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=665420.0, ans=0.125
2024-09-19 12:54:49,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=665420.0, ans=0.125
2024-09-19 12:55:03,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=665420.0, ans=0.0
2024-09-19 12:55:07,337 INFO [train.py:1198] (1/2) Epoch 37, batch 3050, loss[loss=0.1985, simple_loss=0.2565, pruned_loss=0.05162, ctc_loss=0.1102, cr_loss=0.3817, over 34598.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2634, pruned_loss=0.0552, ctc_loss=0.1181, cr_loss=0.3931, over 6742614.44 frames. ], batch size: 89, lr: 3.21e-03, grad_scale: 16.0
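
Two details in the validation block above are worth flagging. First, validation cr_loss is about 2e-14, i.e. effectively zero; a plausible reading is that the consistency-regularization term compares two differently-masked forward passes and so has nothing to compare in a single unaugmented validation pass, though the log itself does not state this. Second, the peak-memory line can be reproduced with the standard PyTorch API, as in this sketch:

```python
import torch

def log_peak_memory(device: int = 0) -> None:
    # torch.cuda.max_memory_allocated returns bytes; the log prints MB.
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")
```
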
2024-09-19 12:55:33,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=665513.3333333334, ans=0.5
2024-09-19 12:55:36,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=665513.3333333334, ans=0.125
2024-09-19 12:55:41,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=665560.0, ans=0.0
2024-09-19 12:55:54,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=665560.0, ans=0.125
2024-09-19 12:55:57,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=665606.6666666666, ans=0.125
2024-09-19 12:56:29,662 INFO [train.py:1198] (1/2) Epoch 37, batch 3100, loss[loss=0.2203, simple_loss=0.2826, pruned_loss=0.05804, ctc_loss=0.1272, cr_loss=0.4121, over 34229.00 frames. ], tot_loss[loss=0.2068, simple_loss=0.2634, pruned_loss=0.05539, ctc_loss=0.1184, cr_loss=0.3931, over 6741545.58 frames. ], batch size: 117, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 12:56:41,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=665700.0, ans=0.0
2024-09-19 12:56:47,371 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.441e+02 2.801e+02 3.264e+02 5.759e+02, threshold=5.603e+02, percent-clipped=4.0
2024-09-19 12:56:58,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665746.6666666666, ans=0.1
2024-09-19 12:57:02,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.54 vs. limit=15.0
2024-09-19 12:57:52,219 INFO [train.py:1198] (1/2) Epoch 37, batch 3150, loss[loss=0.2115, simple_loss=0.2705, pruned_loss=0.05668, ctc_loss=0.1212, cr_loss=0.3753, over 33862.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2633, pruned_loss=0.0553, ctc_loss=0.1183, cr_loss=0.3929, over 6747619.08 frames. ], batch size: 122, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 12:58:02,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=665933.3333333334, ans=0.025
2024-09-19 12:58:09,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.65 vs. limit=10.0
2024-09-19 12:58:12,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=665980.0, ans=0.0
2024-09-19 12:58:21,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=665980.0, ans=0.07
2024-09-19 12:58:41,708 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0
2024-09-19 12:58:42,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=666073.3333333334, ans=0.2
2024-09-19 12:58:51,312 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.24 vs. limit=15.0
2024-09-19 12:59:12,902 INFO [train.py:1198] (1/2) Epoch 37, batch 3200, loss[loss=0.2152, simple_loss=0.2666, pruned_loss=0.06114, ctc_loss=0.1266, cr_loss=0.4054, over 34546.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.263, pruned_loss=0.05521, ctc_loss=0.1181, cr_loss=0.3928, over 6761630.45 frames. ], batch size: 94, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 12:59:26,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=666166.6666666666, ans=0.125
2024-09-19 12:59:32,227 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 2.616e+02 3.092e+02 3.907e+02 8.289e+02, threshold=6.184e+02, percent-clipped=4.0
2024-09-19 12:59:38,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=666213.3333333334, ans=0.125
2024-09-19 12:59:47,160 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=666260.0, ans=0.125
2024-09-19 13:00:11,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=666306.6666666666, ans=0.09899494936611666
2024-09-19 13:00:21,667 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.69 vs. limit=15.0
2024-09-19 13:00:34,190 INFO [train.py:1198] (1/2) Epoch 37, batch 3250, loss[loss=0.2166, simple_loss=0.2762, pruned_loss=0.05789, ctc_loss=0.1253, cr_loss=0.403, over 34658.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2636, pruned_loss=0.0554, ctc_loss=0.1185, cr_loss=0.3932, over 6770475.30 frames. ], batch size: 98, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 13:00:55,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=666446.6666666666, ans=0.125
2024-09-19 13:01:21,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=666540.0, ans=0.2
2024-09-19 13:01:34,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=666540.0, ans=0.5
2024-09-19 13:01:56,076 INFO [train.py:1198] (1/2) Epoch 37, batch 3300, loss[loss=0.2075, simple_loss=0.2698, pruned_loss=0.05342, ctc_loss=0.1158, cr_loss=0.3811, over 32932.00 frames. ], tot_loss[loss=0.2057, simple_loss=0.2623, pruned_loss=0.05493, ctc_loss=0.1176, cr_loss=0.3915, over 6769607.87 frames. ], batch size: 130, lr: 3.21e-03, grad_scale: 16.0
2024-09-19 13:02:12,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=666680.0, ans=0.05
2024-09-19 13:02:15,604 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.059e+02 2.454e+02 2.745e+02 3.339e+02 5.904e+02, threshold=5.491e+02, percent-clipped=0.0
2024-09-19 13:02:17,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=666680.0, ans=0.2
2024-09-19 13:02:38,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=666726.6666666666, ans=0.05
2024-09-19 13:02:44,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=666773.3333333334, ans=0.0
2024-09-19 13:03:09,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=666820.0, ans=0.2
2024-09-19 13:03:12,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=666820.0, ans=0.1
2024-09-19 13:03:15,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=666866.6666666666, ans=0.09899494936611666
2024-09-19 13:03:17,012 INFO [train.py:1198] (1/2) Epoch 37, batch 3350, loss[loss=0.2128, simple_loss=0.2699, pruned_loss=0.05763, ctc_loss=0.1233, cr_loss=0.396, over 33818.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2629, pruned_loss=0.05518, ctc_loss=0.118, cr_loss=0.392, over 6743462.29 frames. ], batch size: 122, lr: 3.20e-03, grad_scale: 16.0
2024-09-19 13:03:17,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=666866.6666666666, ans=0.2
2024-09-19 13:03:28,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=666866.6666666666, ans=0.125
2024-09-19 13:03:31,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=666866.6666666666, ans=0.1
2024-09-19 13:03:40,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.33 vs. limit=12.0
2024-09-19 13:03:47,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=666913.3333333334, ans=0.125
2024-09-19 13:03:49,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=666960.0, ans=0.125
2024-09-19 13:03:57,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.37 vs. limit=15.0
2024-09-19 13:04:39,175 INFO [train.py:1198] (1/2) Epoch 37, batch 3400, loss[loss=0.1783, simple_loss=0.2351, pruned_loss=0.0438, ctc_loss=0.09787, cr_loss=0.3571, over 34192.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.263, pruned_loss=0.05526, ctc_loss=0.1183, cr_loss=0.3924, over 6734459.69 frames. ], batch size: 78, lr: 3.20e-03, grad_scale: 16.0
2024-09-19 13:04:58,409 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.102e+02 2.561e+02 2.961e+02 3.562e+02 6.111e+02, threshold=5.921e+02, percent-clipped=2.0
2024-09-19 13:05:11,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0
2024-09-19 13:05:24,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=667193.3333333334, ans=0.125
2024-09-19 13:05:32,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=667240.0, ans=0.2
2024-09-19 13:05:33,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667240.0, ans=0.1
2024-09-19 13:05:38,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667240.0, ans=0.1
2024-09-19 13:05:45,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=667286.6666666666, ans=0.2
2024-09-19 13:05:45,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=667286.6666666666, ans=0.125
2024-09-19 13:05:49,329 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=22.5
2024-09-19 13:05:59,775 INFO [train.py:1198] (1/2) Epoch 37, batch 3450, loss[loss=0.2211, simple_loss=0.2793, pruned_loss=0.05996, ctc_loss=0.1309, cr_loss=0.4173, over 33094.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2637, pruned_loss=0.05565, ctc_loss=0.119, cr_loss=0.3943, over 6746027.17 frames. ], batch size: 130, lr: 3.20e-03, grad_scale: 16.0
2024-09-19 13:06:11,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=667333.3333333334, ans=0.025
2024-09-19 13:06:23,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=667380.0, ans=0.125
2024-09-19 13:06:28,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=667380.0, ans=0.1
2024-09-19 13:06:31,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=667426.6666666666, ans=0.125
2024-09-19 13:06:39,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=667426.6666666666, ans=0.0
2024-09-19 13:06:53,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=667473.3333333334, ans=0.125
2024-09-19 13:06:58,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=667473.3333333334, ans=0.2
2024-09-19 13:07:02,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=667473.3333333334, ans=0.025
2024-09-19 13:07:14,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=667520.0, ans=0.125
2024-09-19 13:07:20,864 INFO [train.py:1198] (1/2) Epoch 37, batch 3500, loss[loss=0.1852, simple_loss=0.2436, pruned_loss=0.04598, ctc_loss=0.1025, cr_loss=0.3583, over 34455.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.2632, pruned_loss=0.05543, ctc_loss=0.1187, cr_loss=0.3934, over 6748382.88 frames. ], batch size: 85, lr: 3.20e-03, grad_scale: 16.0
2024-09-19 13:07:40,154 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.426e+02 2.836e+02 3.275e+02 5.955e+02, threshold=5.672e+02, percent-clipped=1.0
2024-09-19 13:07:54,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=667660.0, ans=0.125
2024-09-19 13:08:18,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=667706.6666666666, ans=0.0
2024-09-19 13:08:29,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=667753.3333333334, ans=0.125
2024-09-19 13:08:42,163 INFO [train.py:1198] (1/2) Epoch 37, batch 3550, loss[loss=0.2175, simple_loss=0.2781, pruned_loss=0.05789, ctc_loss=0.124, cr_loss=0.4105, over 34394.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.2633, pruned_loss=0.05532, ctc_loss=0.1185, cr_loss=0.3935, over 6758262.96 frames. ], batch size: 103, lr: 3.20e-03, grad_scale: 16.0
2024-09-19 13:09:05,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=667846.6666666666, ans=15.0
2024-09-19 13:09:09,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=667846.6666666666, ans=0.125
2024-09-19 13:09:17,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=667893.3333333334, ans=0.125
2024-09-19 13:09:30,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=667940.0, ans=0.09899494936611666
2024-09-19 13:09:37,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=667940.0, ans=0.2
2024-09-19 13:09:42,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=667940.0, ans=0.125
2024-09-19 13:10:02,710 INFO [train.py:1198] (1/2) Epoch 37, batch 3600, loss[loss=0.1925, simple_loss=0.2475, pruned_loss=0.05085, ctc_loss=0.1084, cr_loss=0.3523, over 34489.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2635, pruned_loss=0.05551, ctc_loss=0.1188, cr_loss=0.394, over 6766746.52 frames. ], batch size: 90, lr: 3.20e-03, grad_scale: 32.0
2024-09-19 13:10:07,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=668033.3333333334, ans=0.125
2024-09-19 13:10:08,296 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.14 vs. limit=15.0
2024-09-19 13:10:11,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=668033.3333333334, ans=0.0
2024-09-19 13:10:14,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=668033.3333333334, ans=0.125
2024-09-19 13:10:16,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=668033.3333333334, ans=0.125
2024-09-19 13:10:22,660 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 2.581e+02 3.020e+02 3.520e+02 7.285e+02, threshold=6.041e+02, percent-clipped=1.0
2024-09-19 13:10:32,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=668080.0, ans=0.0
2024-09-19 13:10:58,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=668173.3333333334, ans=0.125
2024-09-19 13:11:24,010 INFO [train.py:1198] (1/2) Epoch 37, batch 3650, loss[loss=0.2128, simple_loss=0.2704, pruned_loss=0.05679, ctc_loss=0.1234, cr_loss=0.425, over 34430.00 frames. ], tot_loss[loss=0.2057, simple_loss=0.2625, pruned_loss=0.05486, ctc_loss=0.1177, cr_loss=0.3915, over 6768632.19 frames. ], batch size: 110, lr: 3.20e-03, grad_scale: 32.0
2024-09-19 13:11:32,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668266.6666666666, ans=0.1
2024-09-19 13:11:47,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=668313.3333333334, ans=0.125
2024-09-19 13:11:49,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=668313.3333333334, ans=0.125
2024-09-19 13:11:54,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=668313.3333333334, ans=0.125
2024-09-19 13:12:00,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=668360.0, ans=0.125
2024-09-19 13:12:01,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=22.5
2024-09-19 13:12:01,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.28 vs. limit=15.0
2024-09-19 13:12:07,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=668360.0, ans=0.125
2024-09-19 13:12:18,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=668406.6666666666, ans=0.125
2024-09-19 13:12:19,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=668406.6666666666, ans=0.025
2024-09-19 13:12:44,883 INFO [train.py:1198] (1/2) Epoch 37, batch 3700, loss[loss=0.2083, simple_loss=0.2709, pruned_loss=0.05354, ctc_loss=0.1166, cr_loss=0.3851, over 34627.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2626, pruned_loss=0.05471, ctc_loss=0.1173, cr_loss=0.3911, over 6783997.66 frames. ], batch size: 102, lr: 3.20e-03, grad_scale: 32.0
2024-09-19 13:12:53,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=668500.0, ans=0.125
2024-09-19 13:13:04,446 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.053e+02 2.484e+02 2.948e+02 4.030e+02 6.744e+02, threshold=5.896e+02, percent-clipped=2.0
2024-09-19 13:13:21,148 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=668593.3333333334, ans=0.0
2024-09-19 13:13:24,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668593.3333333334, ans=0.1
2024-09-19 13:13:41,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_ff2.min_abs, batch_count=668640.0, ans=0.1
2024-09-19 13:14:06,438 INFO [train.py:1198] (1/2) Epoch 37, batch 3750, loss[loss=0.2122, simple_loss=0.2712, pruned_loss=0.05622, ctc_loss=0.1237, cr_loss=0.4, over 34320.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.266, pruned_loss=0.05623, ctc_loss=0.1203, cr_loss=0.398, over 6785000.36 frames.
], batch size: 113, lr: 3.20e-03, grad_scale: 16.0 2024-09-19 13:14:08,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=668733.3333333334, ans=0.125 2024-09-19 13:14:21,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=668780.0, ans=0.125 2024-09-19 13:14:30,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=668780.0, ans=0.0 2024-09-19 13:14:41,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0 2024-09-19 13:14:46,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.83 vs. limit=15.0 2024-09-19 13:14:50,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=668826.6666666666, ans=0.0 2024-09-19 13:14:54,221 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.16 vs. limit=12.0 2024-09-19 13:15:06,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=668873.3333333334, ans=0.2 2024-09-19 13:15:13,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=668920.0, ans=0.05 2024-09-19 13:15:27,420 INFO [train.py:1198] (1/2) Epoch 37, batch 3800, loss[loss=0.2314, simple_loss=0.2798, pruned_loss=0.0682, ctc_loss=0.1441, cr_loss=0.4441, over 30019.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2683, pruned_loss=0.0576, ctc_loss=0.1228, cr_loss=0.4031, over 6672627.33 frames. 
], batch size: 175, lr: 3.20e-03, grad_scale: 16.0 2024-09-19 13:15:38,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=668966.6666666666, ans=0.1 2024-09-19 13:15:39,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=668966.6666666666, ans=0.1 2024-09-19 13:15:49,500 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.176e+02 2.362e+02 2.444e+02 2.677e+02 8.716e+02, threshold=4.888e+02, percent-clipped=2.0 2024-09-19 13:16:09,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=669060.0, ans=0.025 2024-09-19 13:16:21,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=669106.6666666666, ans=0.1 2024-09-19 13:16:41,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=669153.3333333334, ans=0.125 2024-09-19 13:16:41,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=669153.3333333334, ans=0.1 2024-09-19 13:16:45,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=669153.3333333334, ans=0.025 2024-09-19 13:16:48,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.18 vs. limit=15.0 2024-09-19 13:16:50,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=669200.0, ans=0.125 2024-09-19 13:16:51,764 INFO [train.py:1198] (1/2) Epoch 37, batch 3850, loss[loss=0.2346, simple_loss=0.2831, pruned_loss=0.07008, ctc_loss=0.1446, cr_loss=0.4264, over 23389.00 frames. ], tot_loss[loss=0.215, simple_loss=0.2702, pruned_loss=0.0592, ctc_loss=0.1261, cr_loss=0.4067, over 6245417.97 frames. ], batch size: 244, lr: 3.20e-03, grad_scale: 16.0 2024-09-19 13:17:05,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=669200.0, ans=0.125 2024-09-19 13:17:16,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=22.19 vs. limit=22.5 2024-09-19 13:17:22,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=669246.6666666666, ans=15.0 2024-09-19 13:17:22,144 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.95 vs. limit=15.0 2024-09-19 13:18:17,942 INFO [train.py:1198] (1/2) Epoch 38, batch 0, loss[loss=0.1872, simple_loss=0.2471, pruned_loss=0.04644, ctc_loss=0.1014, cr_loss=0.3543, over 34496.00 frames. ], tot_loss[loss=0.1872, simple_loss=0.2471, pruned_loss=0.04644, ctc_loss=0.1014, cr_loss=0.3543, over 34496.00 frames. ], batch size: 85, lr: 3.16e-03, grad_scale: 32.0 2024-09-19 13:18:17,942 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 13:18:34,780 INFO [train.py:1230] (1/2) Epoch 38, validation: loss=0.1479, simple_loss=0.2431, pruned_loss=0.02248, ctc_loss=0.03851, cr_loss=2.115e-14, over 944034.00 frames. 
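The train.py:1198 records above report two sets of figures: loss[...] for the single batch being logged and tot_loss[...] as a frame-weighted aggregate over the epoch so far. The fractional cumulative frame counts in tot_loss suggest a decayed sum rather than a plain running total. Below is a minimal sketch of such bookkeeping, assuming an exponentially decayed, frame-weighted average; the decay constant and the update rule are illustrative guesses, not train.py's actual implementation.

class RunningLoss:
    """Hypothetical reconstruction of the 'tot_loss[...] over N frames'
    bookkeeping: a frame-weighted, exponentially decayed aggregate of
    per-batch losses. The decay value 0.98 is an assumption."""

    def __init__(self, decay: float = 0.98):
        self.decay = decay
        self.sums: dict = {}    # metric name -> decayed frame-weighted sum
        self.frames = 0.0       # decayed total frame count

    def update(self, batch_metrics: dict, batch_frames: float) -> dict:
        self.frames = self.frames * self.decay + batch_frames
        report = {}
        for name, value in batch_metrics.items():
            s = self.sums.get(name, 0.0) * self.decay + value * batch_frames
            self.sums[name] = s
            report[name] = s / self.frames   # the tot_loss figure
        return report

tracker = RunningLoss()
# Feeding one batch with values taken from a record above:
print(tracker.update({"loss": 0.1872, "ctc_loss": 0.1014}, 34496.0))

Under this scheme the reported frame count grows toward an asymptote early in the epoch and later batches dominate the average, which matches the slowly drifting tot_loss values in the records.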
2024-09-19 13:18:34,781 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 13:18:56,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=669368.0, ans=0.1 2024-09-19 13:19:10,331 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.72 vs. limit=15.0 2024-09-19 13:19:13,881 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.94 vs. limit=15.0 2024-09-19 13:19:15,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.71 vs. limit=15.0 2024-09-19 13:19:32,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=669461.3333333334, ans=0.015 2024-09-19 13:19:35,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.678e+02 2.919e+02 3.176e+02 6.272e+02, threshold=5.839e+02, percent-clipped=5.0 2024-09-19 13:19:36,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=669461.3333333334, ans=0.0 2024-09-19 13:19:58,439 INFO [train.py:1198] (1/2) Epoch 38, batch 50, loss[loss=0.1741, simple_loss=0.2269, pruned_loss=0.04427, ctc_loss=0.09769, cr_loss=0.33, over 34492.00 frames. ], tot_loss[loss=0.2082, simple_loss=0.2646, pruned_loss=0.05597, ctc_loss=0.12, cr_loss=0.3975, over 1479837.77 frames. ], batch size: 82, lr: 3.16e-03, grad_scale: 32.0 2024-09-19 13:20:03,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=669554.6666666666, ans=0.125 2024-09-19 13:20:10,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=669554.6666666666, ans=0.125 2024-09-19 13:20:16,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.60 vs. limit=10.0 2024-09-19 13:20:18,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=669601.3333333334, ans=0.2 2024-09-19 13:20:26,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=669601.3333333334, ans=0.125 2024-09-19 13:20:49,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.21 vs. 
limit=22.5 2024-09-19 13:20:51,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=669694.6666666666, ans=0.125 2024-09-19 13:21:10,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=669741.3333333334, ans=0.05 2024-09-19 13:21:15,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=669741.3333333334, ans=0.125 2024-09-19 13:21:15,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=669741.3333333334, ans=0.125 2024-09-19 13:21:22,886 INFO [train.py:1198] (1/2) Epoch 38, batch 100, loss[loss=0.2052, simple_loss=0.2597, pruned_loss=0.05623, ctc_loss=0.116, cr_loss=0.3755, over 34575.00 frames. ], tot_loss[loss=0.21, simple_loss=0.2666, pruned_loss=0.05663, ctc_loss=0.1212, cr_loss=0.3997, over 2629665.87 frames. ], batch size: 89, lr: 3.15e-03, grad_scale: 32.0 2024-09-19 13:21:31,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=669788.0, ans=0.09899494936611666 2024-09-19 13:21:34,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=669788.0, ans=0.125 2024-09-19 13:21:36,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.64 vs. limit=6.0 2024-09-19 13:21:55,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=669881.3333333334, ans=0.0 2024-09-19 13:22:23,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.014e+02 2.474e+02 2.915e+02 3.466e+02 6.202e+02, threshold=5.830e+02, percent-clipped=1.0 2024-09-19 13:22:34,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=669974.6666666666, ans=0.025 2024-09-19 13:22:38,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=669974.6666666666, ans=0.0 2024-09-19 13:22:44,286 INFO [train.py:1198] (1/2) Epoch 38, batch 150, loss[loss=0.1804, simple_loss=0.2369, pruned_loss=0.04499, ctc_loss=0.09969, cr_loss=0.3511, over 34463.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2638, pruned_loss=0.05535, ctc_loss=0.1188, cr_loss=0.3949, over 3557424.02 frames. ], batch size: 82, lr: 3.15e-03, grad_scale: 32.0 2024-09-19 13:23:21,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=670114.6666666666, ans=0.125 2024-09-19 13:23:25,110 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.25 vs. limit=15.0 2024-09-19 13:23:53,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=670208.0, ans=0.05 2024-09-19 13:24:08,965 INFO [train.py:1198] (1/2) Epoch 38, batch 200, loss[loss=0.2208, simple_loss=0.277, pruned_loss=0.06091, ctc_loss=0.1313, cr_loss=0.413, over 31753.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2625, pruned_loss=0.05474, ctc_loss=0.1176, cr_loss=0.3921, over 4272207.07 frames. 
], batch size: 145, lr: 3.15e-03, grad_scale: 32.0 2024-09-19 13:24:09,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=670254.6666666666, ans=0.0 2024-09-19 13:24:39,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=670301.3333333334, ans=0.2 2024-09-19 13:24:52,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=670348.0, ans=0.2 2024-09-19 13:24:59,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=22.5 2024-09-19 13:25:03,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=670394.6666666666, ans=0.0 2024-09-19 13:25:10,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=670394.6666666666, ans=0.2 2024-09-19 13:25:11,504 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.671e+02 3.259e+02 4.441e+02 8.595e+02, threshold=6.519e+02, percent-clipped=10.0 2024-09-19 13:25:13,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=670394.6666666666, ans=0.2 2024-09-19 13:25:20,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=670441.3333333334, ans=0.0 2024-09-19 13:25:32,839 INFO [train.py:1198] (1/2) Epoch 38, batch 250, loss[loss=0.2191, simple_loss=0.2772, pruned_loss=0.05954, ctc_loss=0.1259, cr_loss=0.4174, over 34276.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.263, pruned_loss=0.05495, ctc_loss=0.1179, cr_loss=0.3926, over 4833835.05 frames. ], batch size: 117, lr: 3.15e-03, grad_scale: 32.0 2024-09-19 13:25:41,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=670488.0, ans=0.0 2024-09-19 13:25:44,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=670488.0, ans=0.0 2024-09-19 13:25:44,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=670488.0, ans=0.2 2024-09-19 13:25:44,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670488.0, ans=0.1 2024-09-19 13:25:48,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670534.6666666666, ans=0.1 2024-09-19 13:25:49,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.27 vs. 
limit=15.0 2024-09-19 13:25:53,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=670534.6666666666, ans=0.09899494936611666 2024-09-19 13:25:56,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=670534.6666666666, ans=0.0 2024-09-19 13:26:14,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=670581.3333333334, ans=0.07 2024-09-19 13:26:22,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670628.0, ans=0.1 2024-09-19 13:26:28,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=670628.0, ans=0.125 2024-09-19 13:26:47,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=670674.6666666666, ans=0.1 2024-09-19 13:26:55,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.91 vs. limit=10.0 2024-09-19 13:26:55,497 INFO [train.py:1198] (1/2) Epoch 38, batch 300, loss[loss=0.2333, simple_loss=0.2887, pruned_loss=0.06643, ctc_loss=0.1388, cr_loss=0.4318, over 34341.00 frames. ], tot_loss[loss=0.2055, simple_loss=0.2623, pruned_loss=0.05479, ctc_loss=0.1175, cr_loss=0.3914, over 5263109.13 frames. ], batch size: 107, lr: 3.15e-03, grad_scale: 32.0 2024-09-19 13:27:00,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=670721.3333333334, ans=0.125 2024-09-19 13:27:07,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=670721.3333333334, ans=0.025 2024-09-19 13:27:29,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=670814.6666666666, ans=0.0 2024-09-19 13:27:47,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=670861.3333333334, ans=0.125 2024-09-19 13:27:57,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=670861.3333333334, ans=0.125 2024-09-19 13:28:00,324 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.201e+02 2.511e+02 2.735e+02 3.192e+02 7.221e+02, threshold=5.470e+02, percent-clipped=1.0 2024-09-19 13:28:22,054 INFO [train.py:1198] (1/2) Epoch 38, batch 350, loss[loss=0.1738, simple_loss=0.2357, pruned_loss=0.04076, ctc_loss=0.08986, cr_loss=0.3126, over 34647.00 frames. ], tot_loss[loss=0.2062, simple_loss=0.263, pruned_loss=0.055, ctc_loss=0.1179, cr_loss=0.3929, over 5597743.46 frames. 
], batch size: 84, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 13:28:32,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=670954.6666666666, ans=0.125 2024-09-19 13:28:35,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=670954.6666666666, ans=0.0 2024-09-19 13:28:38,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=671001.3333333334, ans=0.1 2024-09-19 13:29:26,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=671141.3333333334, ans=0.125 2024-09-19 13:29:44,415 INFO [train.py:1198] (1/2) Epoch 38, batch 400, loss[loss=0.2128, simple_loss=0.2739, pruned_loss=0.05644, ctc_loss=0.1168, cr_loss=0.3895, over 34404.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2628, pruned_loss=0.05498, ctc_loss=0.1178, cr_loss=0.3926, over 5865161.17 frames. ], batch size: 95, lr: 3.15e-03, grad_scale: 32.0 2024-09-19 13:29:48,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=671188.0, ans=0.0 2024-09-19 13:29:54,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=671188.0, ans=0.5 2024-09-19 13:30:30,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_na.min_abs, batch_count=671281.3333333334, ans=0.02 2024-09-19 13:30:34,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=671328.0, ans=0.0 2024-09-19 13:30:47,270 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.090e+02 2.411e+02 2.782e+02 3.454e+02 5.634e+02, threshold=5.563e+02, percent-clipped=1.0 2024-09-19 13:30:48,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.43 vs. limit=10.0 2024-09-19 13:31:00,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=671374.6666666666, ans=0.125 2024-09-19 13:31:05,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=671421.3333333334, ans=0.2 2024-09-19 13:31:06,995 INFO [train.py:1198] (1/2) Epoch 38, batch 450, loss[loss=0.2198, simple_loss=0.2781, pruned_loss=0.05992, ctc_loss=0.1273, cr_loss=0.4066, over 34704.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.263, pruned_loss=0.05497, ctc_loss=0.1177, cr_loss=0.3926, over 6056193.56 frames. ], batch size: 97, lr: 3.15e-03, grad_scale: 32.0 2024-09-19 13:31:07,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=671421.3333333334, ans=0.0 2024-09-19 13:31:07,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0
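Most records above come from scaling.py:214 and print the current value (ans) of a ScheduledFloat, a hyperparameter whose value is a function of the global batch_count. Below is a sketch assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the example are invented for illustration, and the real class carries more machinery than shown here.

import bisect

class ScheduledFloat:
    """Hypothetical sketch: a float scheduled against batch_count by
    piecewise-linear interpolation between sorted breakpoints."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
        self.points = sorted(points)
        self.batch_count = 0.0   # advanced by the training loop

    def __float__(self):
        x, pts = self.batch_count, self.points
        if x <= pts[0][0]:
            return float(pts[0][1])
        if x >= pts[-1][0]:
            return float(pts[-1][1])
        i = bisect.bisect_right([p[0] for p in pts], x)
        (x0, y0), (x1, y1) = pts[i - 1], pts[i]
        return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# A dropout_p decaying from 0.3 to 0.1 over the first 20k batches would log
# ans=0.1 at the batch counts above, which sit far past the last breakpoint:
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
dropout_p.batch_count = 671421.33
print(float(dropout_p))   # -> 0.1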
2024-09-19 13:31:16,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=671421.3333333334, ans=0.125 2024-09-19 13:31:39,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=671468.0, ans=0.0 2024-09-19 13:32:11,980 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2024-09-19 13:32:32,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=671654.6666666666, ans=0.125 2024-09-19 13:32:33,956 INFO [train.py:1198] (1/2) Epoch 38, batch 500, loss[loss=0.2232, simple_loss=0.2773, pruned_loss=0.06301, ctc_loss=0.1293, cr_loss=0.4308, over 34465.00 frames. ], tot_loss[loss=0.205, simple_loss=0.2619, pruned_loss=0.05457, ctc_loss=0.1169, cr_loss=0.3901, over 6220906.28 frames. ], batch size: 110, lr: 3.15e-03, grad_scale: 32.0 2024-09-19 13:32:39,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=671654.6666666666, ans=0.125 2024-09-19 13:32:46,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.77 vs. limit=15.0 2024-09-19 13:32:55,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=671701.3333333334, ans=0.0 2024-09-19 13:33:00,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=22.5 2024-09-19 13:33:03,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.36 vs. limit=15.0 2024-09-19 13:33:12,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=671748.0, ans=0.125 2024-09-19 13:33:38,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.061e+02 2.466e+02 2.892e+02 3.717e+02 5.684e+02, threshold=5.784e+02, percent-clipped=2.0 2024-09-19 13:33:38,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=671841.3333333334, ans=0.125 2024-09-19 13:33:39,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.63 vs. limit=12.0 2024-09-19 13:33:46,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=671841.3333333334, ans=0.125 2024-09-19 13:33:55,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=671888.0, ans=0.125 2024-09-19 13:33:56,390 INFO [train.py:1198] (1/2) Epoch 38, batch 550, loss[loss=0.228, simple_loss=0.2857, pruned_loss=0.06343, ctc_loss=0.1312, cr_loss=0.4291, over 33792.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.2617, pruned_loss=0.0544, ctc_loss=0.1166, cr_loss=0.3895, over 6328700.77 frames. ], batch size: 122, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 13:33:57,291 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.67 vs.
limit=12.0 2024-09-19 13:34:20,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.94 vs. limit=22.5 2024-09-19 13:34:27,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=671981.3333333334, ans=0.025 2024-09-19 13:34:43,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=671981.3333333334, ans=0.0 2024-09-19 13:34:55,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=672028.0, ans=0.0 2024-09-19 13:34:56,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.54 vs. limit=15.0 2024-09-19 13:35:02,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=672028.0, ans=0.0 2024-09-19 13:35:12,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=672074.6666666666, ans=0.125 2024-09-19 13:35:22,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.46 vs. limit=22.5 2024-09-19 13:35:26,712 INFO [train.py:1198] (1/2) Epoch 38, batch 600, loss[loss=0.2266, simple_loss=0.2846, pruned_loss=0.06258, ctc_loss=0.1314, cr_loss=0.4284, over 34227.00 frames. ], tot_loss[loss=0.205, simple_loss=0.262, pruned_loss=0.05452, ctc_loss=0.1167, cr_loss=0.39, over 6431720.06 frames. ], batch size: 117, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 13:35:28,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=672121.3333333334, ans=0.2 2024-09-19 13:36:09,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=672214.6666666666, ans=0.125 2024-09-19 13:36:31,977 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.057e+02 2.433e+02 2.903e+02 3.465e+02 8.508e+02, threshold=5.805e+02, percent-clipped=3.0 2024-09-19 13:36:38,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=672308.0, ans=0.1 2024-09-19 13:36:40,519 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:36:46,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=672308.0, ans=0.125 2024-09-19 13:36:48,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=672354.6666666666, ans=0.125 2024-09-19 13:36:49,952 INFO [train.py:1198] (1/2) Epoch 38, batch 650, loss[loss=0.2092, simple_loss=0.2644, pruned_loss=0.05694, ctc_loss=0.1213, cr_loss=0.3978, over 34518.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.261, pruned_loss=0.05402, ctc_loss=0.1159, cr_loss=0.3885, over 6523893.29 frames. ], batch size: 94, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 13:37:05,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.52 vs. 
limit=15.0 2024-09-19 13:37:18,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=672401.3333333334, ans=0.125 2024-09-19 13:38:01,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=672541.3333333334, ans=0.0 2024-09-19 13:38:11,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.85 vs. limit=22.5 2024-09-19 13:38:12,489 INFO [train.py:1198] (1/2) Epoch 38, batch 700, loss[loss=0.2038, simple_loss=0.2565, pruned_loss=0.05592, ctc_loss=0.1169, cr_loss=0.3949, over 34589.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2618, pruned_loss=0.05429, ctc_loss=0.1165, cr_loss=0.3894, over 6579018.17 frames. ], batch size: 89, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 13:38:34,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=672634.6666666666, ans=0.0 2024-09-19 13:38:48,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2024-09-19 13:38:57,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=672681.3333333334, ans=0.1 2024-09-19 13:39:16,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=672728.0, ans=0.0 2024-09-19 13:39:19,028 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.508e+02 2.898e+02 3.725e+02 5.665e+02, threshold=5.797e+02, percent-clipped=0.0 2024-09-19 13:39:32,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.50 vs. limit=15.0 2024-09-19 13:39:37,038 INFO [train.py:1198] (1/2) Epoch 38, batch 750, loss[loss=0.2041, simple_loss=0.2625, pruned_loss=0.05335, ctc_loss=0.1158, cr_loss=0.3933, over 34376.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2615, pruned_loss=0.05419, ctc_loss=0.1163, cr_loss=0.3891, over 6622346.54 frames. ], batch size: 95, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 13:40:32,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=672961.3333333334, ans=0.125 2024-09-19 13:41:01,360 INFO [train.py:1198] (1/2) Epoch 38, batch 800, loss[loss=0.1797, simple_loss=0.2386, pruned_loss=0.04421, ctc_loss=0.0966, cr_loss=0.3251, over 34470.00 frames. ], tot_loss[loss=0.2045, simple_loss=0.2615, pruned_loss=0.05432, ctc_loss=0.1165, cr_loss=0.3894, over 6659448.62 frames. ], batch size: 85, lr: 3.15e-03, grad_scale: 32.0
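The WARNING [optim.py:487] records report quartiles of recent gradient norms, the clipping threshold in force, and the percentage of batches clipped. The sketch below shows one way such statistics could drive clipping, assuming a rolling window of per-batch norms and a threshold of clipping_scale times the median; both the window size and that threshold rule are illustrative guesses, not the optimizer's actual logic.

import torch

class GradNormClipper:
    """Hypothetical quartile-based gradient clipping (illustrative only)."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []     # recent per-batch global gradient norms
        self.clipped = 0
        self.seen = 0

    def clip_(self, params) -> None:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.stack([g.norm() ** 2 for g in grads]).sum().sqrt().item()
        self.norms = (self.norms + [norm])[-self.window:]
        self.seen += 1
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # assumed: scale * median
        if 0.0 < threshold < norm:
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)   # rescale gradients in place
        quartiles = " ".join(f"{v:.3e}" for v in q.tolist())
        print(f"grad-norm quartiles {quartiles}, threshold={threshold:.3e}, "
              f"percent-clipped={100 * self.clipped / self.seen:.1f}")

A median-anchored threshold adapts to the run's own norm distribution, which would explain why the logged threshold drifts with the quartiles rather than staying fixed.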
2024-09-19 13:41:05,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673054.6666666666, ans=0.1 2024-09-19 13:41:13,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=673054.6666666666, ans=0.125 2024-09-19 13:42:07,159 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.536e+02 3.053e+02 3.656e+02 5.350e+02, threshold=6.105e+02, percent-clipped=0.0 2024-09-19 13:42:23,395 INFO [train.py:1198] (1/2) Epoch 38, batch 850, loss[loss=0.2065, simple_loss=0.2701, pruned_loss=0.05229, ctc_loss=0.1157, cr_loss=0.3787, over 34367.00 frames. ], tot_loss[loss=0.2042, simple_loss=0.2612, pruned_loss=0.05419, ctc_loss=0.1163, cr_loss=0.3893, over 6691642.22 frames. ], batch size: 103, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 13:42:26,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=22.5 2024-09-19 13:43:01,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=673381.3333333334, ans=0.125 2024-09-19 13:43:03,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=673381.3333333334, ans=0.2 2024-09-19 13:43:16,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673428.0, ans=0.1 2024-09-19 13:43:31,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=673474.6666666666, ans=0.125 2024-09-19 13:43:47,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.21 vs. limit=15.0 2024-09-19 13:43:49,564 INFO [train.py:1198] (1/2) Epoch 38, batch 900, loss[loss=0.1928, simple_loss=0.2492, pruned_loss=0.05017, ctc_loss=0.1093, cr_loss=0.3581, over 34463.00 frames. ], tot_loss[loss=0.2051, simple_loss=0.2619, pruned_loss=0.05457, ctc_loss=0.1171, cr_loss=0.3908, over 6698032.08 frames. ], batch size: 85, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 13:43:53,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.25 vs. limit=15.0 2024-09-19 13:44:22,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=673614.6666666666, ans=0.125 2024-09-19 13:44:40,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.39 vs.
limit=15.0 2024-09-19 13:44:55,296 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.153e+02 2.491e+02 2.975e+02 3.816e+02 6.992e+02, threshold=5.950e+02, percent-clipped=2.0 2024-09-19 13:44:55,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=673708.0, ans=0.05 2024-09-19 13:45:03,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=673708.0, ans=0.025 2024-09-19 13:45:10,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673754.6666666666, ans=0.1 2024-09-19 13:45:11,540 INFO [train.py:1198] (1/2) Epoch 38, batch 950, loss[loss=0.1948, simple_loss=0.2525, pruned_loss=0.05044, ctc_loss=0.1077, cr_loss=0.3661, over 34662.00 frames. ], tot_loss[loss=0.2053, simple_loss=0.2621, pruned_loss=0.05469, ctc_loss=0.1172, cr_loss=0.3909, over 6701740.20 frames. ], batch size: 87, lr: 3.15e-03, grad_scale: 16.0 2024-09-19 13:45:18,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=673754.6666666666, ans=0.025 2024-09-19 13:46:11,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=673894.6666666666, ans=0.0 2024-09-19 13:46:11,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673894.6666666666, ans=0.1 2024-09-19 13:46:21,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=673941.3333333334, ans=0.0 2024-09-19 13:46:23,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=673941.3333333334, ans=0.125 2024-09-19 13:46:23,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.29 vs. limit=15.0 2024-09-19 13:46:32,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=673988.0, ans=0.025 2024-09-19 13:46:34,297 INFO [train.py:1198] (1/2) Epoch 38, batch 1000, loss[loss=0.2025, simple_loss=0.2584, pruned_loss=0.05432, ctc_loss=0.1156, cr_loss=0.3711, over 34488.00 frames. ], tot_loss[loss=0.2058, simple_loss=0.2627, pruned_loss=0.0549, ctc_loss=0.1176, cr_loss=0.3917, over 6696317.47 frames. ], batch size: 90, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 13:46:34,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=673988.0, ans=0.0 2024-09-19 13:47:22,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=674081.3333333334, ans=0.0 2024-09-19 13:47:27,303 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. limit=8.0 2024-09-19 13:47:41,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.53 vs. 
limit=15.0 2024-09-19 13:47:43,849 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.475e+02 2.970e+02 3.460e+02 1.327e+03, threshold=5.939e+02, percent-clipped=3.0 2024-09-19 13:47:45,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=674174.6666666666, ans=0.035 2024-09-19 13:47:54,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=674174.6666666666, ans=0.2 2024-09-19 13:48:00,279 INFO [train.py:1198] (1/2) Epoch 38, batch 1050, loss[loss=0.2029, simple_loss=0.2676, pruned_loss=0.05052, ctc_loss=0.1124, cr_loss=0.3654, over 34573.00 frames. ], tot_loss[loss=0.205, simple_loss=0.2618, pruned_loss=0.05458, ctc_loss=0.1171, cr_loss=0.3904, over 6705508.30 frames. ], batch size: 99, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 13:48:02,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.70 vs. limit=15.0 2024-09-19 13:48:06,050 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0 2024-09-19 13:48:18,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=674268.0, ans=0.0 2024-09-19 13:48:37,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.94 vs. limit=15.0 2024-09-19 13:48:48,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=674361.3333333334, ans=0.0 2024-09-19 13:49:03,286 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:49:04,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=674408.0, ans=0.5 2024-09-19 13:49:08,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=674408.0, ans=0.2 2024-09-19 13:49:17,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=674408.0, ans=0.125 2024-09-19 13:49:21,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=674454.6666666666, ans=0.0 2024-09-19 13:49:22,524 INFO [train.py:1198] (1/2) Epoch 38, batch 1100, loss[loss=0.2009, simple_loss=0.2589, pruned_loss=0.05268, ctc_loss=0.1127, cr_loss=0.3751, over 34341.00 frames. ], tot_loss[loss=0.205, simple_loss=0.2617, pruned_loss=0.05458, ctc_loss=0.1171, cr_loss=0.3903, over 6717897.36 frames. ], batch size: 91, lr: 3.14e-03, grad_scale: 8.0
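The scaling.py:1024 Whitening records compare a per-module anisotropy metric against a configured whitening limit. One plausible metric, sketched below, is the ratio of the mean squared eigenvalue of the feature covariance to the squared mean eigenvalue: it equals 1 for perfectly white (isotropic) features and grows as variance concentrates in a few directions. The exact formula used by scaling.py may differ; this is an illustrative reconstruction only.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Hypothetical anisotropy measure for the 'metric=... vs. limit=...' logs.

    x: (num_frames, num_channels) activations; channels split into groups.
    Returns 1.0 for white features, larger when variance is unbalanced."""
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)                  # center per channel
    cov = torch.einsum("ngc,ngd->gcd", x, x) / num_frames
    eigs = torch.linalg.eigvalsh(cov)                    # (num_groups, c)
    metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
    return metric.mean().item()

feats = torch.randn(1000, 512)                           # dummy activations
print(f"metric={whitening_metric(feats):.2f} vs. limit=15.0")

A metric of this shape would explain the logged magnitudes: values near 1 for well-whitened modules and values in the tens when a module's limit (7.5, 15.0, or 22.5 above) is being approached.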
2024-09-19 13:49:30,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=674454.6666666666, ans=0.0 2024-09-19 13:49:35,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=674454.6666666666, ans=0.0 2024-09-19 13:50:13,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=674594.6666666666, ans=0.0 2024-09-19 13:50:32,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.395e+02 2.920e+02 3.454e+02 5.207e+02, threshold=5.840e+02, percent-clipped=0.0 2024-09-19 13:50:48,888 INFO [train.py:1198] (1/2) Epoch 38, batch 1150, loss[loss=0.2033, simple_loss=0.2547, pruned_loss=0.05606, ctc_loss=0.1191, cr_loss=0.3989, over 34359.00 frames. ], tot_loss[loss=0.205, simple_loss=0.2616, pruned_loss=0.05466, ctc_loss=0.1171, cr_loss=0.3903, over 6715615.22 frames. ], batch size: 91, lr: 3.14e-03, grad_scale: 8.0 2024-09-19 13:51:00,660 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:51:10,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=674734.6666666666, ans=0.125 2024-09-19 13:51:33,018 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.21 vs. limit=6.0 2024-09-19 13:52:11,236 INFO [train.py:1198] (1/2) Epoch 38, batch 1200, loss[loss=0.2181, simple_loss=0.2735, pruned_loss=0.06026, ctc_loss=0.1276, cr_loss=0.4169, over 34582.00 frames. ], tot_loss[loss=0.2055, simple_loss=0.2624, pruned_loss=0.05476, ctc_loss=0.1175, cr_loss=0.3916, over 6708443.67 frames. ], batch size: 99, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 13:52:13,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=674921.3333333334, ans=0.1 2024-09-19 13:52:18,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=674921.3333333334, ans=0.2 2024-09-19 13:52:28,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=674968.0, ans=0.0 2024-09-19 13:52:52,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=675014.6666666666, ans=0.1 2024-09-19 13:52:54,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=675014.6666666666, ans=0.125 2024-09-19 13:53:10,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=22.5 2024-09-19 13:53:18,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.196e+02 2.553e+02 2.935e+02 3.775e+02 6.784e+02, threshold=5.869e+02, percent-clipped=4.0 2024-09-19 13:53:33,258 INFO [train.py:1198] (1/2) Epoch 38, batch 1250, loss[loss=0.2302, simple_loss=0.2849, pruned_loss=0.06538, ctc_loss=0.1368, cr_loss=0.4353, over 34360.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.2633, pruned_loss=0.05524, ctc_loss=0.1184, cr_loss=0.3939, over 6742471.63 frames.
], batch size: 107, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 13:53:40,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=675154.6666666666, ans=0.025 2024-09-19 13:53:45,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.69 vs. limit=22.5 2024-09-19 13:54:50,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=675341.3333333334, ans=0.1 2024-09-19 13:54:59,757 INFO [train.py:1198] (1/2) Epoch 38, batch 1300, loss[loss=0.2169, simple_loss=0.2771, pruned_loss=0.05742, ctc_loss=0.1262, cr_loss=0.4151, over 33125.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2627, pruned_loss=0.05495, ctc_loss=0.118, cr_loss=0.3925, over 6745990.14 frames. ], batch size: 130, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 13:55:11,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=675388.0, ans=0.1 2024-09-19 13:55:24,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=675434.6666666666, ans=10.0 2024-09-19 13:55:55,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=675528.0, ans=0.125 2024-09-19 13:55:56,091 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 13:56:07,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.151e+02 2.420e+02 2.804e+02 3.452e+02 7.696e+02, threshold=5.607e+02, percent-clipped=1.0 2024-09-19 13:56:10,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=675574.6666666666, ans=0.1 2024-09-19 13:56:13,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.33 vs. limit=15.0 2024-09-19 13:56:14,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=675574.6666666666, ans=0.0 2024-09-19 13:56:16,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.28 vs. limit=15.0 2024-09-19 13:56:20,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=675621.3333333334, ans=0.09899494936611666 2024-09-19 13:56:21,964 INFO [train.py:1198] (1/2) Epoch 38, batch 1350, loss[loss=0.2076, simple_loss=0.2631, pruned_loss=0.05622, ctc_loss=0.1179, cr_loss=0.4005, over 34541.00 frames. ], tot_loss[loss=0.2053, simple_loss=0.2621, pruned_loss=0.05469, ctc_loss=0.1173, cr_loss=0.3911, over 6766003.25 frames. ], batch size: 94, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 13:56:40,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=675668.0, ans=0.0 2024-09-19 13:56:47,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.30 vs. 
limit=15.0 2024-09-19 13:56:57,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2024-09-19 13:57:03,697 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.62 vs. limit=10.0 2024-09-19 13:57:42,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=675854.6666666666, ans=0.09899494936611666 2024-09-19 13:57:44,298 INFO [train.py:1198] (1/2) Epoch 38, batch 1400, loss[loss=0.1742, simple_loss=0.2297, pruned_loss=0.04266, ctc_loss=0.09619, cr_loss=0.3524, over 34276.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2619, pruned_loss=0.05468, ctc_loss=0.1173, cr_loss=0.3912, over 6777934.47 frames. ], batch size: 80, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 13:57:49,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=675854.6666666666, ans=0.125 2024-09-19 13:57:49,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=675854.6666666666, ans=0.2 2024-09-19 13:57:49,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=675854.6666666666, ans=0.125 2024-09-19 13:57:52,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675854.6666666666, ans=0.1 2024-09-19 13:58:14,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=675901.3333333334, ans=0.125 2024-09-19 13:58:47,072 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=675994.6666666666, ans=0.125 2024-09-19 13:58:55,015 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.707e+02 3.156e+02 3.682e+02 6.763e+02, threshold=6.312e+02, percent-clipped=1.0 2024-09-19 13:58:59,326 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.25 vs. limit=10.0 2024-09-19 13:59:00,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676041.3333333334, ans=0.1 2024-09-19 13:59:09,773 INFO [train.py:1198] (1/2) Epoch 38, batch 1450, loss[loss=0.2165, simple_loss=0.2688, pruned_loss=0.06, ctc_loss=0.1299, cr_loss=0.4546, over 34448.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2621, pruned_loss=0.05458, ctc_loss=0.1173, cr_loss=0.3914, over 6774608.59 frames. ], batch size: 110, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 13:59:24,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=676134.6666666666, ans=0.125 2024-09-19 13:59:28,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=676134.6666666666, ans=0.125 2024-09-19 13:59:30,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.13 vs. 
limit=15.0 2024-09-19 13:59:31,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=676134.6666666666, ans=0.07 2024-09-19 13:59:38,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.65 vs. limit=12.0 2024-09-19 13:59:51,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=676181.3333333334, ans=0.09899494936611666 2024-09-19 13:59:52,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=676181.3333333334, ans=0.09899494936611666 2024-09-19 14:00:13,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=676274.6666666666, ans=0.125 2024-09-19 14:00:30,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=15.0 2024-09-19 14:00:31,620 INFO [train.py:1198] (1/2) Epoch 38, batch 1500, loss[loss=0.2182, simple_loss=0.2795, pruned_loss=0.05804, ctc_loss=0.1233, cr_loss=0.4026, over 34439.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2627, pruned_loss=0.05471, ctc_loss=0.1174, cr_loss=0.3917, over 6774345.00 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 16.0 2024-09-19 14:00:33,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=676321.3333333334, ans=0.0 2024-09-19 14:00:50,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=676368.0, ans=0.2 2024-09-19 14:00:53,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=676368.0, ans=0.95 2024-09-19 14:00:55,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=676368.0, ans=0.0 2024-09-19 14:01:11,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=676414.6666666666, ans=0.125 2024-09-19 14:01:24,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=676461.3333333334, ans=0.07 2024-09-19 14:01:39,349 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.168e+02 2.471e+02 2.869e+02 3.728e+02 6.226e+02, threshold=5.739e+02, percent-clipped=0.0 2024-09-19 14:01:56,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=676554.6666666666, ans=0.125 2024-09-19 14:01:58,055 INFO [train.py:1198] (1/2) Epoch 38, batch 1550, loss[loss=0.2166, simple_loss=0.2736, pruned_loss=0.05889, ctc_loss=0.1264, cr_loss=0.4154, over 34434.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2628, pruned_loss=0.05491, ctc_loss=0.1177, cr_loss=0.392, over 6747005.35 frames. 
2024-09-19 14:02:21,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=676601.3333333334, ans=0.125
2024-09-19 14:02:54,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=676694.6666666666, ans=0.0
2024-09-19 14:03:17,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=676741.3333333334, ans=0.0
2024-09-19 14:03:20,034 INFO [train.py:1198] (1/2) Epoch 38, batch 1600, loss[loss=0.2194, simple_loss=0.2744, pruned_loss=0.06138, ctc_loss=0.1283, cr_loss=0.399, over 34579.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.2629, pruned_loss=0.05503, ctc_loss=0.1179, cr_loss=0.3928, over 6726909.18 frames. ], batch size: 99, lr: 3.14e-03, grad_scale: 32.0
2024-09-19 14:03:26,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=676788.0, ans=0.125
2024-09-19 14:03:47,362 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.46 vs. limit=15.0
2024-09-19 14:03:48,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=676834.6666666666, ans=0.125
2024-09-19 14:03:51,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=676881.3333333334, ans=0.125
2024-09-19 14:04:03,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=676881.3333333334, ans=0.2
2024-09-19 14:04:13,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=676928.0, ans=0.0
2024-09-19 14:04:13,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=12.0
2024-09-19 14:04:15,892 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.86 vs. limit=15.0
2024-09-19 14:04:27,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.566e+02 2.856e+02 3.809e+02 7.171e+02, threshold=5.712e+02, percent-clipped=2.0
2024-09-19 14:04:42,473 INFO [train.py:1198] (1/2) Epoch 38, batch 1650, loss[loss=0.213, simple_loss=0.2724, pruned_loss=0.05619, ctc_loss=0.1222, cr_loss=0.4209, over 34343.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2625, pruned_loss=0.0548, ctc_loss=0.1176, cr_loss=0.3917, over 6720892.68 frames. ], batch size: 103, lr: 3.14e-03, grad_scale: 32.0
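Note on the Whitening lines: each reports, for one module, a statistic of how far the activation covariance is from isotropic ("white"), compared against a limit above which a whitening penalty is applied. A rough sketch of a covariance-eigenvalue metric of this general kind (illustrative; the exact statistic in scaling.py may differ in detail):

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    # x: (frames, channels).  Returns ~1.0 when the covariance is close to
    # a multiple of the identity; larger values mean the variance is
    # concentrated in a few directions.
    frames, channels = x.shape
    x = x.reshape(frames, num_groups, channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("fgi,fgj->gij", x, x) / frames   # per-group covariance
    eigs = torch.linalg.eigvalsh(cov)                   # (num_groups, c)
    metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
    return metric.mean()

feats = torch.randn(2000, 512)                 # white-ish activations
print(float(whitening_metric(feats)))          # near 1.0, under limit=15.0
```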
2024-09-19 14:04:51,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=677021.3333333334, ans=0.0
2024-09-19 14:04:54,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=677021.3333333334, ans=0.0
2024-09-19 14:04:56,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=677021.3333333334, ans=15.0
2024-09-19 14:04:57,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=677068.0, ans=0.95
2024-09-19 14:05:14,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.58 vs. limit=15.0
2024-09-19 14:05:30,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=677161.3333333334, ans=0.1
2024-09-19 14:05:34,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=677161.3333333334, ans=0.0
2024-09-19 14:05:44,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=677161.3333333334, ans=0.125
2024-09-19 14:05:51,018 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:05:53,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.04 vs. limit=22.5
2024-09-19 14:05:54,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=677208.0, ans=0.125
2024-09-19 14:06:03,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=677208.0, ans=0.125
2024-09-19 14:06:07,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.65 vs. limit=15.0
2024-09-19 14:06:08,370 INFO [train.py:1198] (1/2) Epoch 38, batch 1700, loss[loss=0.1743, simple_loss=0.2313, pruned_loss=0.04217, ctc_loss=0.09715, cr_loss=0.3365, over 34309.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2624, pruned_loss=0.05471, ctc_loss=0.1174, cr_loss=0.3907, over 6745312.45 frames. ], batch size: 80, lr: 3.14e-03, grad_scale: 32.0
2024-09-19 14:06:33,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=677301.3333333334, ans=0.125
2024-09-19 14:06:39,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677348.0, ans=0.1
2024-09-19 14:06:40,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=15.0
2024-09-19 14:06:53,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=677348.0, ans=0.1
2024-09-19 14:06:55,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=22.5
2024-09-19 14:07:14,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=677441.3333333334, ans=0.025
2024-09-19 14:07:18,935 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.468e+02 2.794e+02 3.858e+02 8.258e+02, threshold=5.587e+02, percent-clipped=5.0
2024-09-19 14:07:19,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=677441.3333333334, ans=0.2
2024-09-19 14:07:27,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.48 vs. limit=15.0
2024-09-19 14:07:30,389 INFO [train.py:1198] (1/2) Epoch 38, batch 1750, loss[loss=0.1771, simple_loss=0.2309, pruned_loss=0.045, ctc_loss=0.09677, cr_loss=0.3456, over 34161.00 frames. ], tot_loss[loss=0.2051, simple_loss=0.262, pruned_loss=0.05461, ctc_loss=0.1171, cr_loss=0.3902, over 6754898.18 frames. ], batch size: 78, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 14:07:30,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=677488.0, ans=0.125
2024-09-19 14:07:31,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.96 vs. limit=22.5
2024-09-19 14:08:04,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.40 vs. limit=15.0
2024-09-19 14:08:12,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=677581.3333333334, ans=0.2
2024-09-19 14:08:28,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=677628.0, ans=0.125
2024-09-19 14:08:29,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.63 vs. limit=22.5
2024-09-19 14:08:32,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677628.0, ans=0.1
2024-09-19 14:08:41,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=677674.6666666666, ans=0.125
2024-09-19 14:08:45,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=677674.6666666666, ans=0.0
2024-09-19 14:08:52,799 INFO [train.py:1198] (1/2) Epoch 38, batch 1800, loss[loss=0.2071, simple_loss=0.2682, pruned_loss=0.05399, ctc_loss=0.1155, cr_loss=0.3732, over 34693.00 frames. ], tot_loss[loss=0.2055, simple_loss=0.2623, pruned_loss=0.0548, ctc_loss=0.1174, cr_loss=0.3911, over 6759087.02 frames. ], batch size: 97, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 14:09:15,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.34 vs. limit=12.0
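Note on the optim.py WARNING lines: the five numbers after "grad-norm quartiles" are the 0/25/50/75/100 percentiles of recent gradient norms, and in this log the reported threshold consistently equals Clipping_scale times the median (e.g. 2.0 x 2.794e+02 = 5.588e+02 above); percent-clipped is the share of recent batches whose norm exceeded it. A small reconstruction of that report on synthetic data (illustrative only):

```python
import torch

clipping_scale = 2.0
# pretend these are the gradient norms of the last thousand batches
recent = 250.0 + 80.0 * torch.randn(1000).abs()

q = torch.quantile(recent, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
threshold = clipping_scale * q[2]                 # 2.0 x median, as in the log
clipped = (recent > threshold).float().mean() * 100.0
print("grad-norm quartiles", [f"{v:.3e}" for v in q.tolist()],
      f"threshold={threshold:.3e}, percent-clipped={clipped:.1f}")
```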
2024-09-19 14:09:16,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=677768.0, ans=0.95
2024-09-19 14:09:16,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=677768.0, ans=0.2
2024-09-19 14:09:50,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.63 vs. limit=12.0
2024-09-19 14:09:59,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=677861.3333333334, ans=0.125
2024-09-19 14:10:07,627 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.535e+02 2.959e+02 4.010e+02 5.513e+02, threshold=5.918e+02, percent-clipped=0.0
2024-09-19 14:10:19,396 INFO [train.py:1198] (1/2) Epoch 38, batch 1850, loss[loss=0.1951, simple_loss=0.2572, pruned_loss=0.04843, ctc_loss=0.1067, cr_loss=0.3719, over 34467.00 frames. ], tot_loss[loss=0.2053, simple_loss=0.2623, pruned_loss=0.05466, ctc_loss=0.117, cr_loss=0.3906, over 6764253.86 frames. ], batch size: 100, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 14:11:36,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678141.3333333334, ans=0.1
2024-09-19 14:11:36,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=678141.3333333334, ans=0.125
2024-09-19 14:11:40,864 INFO [train.py:1198] (1/2) Epoch 38, batch 1900, loss[loss=0.2106, simple_loss=0.2691, pruned_loss=0.05611, ctc_loss=0.1192, cr_loss=0.4005, over 34344.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2629, pruned_loss=0.05482, ctc_loss=0.1175, cr_loss=0.392, over 6773098.29 frames. ], batch size: 103, lr: 3.14e-03, grad_scale: 8.0
2024-09-19 14:11:48,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=15.0
2024-09-19 14:11:58,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.96 vs. limit=15.0
2024-09-19 14:12:02,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=678234.6666666666, ans=0.125
2024-09-19 14:12:12,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=678281.3333333334, ans=0.0
2024-09-19 14:12:19,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=678281.3333333334, ans=0.125
2024-09-19 14:12:27,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678281.3333333334, ans=0.1
2024-09-19 14:12:35,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=678328.0, ans=0.125
2024-09-19 14:12:47,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=678374.6666666666, ans=0.0
2024-09-19 14:12:51,402 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.107e+02 2.493e+02 2.820e+02 3.469e+02 7.609e+02, threshold=5.640e+02, percent-clipped=1.0
2024-09-19 14:13:04,602 INFO [train.py:1198] (1/2) Epoch 38, batch 1950, loss[loss=0.2076, simple_loss=0.2624, pruned_loss=0.05653, ctc_loss=0.1191, cr_loss=0.3985, over 34398.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.2638, pruned_loss=0.05514, ctc_loss=0.118, cr_loss=0.3935, over 6789593.47 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 8.0
2024-09-19 14:13:10,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=678421.3333333334, ans=0.025
2024-09-19 14:13:10,069 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=678421.3333333334, ans=0.1
2024-09-19 14:13:13,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=678421.3333333334, ans=0.0
2024-09-19 14:13:14,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=678421.3333333334, ans=0.125
2024-09-19 14:13:19,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=678421.3333333334, ans=15.0
2024-09-19 14:13:34,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=678468.0, ans=10.0
2024-09-19 14:14:04,034 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.80 vs. limit=15.0
2024-09-19 14:14:18,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=678608.0, ans=0.2
2024-09-19 14:14:29,270 INFO [train.py:1198] (1/2) Epoch 38, batch 2000, loss[loss=0.18, simple_loss=0.235, pruned_loss=0.04545, ctc_loss=0.09982, cr_loss=0.3513, over 34145.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2641, pruned_loss=0.05513, ctc_loss=0.1181, cr_loss=0.3938, over 6764012.80 frames. ], batch size: 78, lr: 3.13e-03, grad_scale: 16.0
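Note on the grad_scale field: it is the dynamic loss scale of float16 mixed-precision training, which is why it moves in powers of two (8.0, 16.0, 32.0) through this log; PyTorch's GradScaler doubles it after a run of overflow-free steps and halves it when inf/nan gradients are detected. The standard usage pattern, as a sketch (the model and optimizer here are placeholders, not the recipe's):

```python
import torch

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = torch.nn.Linear(80, 500).to(device)          # placeholder model
opt = torch.optim.SGD(model.parameters(), lr=1e-3)   # placeholder optimizer
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda,
                                   growth_factor=2.0, backoff_factor=0.5)

x = torch.randn(8, 80, device=device)
with torch.cuda.amp.autocast(enabled=use_cuda, dtype=torch.float16):
    loss = model(x).square().mean()
scaler.scale(loss).backward()   # gradients carry the loss scale
scaler.step(opt)                # unscales; skips the step on overflow
scaler.update()                 # grows or shrinks the scale
print("grad_scale:", scaler.get_scale())
```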
2024-09-19 14:14:31,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=678654.6666666666, ans=0.1
2024-09-19 14:14:36,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=678654.6666666666, ans=0.125
2024-09-19 14:14:41,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=678654.6666666666, ans=0.025
2024-09-19 14:14:41,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=678654.6666666666, ans=0.1
2024-09-19 14:15:12,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=678748.0, ans=0.125
2024-09-19 14:15:26,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.04 vs. limit=10.0
2024-09-19 14:15:40,493 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.430e+02 2.788e+02 3.569e+02 1.173e+03, threshold=5.576e+02, percent-clipped=6.0
2024-09-19 14:15:52,127 INFO [train.py:1198] (1/2) Epoch 38, batch 2050, loss[loss=0.1869, simple_loss=0.2457, pruned_loss=0.04675, ctc_loss=0.1029, cr_loss=0.3513, over 34443.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2634, pruned_loss=0.05507, ctc_loss=0.1179, cr_loss=0.3932, over 6753372.32 frames. ], batch size: 82, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:15:58,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=678888.0, ans=0.125
2024-09-19 14:16:04,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.29 vs. limit=10.0
2024-09-19 14:16:05,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=678888.0, ans=0.125
2024-09-19 14:16:31,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=678981.3333333334, ans=0.0
2024-09-19 14:16:32,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.54 vs. limit=10.0
2024-09-19 14:16:43,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=679028.0, ans=0.125
2024-09-19 14:16:48,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=679028.0, ans=0.025
2024-09-19 14:16:53,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=679028.0, ans=0.2
2024-09-19 14:16:53,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=679028.0, ans=0.04949747468305833
2024-09-19 14:17:17,685 INFO [train.py:1198] (1/2) Epoch 38, batch 2100, loss[loss=0.206, simple_loss=0.2663, pruned_loss=0.05404, ctc_loss=0.1133, cr_loss=0.3755, over 34548.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2628, pruned_loss=0.05493, ctc_loss=0.1176, cr_loss=0.3921, over 6768586.22 frames. ], batch size: 94, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:17:31,167 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:17:36,903 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.84 vs. limit=22.5
2024-09-19 14:17:52,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=679214.6666666666, ans=0.125
2024-09-19 14:18:06,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=679261.3333333334, ans=0.025
2024-09-19 14:18:27,930 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.084e+02 2.401e+02 2.820e+02 3.681e+02 6.348e+02, threshold=5.641e+02, percent-clipped=5.0
2024-09-19 14:18:39,291 INFO [train.py:1198] (1/2) Epoch 38, batch 2150, loss[loss=0.1968, simple_loss=0.2535, pruned_loss=0.05128, ctc_loss=0.1127, cr_loss=0.3764, over 34343.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2623, pruned_loss=0.05455, ctc_loss=0.1169, cr_loss=0.3909, over 6787594.24 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:18:43,441 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=22.5
2024-09-19 14:18:56,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=679401.3333333334, ans=0.0
2024-09-19 14:19:00,437 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.17 vs. limit=15.0
2024-09-19 14:19:02,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=679401.3333333334, ans=0.025
2024-09-19 14:19:22,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=679448.0, ans=0.125
2024-09-19 14:19:34,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=679494.6666666666, ans=0.0
2024-09-19 14:19:41,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=679494.6666666666, ans=0.125
2024-09-19 14:19:44,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=679541.3333333334, ans=0.1
2024-09-19 14:19:46,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=679541.3333333334, ans=0.125
2024-09-19 14:19:47,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=679541.3333333334, ans=0.1
2024-09-19 14:20:02,078 INFO [train.py:1198] (1/2) Epoch 38, batch 2200, loss[loss=0.2053, simple_loss=0.2666, pruned_loss=0.05284, ctc_loss=0.114, cr_loss=0.3879, over 34446.00 frames. ], tot_loss[loss=0.2055, simple_loss=0.2625, pruned_loss=0.05469, ctc_loss=0.1171, cr_loss=0.3918, over 6782760.82 frames. ], batch size: 100, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:20:11,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.58 vs. limit=22.5
2024-09-19 14:21:15,745 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:21:16,812 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.124e+02 2.510e+02 3.009e+02 3.952e+02 6.075e+02, threshold=6.018e+02, percent-clipped=2.0
2024-09-19 14:21:17,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=679774.6666666666, ans=0.125
2024-09-19 14:21:18,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=679774.6666666666, ans=0.025
2024-09-19 14:21:28,328 INFO [train.py:1198] (1/2) Epoch 38, batch 2250, loss[loss=0.2119, simple_loss=0.2684, pruned_loss=0.0572, ctc_loss=0.1241, cr_loss=0.4043, over 34399.00 frames. ], tot_loss[loss=0.2053, simple_loss=0.2623, pruned_loss=0.05459, ctc_loss=0.117, cr_loss=0.3912, over 6780243.61 frames. ], batch size: 95, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:21:47,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.79 vs. limit=12.0
2024-09-19 14:21:53,295 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:21:53,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=679868.0, ans=0.0
2024-09-19 14:22:02,245 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0
2024-09-19 14:22:12,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=679914.6666666666, ans=0.125
2024-09-19 14:22:17,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=679961.3333333334, ans=0.1
2024-09-19 14:22:29,498 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:22:50,272 INFO [train.py:1198] (1/2) Epoch 38, batch 2300, loss[loss=0.1822, simple_loss=0.2393, pruned_loss=0.04521, ctc_loss=0.1002, cr_loss=0.3627, over 34285.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2612, pruned_loss=0.05426, ctc_loss=0.1164, cr_loss=0.3895, over 6765719.30 frames. ], batch size: 83, lr: 3.13e-03, grad_scale: 16.0
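Note on the two loss groups in each train.py:1198 line: `loss[...]` describes the current batch, while `tot_loss[...]` is a running, frame-weighted aggregate over recent batches (its frame count hovers around 6.7M). A sketch of how such a decayed, frame-weighted average could be kept; this is an assumption for illustration, and the exact bookkeeping in train.py may differ:

```python
# Sketch: decayed, frame-weighted running average of named loss components.
class RunningLoss:
    def __init__(self):
        self.sums = {}       # name -> decayed sum of value * frames
        self.frames = 0.0    # decayed sum of frames

    def update(self, losses: dict, num_frames: float, decay: float = 0.999):
        # exponential decay keeps the statistic dominated by recent batches
        self.frames = decay * self.frames + num_frames
        for name, value in losses.items():
            self.sums[name] = decay * self.sums.get(name, 0.0) + value * num_frames

    def averages(self) -> dict:
        return {k: v / self.frames for k, v in self.sums.items()}

tot = RunningLoss()
tot.update({"loss": 0.2053, "ctc_loss": 0.117, "cr_loss": 0.391}, num_frames=34446)
print(tot.averages())
```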
2024-09-19 14:23:15,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=680101.3333333334, ans=0.125
2024-09-19 14:23:33,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=680148.0, ans=0.125
2024-09-19 14:23:56,422 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:24:01,173 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.922e+02 2.576e+02 3.040e+02 3.503e+02 4.979e+02, threshold=6.080e+02, percent-clipped=0.0
2024-09-19 14:24:12,591 INFO [train.py:1198] (1/2) Epoch 38, batch 2350, loss[loss=0.1993, simple_loss=0.2609, pruned_loss=0.04997, ctc_loss=0.1116, cr_loss=0.3845, over 34708.00 frames. ], tot_loss[loss=0.2048, simple_loss=0.2617, pruned_loss=0.05449, ctc_loss=0.1169, cr_loss=0.391, over 6772926.49 frames. ], batch size: 97, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:24:27,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=680288.0, ans=0.2
2024-09-19 14:25:06,878 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.04 vs. limit=15.0
2024-09-19 14:25:14,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=680428.0, ans=0.0
2024-09-19 14:25:30,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=680474.6666666666, ans=0.2
2024-09-19 14:25:32,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=680474.6666666666, ans=0.2
2024-09-19 14:25:35,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=680474.6666666666, ans=0.125
2024-09-19 14:25:38,811 INFO [train.py:1198] (1/2) Epoch 38, batch 2400, loss[loss=0.2061, simple_loss=0.2582, pruned_loss=0.05746, ctc_loss=0.1175, cr_loss=0.3878, over 34595.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2622, pruned_loss=0.05473, ctc_loss=0.1174, cr_loss=0.392, over 6776652.27 frames. ], batch size: 89, lr: 3.13e-03, grad_scale: 32.0
2024-09-19 14:25:40,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=680521.3333333334, ans=0.0
2024-09-19 14:25:45,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=680521.3333333334, ans=0.0
2024-09-19 14:25:55,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=680568.0, ans=0.0
2024-09-19 14:26:05,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=680568.0, ans=0.125
2024-09-19 14:26:17,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=15.0
2024-09-19 14:26:51,285 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.193e+02 2.564e+02 3.049e+02 3.825e+02 7.418e+02, threshold=6.098e+02, percent-clipped=1.0
2024-09-19 14:26:54,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=680708.0, ans=0.125
2024-09-19 14:27:01,208 INFO [train.py:1198] (1/2) Epoch 38, batch 2450, loss[loss=0.228, simple_loss=0.2829, pruned_loss=0.06414, ctc_loss=0.1372, cr_loss=0.4348, over 34401.00 frames. ], tot_loss[loss=0.2062, simple_loss=0.263, pruned_loss=0.05503, ctc_loss=0.1179, cr_loss=0.3927, over 6750345.79 frames. ], batch size: 95, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:27:19,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=680801.3333333334, ans=0.125
2024-09-19 14:28:13,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=680941.3333333334, ans=0.125
2024-09-19 14:28:15,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=680941.3333333334, ans=0.2
2024-09-19 14:28:25,065 INFO [train.py:1198] (1/2) Epoch 38, batch 2500, loss[loss=0.2202, simple_loss=0.2792, pruned_loss=0.05953, ctc_loss=0.1264, cr_loss=0.4198, over 34465.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2631, pruned_loss=0.05512, ctc_loss=0.118, cr_loss=0.3928, over 6762908.24 frames. ], batch size: 100, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:28:58,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=681081.3333333334, ans=0.125
2024-09-19 14:28:58,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=681081.3333333334, ans=0.125
2024-09-19 14:29:11,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=681081.3333333334, ans=0.0
2024-09-19 14:29:21,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=681128.0, ans=0.07
2024-09-19 14:29:21,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=681128.0, ans=0.125
2024-09-19 14:29:28,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=681128.0, ans=0.125
2024-09-19 14:29:28,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_abs, batch_count=681128.0, ans=0.5
2024-09-19 14:29:31,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681174.6666666666, ans=0.1
2024-09-19 14:29:39,633 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.160e+02 2.459e+02 2.773e+02 3.321e+02 6.788e+02, threshold=5.546e+02, percent-clipped=1.0
2024-09-19 14:29:41,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=681174.6666666666, ans=0.0
2024-09-19 14:29:43,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=681174.6666666666, ans=0.5
2024-09-19 14:29:44,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=681174.6666666666, ans=0.0
2024-09-19 14:29:47,255 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.13 vs. limit=10.0
2024-09-19 14:29:48,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=681221.3333333334, ans=0.125
2024-09-19 14:29:49,606 INFO [train.py:1198] (1/2) Epoch 38, batch 2550, loss[loss=0.1833, simple_loss=0.2379, pruned_loss=0.04736, ctc_loss=0.1021, cr_loss=0.3373, over 34163.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.263, pruned_loss=0.05496, ctc_loss=0.1178, cr_loss=0.3921, over 6766845.94 frames. ], batch size: 78, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:29:53,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=681221.3333333334, ans=0.125
2024-09-19 14:30:01,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=681221.3333333334, ans=0.0
2024-09-19 14:30:03,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=681221.3333333334, ans=0.2
2024-09-19 14:30:06,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=681268.0, ans=0.0
2024-09-19 14:30:09,663 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:30:18,629 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5
2024-09-19 14:30:22,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681314.6666666666, ans=0.1
2024-09-19 14:30:29,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=681314.6666666666, ans=0.125
2024-09-19 14:30:36,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=681314.6666666666, ans=0.125
2024-09-19 14:30:39,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=681361.3333333334, ans=0.1
2024-09-19 14:31:12,829 INFO [train.py:1198] (1/2) Epoch 38, batch 2600, loss[loss=0.2006, simple_loss=0.2531, pruned_loss=0.0545, ctc_loss=0.1164, cr_loss=0.3964, over 34354.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2633, pruned_loss=0.05507, ctc_loss=0.118, cr_loss=0.393, over 6762841.34 frames. ], batch size: 91, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:31:14,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=681454.6666666666, ans=0.0
2024-09-19 14:31:27,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=681501.3333333334, ans=0.07
2024-09-19 14:31:48,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=12.0
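Note on the `cr_loss` component that appears alongside `ctc_loss` in every batch line: consistency-regularized CTC penalizes disagreement between the frame-level CTC posteriors of two differently time-masked views of the same utterance. A sketch of one common form of such a term, a symmetric KL divergence between the two views' log-posteriors (illustrative; the recipe's exact masking and weighting may differ):

```python
import torch
import torch.nn.functional as F

def consistency_loss(logp_a: torch.Tensor, logp_b: torch.Tensor) -> torch.Tensor:
    # logp_a, logp_b: (batch, frames, vocab) log-softmax outputs of the
    # same utterances under two different augmentations.
    kl_ab = F.kl_div(logp_b, logp_a, log_target=True, reduction="batchmean")
    kl_ba = F.kl_div(logp_a, logp_b, log_target=True, reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)   # symmetric: each view teaches the other

a = F.log_softmax(torch.randn(4, 100, 500), dim=-1)
b = F.log_softmax(torch.randn(4, 100, 500), dim=-1)
print(float(consistency_loss(a, b)))
```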
2024-09-19 14:32:11,128 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.82 vs. limit=22.5
2024-09-19 14:32:22,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=681641.3333333334, ans=0.0
2024-09-19 14:32:22,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681641.3333333334, ans=0.1
2024-09-19 14:32:28,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.582e+02 2.920e+02 3.732e+02 7.507e+02, threshold=5.840e+02, percent-clipped=6.0
2024-09-19 14:32:38,275 INFO [train.py:1198] (1/2) Epoch 38, batch 2650, loss[loss=0.2169, simple_loss=0.2745, pruned_loss=0.05907, ctc_loss=0.123, cr_loss=0.4175, over 34179.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2634, pruned_loss=0.05501, ctc_loss=0.1179, cr_loss=0.3929, over 6769639.66 frames. ], batch size: 117, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:34:00,561 INFO [train.py:1198] (1/2) Epoch 38, batch 2700, loss[loss=0.2008, simple_loss=0.2625, pruned_loss=0.05124, ctc_loss=0.1101, cr_loss=0.363, over 34617.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2635, pruned_loss=0.05503, ctc_loss=0.1178, cr_loss=0.3929, over 6764669.86 frames. ], batch size: 102, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:34:02,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=681921.3333333334, ans=0.125
2024-09-19 14:34:02,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=681921.3333333334, ans=0.0
2024-09-19 14:34:09,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=681921.3333333334, ans=0.1
2024-09-19 14:34:25,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=681968.0, ans=0.125
2024-09-19 14:34:31,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.88 vs. limit=15.0
2024-09-19 14:34:31,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=682014.6666666666, ans=0.1
2024-09-19 14:34:40,372 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:34:40,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=682014.6666666666, ans=10.0
2024-09-19 14:34:43,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=682014.6666666666, ans=0.0
2024-09-19 14:35:06,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=682108.0, ans=0.0
2024-09-19 14:35:12,913 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.155e+02 2.599e+02 3.053e+02 3.829e+02 6.922e+02, threshold=6.105e+02, percent-clipped=1.0
2024-09-19 14:35:24,551 INFO [train.py:1198] (1/2) Epoch 38, batch 2750, loss[loss=0.2009, simple_loss=0.2552, pruned_loss=0.05406, ctc_loss=0.1137, cr_loss=0.395, over 34633.00 frames. ], tot_loss[loss=0.205, simple_loss=0.262, pruned_loss=0.05445, ctc_loss=0.1168, cr_loss=0.3902, over 6762117.92 frames. ], batch size: 88, lr: 3.13e-03, grad_scale: 16.0
2024-09-19 14:35:46,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=682201.3333333334, ans=0.125
2024-09-19 14:35:54,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=682201.3333333334, ans=0.125
2024-09-19 14:36:04,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=682248.0, ans=0.025
2024-09-19 14:36:19,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=682294.6666666666, ans=0.125
2024-09-19 14:36:24,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=682294.6666666666, ans=0.125
2024-09-19 14:36:34,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=682341.3333333334, ans=0.125
2024-09-19 14:36:39,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682341.3333333334, ans=0.1
2024-09-19 14:36:49,131 INFO [train.py:1198] (1/2) Epoch 38, batch 2800, loss[loss=0.2296, simple_loss=0.2823, pruned_loss=0.06612, ctc_loss=0.1415, cr_loss=0.4111, over 24478.00 frames. ], tot_loss[loss=0.2053, simple_loss=0.2623, pruned_loss=0.05464, ctc_loss=0.1171, cr_loss=0.3907, over 6741411.69 frames. ], batch size: 244, lr: 3.13e-03, grad_scale: 32.0
2024-09-19 14:37:00,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=682388.0, ans=0.125
2024-09-19 14:37:12,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=682434.6666666666, ans=0.09899494936611666
2024-09-19 14:37:20,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=682481.3333333334, ans=0.0
2024-09-19 14:37:25,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=682481.3333333334, ans=0.0
2024-09-19 14:37:39,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0
2024-09-19 14:38:01,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.001e+02 2.503e+02 2.860e+02 3.600e+02 5.714e+02, threshold=5.720e+02, percent-clipped=0.0
2024-09-19 14:38:11,543 INFO [train.py:1198] (1/2) Epoch 38, batch 2850, loss[loss=0.1938, simple_loss=0.251, pruned_loss=0.05022, ctc_loss=0.1084, cr_loss=0.3614, over 34503.00 frames. ], tot_loss[loss=0.2062, simple_loss=0.263, pruned_loss=0.05505, ctc_loss=0.1179, cr_loss=0.3923, over 6723928.75 frames. ], batch size: 90, lr: 3.12e-03, grad_scale: 32.0
2024-09-19 14:38:13,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=682621.3333333334, ans=0.0
2024-09-19 14:38:33,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=682668.0, ans=0.125
2024-09-19 14:38:42,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0
2024-09-19 14:38:44,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=682714.6666666666, ans=0.025
2024-09-19 14:38:46,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682714.6666666666, ans=0.1
2024-09-19 14:38:54,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.28 vs. limit=15.0
2024-09-19 14:39:04,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0
2024-09-19 14:39:11,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=682761.3333333334, ans=0.07
2024-09-19 14:39:16,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=682761.3333333334, ans=0.2
2024-09-19 14:39:35,431 INFO [train.py:1198] (1/2) Epoch 38, batch 2900, loss[loss=0.2116, simple_loss=0.2657, pruned_loss=0.05823, ctc_loss=0.1229, cr_loss=0.4099, over 34522.00 frames. ], tot_loss[loss=0.2075, simple_loss=0.2643, pruned_loss=0.05554, ctc_loss=0.1189, cr_loss=0.395, over 6753797.25 frames. ], batch size: 94, lr: 3.12e-03, grad_scale: 32.0
2024-09-19 14:39:48,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=682854.6666666666, ans=0.125
2024-09-19 14:39:57,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=682901.3333333334, ans=0.0
2024-09-19 14:40:27,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=682994.6666666666, ans=0.0
2024-09-19 14:40:28,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0
2024-09-19 14:40:29,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=682994.6666666666, ans=0.1
2024-09-19 14:40:45,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=683041.3333333334, ans=0.2
2024-09-19 14:40:50,016 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.575e+02 3.026e+02 3.921e+02 7.295e+02, threshold=6.051e+02, percent-clipped=3.0
2024-09-19 14:40:59,883 INFO [train.py:1198] (1/2) Epoch 38, batch 2950, loss[loss=0.2044, simple_loss=0.2571, pruned_loss=0.05587, ctc_loss=0.119, cr_loss=0.4048, over 34638.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.263, pruned_loss=0.05509, ctc_loss=0.1181, cr_loss=0.3925, over 6748699.37 frames. ], batch size: 88, lr: 3.12e-03, grad_scale: 32.0
2024-09-19 14:41:19,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=683134.6666666666, ans=0.1
2024-09-19 14:41:24,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=683134.6666666666, ans=0.125
2024-09-19 14:41:28,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=683134.6666666666, ans=0.125
2024-09-19 14:41:34,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=683181.3333333334, ans=0.07
2024-09-19 14:41:36,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=683181.3333333334, ans=10.0
2024-09-19 14:41:38,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=683181.3333333334, ans=0.07
2024-09-19 14:42:09,844 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:42:22,638 INFO [train.py:1198] (1/2) Epoch 38, batch 3000, loss[loss=0.2161, simple_loss=0.2745, pruned_loss=0.05809, ctc_loss=0.1246, cr_loss=0.4148, over 34547.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.2629, pruned_loss=0.055, ctc_loss=0.118, cr_loss=0.3927, over 6748023.61 frames. ], batch size: 94, lr: 3.12e-03, grad_scale: 32.0
2024-09-19 14:42:22,638 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-19 14:42:34,926 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5155, 5.2412, 4.0428, 2.4105], device='cuda:1')
2024-09-19 14:42:38,841 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9511, 3.8054, 3.4925, 3.6903], device='cuda:1')
2024-09-19 14:42:39,676 INFO [train.py:1230] (1/2) Epoch 38, validation: loss=0.1481, simple_loss=0.2422, pruned_loss=0.02319, ctc_loss=0.03868, cr_loss=2.127e-14, over 944034.00 frames.
2024-09-19 14:42:39,676 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-19 14:42:48,323 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:42:51,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=683321.3333333334, ans=0.02
2024-09-19 14:42:55,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.54 vs. limit=15.0
2024-09-19 14:43:01,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=683368.0, ans=0.0
2024-09-19 14:43:32,163 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-19 14:43:35,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.51 vs. limit=15.0
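Note on the `attn_weights_entropy` tensors printed during validation: zipformer.py logs, per attention head, the mean entropy of the attention distributions as a diagnostic (values near log(num_keys) mean nearly uniform attention; small values mean sharp attention). A sketch of a computation of this general kind (illustrative):

```python
import torch

def attn_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (heads, queries, keys), rows summing to 1.
    # Returns the mean entropy in nats, one value per head.
    ent = -(attn * attn.clamp(min=1e-20).log()).sum(dim=-1)  # (heads, queries)
    return ent.mean(dim=-1)

weights = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_entropy(weights))  # one value per head, like the logged tensors
```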
2024-09-19 14:43:38,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=683461.3333333334, ans=0.125
2024-09-19 14:43:53,097 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.438e+02 2.732e+02 3.147e+02 4.935e+02, threshold=5.463e+02, percent-clipped=0.0
2024-09-19 14:43:53,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=683508.0, ans=0.0
2024-09-19 14:44:02,966 INFO [train.py:1198] (1/2) Epoch 38, batch 3050, loss[loss=0.1988, simple_loss=0.2521, pruned_loss=0.05359, ctc_loss=0.1134, cr_loss=0.3904, over 34565.00 frames. ], tot_loss[loss=0.2068, simple_loss=0.2637, pruned_loss=0.05522, ctc_loss=0.1184, cr_loss=0.3937, over 6739638.17 frames. ], batch size: 89, lr: 3.12e-03, grad_scale: 32.0
2024-09-19 14:44:03,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=683554.6666666666, ans=0.125
2024-09-19 14:44:09,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=683554.6666666666, ans=0.1
2024-09-19 14:44:14,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=683554.6666666666, ans=0.0
2024-09-19 14:44:19,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=683601.3333333334, ans=0.0
2024-09-19 14:44:23,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0
2024-09-19 14:44:37,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=683648.0, ans=0.0
2024-09-19 14:44:47,633 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.84 vs. limit=6.0
2024-09-19 14:44:55,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=683694.6666666666, ans=0.0
2024-09-19 14:44:55,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=683694.6666666666, ans=0.125
2024-09-19 14:44:57,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0
2024-09-19 14:44:59,134 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.44 vs. limit=15.0
2024-09-19 14:45:08,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=683741.3333333334, ans=0.125
2024-09-19 14:45:24,028 INFO [train.py:1198] (1/2) Epoch 38, batch 3100, loss[loss=0.216, simple_loss=0.2767, pruned_loss=0.05709, ctc_loss=0.123, cr_loss=0.4128, over 34233.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2633, pruned_loss=0.05511, ctc_loss=0.1181, cr_loss=0.3931, over 6739411.19 frames. ], batch size: 117, lr: 3.12e-03, grad_scale: 32.0
2024-09-19 14:45:25,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=683788.0, ans=0.025
2024-09-19 14:45:29,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=683788.0, ans=0.07
2024-09-19 14:45:34,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.18 vs. limit=15.0
2024-09-19 14:45:53,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=683834.6666666666, ans=0.125
2024-09-19 14:45:59,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=683881.3333333334, ans=0.2
2024-09-19 14:46:00,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.48 vs. limit=15.0
2024-09-19 14:46:31,258 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0
2024-09-19 14:46:36,577 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.034e+02 2.487e+02 2.751e+02 3.415e+02 6.762e+02, threshold=5.502e+02, percent-clipped=2.0
2024-09-19 14:46:44,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5
2024-09-19 14:46:44,775 INFO [train.py:1198] (1/2) Epoch 38, batch 3150, loss[loss=0.2282, simple_loss=0.2882, pruned_loss=0.06277, ctc_loss=0.1336, cr_loss=0.3986, over 33804.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2631, pruned_loss=0.05481, ctc_loss=0.1176, cr_loss=0.3925, over 6746666.68 frames. ], batch size: 122, lr: 3.12e-03, grad_scale: 16.0
2024-09-19 14:46:52,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=684021.3333333334, ans=0.95
2024-09-19 14:47:04,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=684068.0, ans=0.125
2024-09-19 14:47:10,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=684068.0, ans=0.07
2024-09-19 14:47:25,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=684114.6666666666, ans=0.1
2024-09-19 14:47:41,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=684161.3333333334, ans=10.0
2024-09-19 14:47:54,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=684208.0, ans=0.125
2024-09-19 14:48:00,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=684208.0, ans=0.0
2024-09-19 14:48:05,267 INFO [train.py:1198] (1/2) Epoch 38, batch 3200, loss[loss=0.2019, simple_loss=0.2598, pruned_loss=0.05282, ctc_loss=0.115, cr_loss=0.387, over 34558.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2625, pruned_loss=0.05454, ctc_loss=0.1172, cr_loss=0.3916, over 6759918.43 frames. ], batch size: 94, lr: 3.12e-03, grad_scale: 32.0
2024-09-19 14:48:34,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684301.3333333334, ans=0.1
2024-09-19 14:48:40,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=684348.0, ans=0.0
2024-09-19 14:48:42,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=684348.0, ans=0.1
2024-09-19 14:48:52,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=684348.0, ans=0.0
2024-09-19 14:49:08,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=684394.6666666666, ans=0.0
2024-09-19 14:49:11,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=684441.3333333334, ans=0.025
2024-09-19 14:49:19,289 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.460e+02 2.786e+02 3.444e+02 1.434e+03, threshold=5.572e+02, percent-clipped=1.0
2024-09-19 14:49:27,304 INFO [train.py:1198] (1/2) Epoch 38, batch 3250, loss[loss=0.205, simple_loss=0.2641, pruned_loss=0.05385, ctc_loss=0.116, cr_loss=0.3784, over 34652.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.263, pruned_loss=0.05474, ctc_loss=0.1176, cr_loss=0.3926, over 6769676.72 frames. ], batch size: 98, lr: 3.12e-03, grad_scale: 32.0
2024-09-19 14:49:31,539 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.18 vs. limit=10.0
2024-09-19 14:49:37,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0
2024-09-19 14:50:04,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684581.3333333334, ans=0.1
2024-09-19 14:50:49,314 INFO [train.py:1198] (1/2) Epoch 38, batch 3300, loss[loss=0.2165, simple_loss=0.2738, pruned_loss=0.05886, ctc_loss=0.1263, cr_loss=0.4043, over 32991.00 frames. ], tot_loss[loss=0.2045, simple_loss=0.2616, pruned_loss=0.05428, ctc_loss=0.1166, cr_loss=0.3896, over 6767854.00 frames. ], batch size: 130, lr: 3.12e-03, grad_scale: 32.0
2024-09-19 14:50:53,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=684721.3333333334, ans=0.0
2024-09-19 14:51:11,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.16 vs. limit=15.0
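Note on the `lr` field, which drifts from 3.14e-03 down to 3.12e-03 over these batches: icefall recipes use the Eden schedule, in which the learning rate decays smoothly with both the batch index and the (fractional) epoch. A sketch assuming the standard Eden form; the constants below are illustrative defaults, not values read from this run:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden-style schedule: power-law decay in both batch and epoch.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Late in training the per-epoch decay is slow, consistent with the small
# 3.14e-03 -> 3.12e-03 drift seen across Epoch 38 above.
print(eden_lr(base_lr=0.045, batch=250_000, epoch=38.0))
```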
limit=15.0 2024-09-19 14:51:22,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=684814.6666666666, ans=0.025 2024-09-19 14:51:25,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=684814.6666666666, ans=0.0 2024-09-19 14:51:51,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=684861.3333333334, ans=0.0 2024-09-19 14:52:02,457 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.128e+02 2.474e+02 2.729e+02 3.162e+02 5.190e+02, threshold=5.459e+02, percent-clipped=0.0 2024-09-19 14:52:03,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=684908.0, ans=10.0 2024-09-19 14:52:10,498 INFO [train.py:1198] (1/2) Epoch 38, batch 3350, loss[loss=0.208, simple_loss=0.2674, pruned_loss=0.05485, ctc_loss=0.1167, cr_loss=0.3868, over 33928.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2625, pruned_loss=0.05479, ctc_loss=0.1175, cr_loss=0.3918, over 6741489.10 frames. ], batch size: 122, lr: 3.12e-03, grad_scale: 32.0 2024-09-19 14:52:10,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=684954.6666666666, ans=0.1 2024-09-19 14:52:15,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=684954.6666666666, ans=0.125 2024-09-19 14:52:17,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=684954.6666666666, ans=0.125 2024-09-19 14:52:23,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=684954.6666666666, ans=0.125 2024-09-19 14:52:44,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685048.0, ans=0.1 2024-09-19 14:52:47,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=685048.0, ans=0.125 2024-09-19 14:52:49,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=685048.0, ans=0.0 2024-09-19 14:52:56,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=685048.0, ans=0.2 2024-09-19 14:52:56,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=685048.0, ans=0.1 2024-09-19 14:53:01,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=685094.6666666666, ans=0.125 2024-09-19 14:53:04,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=685094.6666666666, ans=0.0 2024-09-19 14:53:28,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=685141.3333333334, ans=0.0 2024-09-19 14:53:32,638 INFO [train.py:1198] (1/2) Epoch 38, batch 3400, loss[loss=0.1838, simple_loss=0.2388, pruned_loss=0.04696, ctc_loss=0.1032, cr_loss=0.3544, over 34139.00 frames. 
], tot_loss[loss=0.2056, simple_loss=0.2624, pruned_loss=0.05483, ctc_loss=0.1176, cr_loss=0.392, over 6733563.93 frames. ], batch size: 78, lr: 3.12e-03, grad_scale: 32.0 2024-09-19 14:53:45,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=685188.0, ans=0.09899494936611666 2024-09-19 14:53:59,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=685234.6666666666, ans=0.1 2024-09-19 14:54:10,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=685281.3333333334, ans=0.0 2024-09-19 14:54:16,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=685281.3333333334, ans=0.1 2024-09-19 14:54:31,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685328.0, ans=0.1 2024-09-19 14:54:45,192 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.141e+02 2.473e+02 2.811e+02 3.265e+02 6.202e+02, threshold=5.623e+02, percent-clipped=2.0 2024-09-19 14:54:51,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=685421.3333333334, ans=0.125 2024-09-19 14:54:53,254 INFO [train.py:1198] (1/2) Epoch 38, batch 3450, loss[loss=0.207, simple_loss=0.2666, pruned_loss=0.05451, ctc_loss=0.1156, cr_loss=0.3821, over 33049.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2628, pruned_loss=0.05499, ctc_loss=0.1179, cr_loss=0.3926, over 6745588.96 frames. ], batch size: 130, lr: 3.12e-03, grad_scale: 32.0 2024-09-19 14:55:12,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=685468.0, ans=0.0 2024-09-19 14:55:16,412 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.92 vs. limit=12.0 2024-09-19 14:55:41,433 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=685561.3333333334, ans=0.025 2024-09-19 14:56:14,445 INFO [train.py:1198] (1/2) Epoch 38, batch 3500, loss[loss=0.1806, simple_loss=0.241, pruned_loss=0.04382, ctc_loss=0.09605, cr_loss=0.3317, over 34480.00 frames. ], tot_loss[loss=0.2055, simple_loss=0.2624, pruned_loss=0.05474, ctc_loss=0.1174, cr_loss=0.3916, over 6746883.80 frames. ], batch size: 85, lr: 3.12e-03, grad_scale: 32.0 2024-09-19 14:56:32,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=685701.3333333334, ans=0.125 2024-09-19 14:56:47,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=685748.0, ans=0.2 2024-09-19 14:57:26,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.171e+02 2.548e+02 3.019e+02 3.742e+02 7.424e+02, threshold=6.039e+02, percent-clipped=3.0 2024-09-19 14:57:34,433 INFO [train.py:1198] (1/2) Epoch 38, batch 3550, loss[loss=0.2142, simple_loss=0.2742, pruned_loss=0.05699, ctc_loss=0.1206, cr_loss=0.4023, over 34393.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2625, pruned_loss=0.05476, ctc_loss=0.1174, cr_loss=0.3919, over 6756875.32 frames. 
], batch size: 103, lr: 3.12e-03, grad_scale: 32.0 2024-09-19 14:58:04,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=685934.6666666666, ans=0.125 2024-09-19 14:58:27,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=686028.0, ans=0.125 2024-09-19 14:58:32,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.79 vs. limit=10.0 2024-09-19 14:58:33,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=686028.0, ans=0.125 2024-09-19 14:58:46,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=686074.6666666666, ans=0.125 2024-09-19 14:58:52,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=686074.6666666666, ans=0.0 2024-09-19 14:58:54,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=686121.3333333334, ans=0.0 2024-09-19 14:58:55,509 INFO [train.py:1198] (1/2) Epoch 38, batch 3600, loss[loss=0.2037, simple_loss=0.2555, pruned_loss=0.05532, ctc_loss=0.1218, cr_loss=0.4257, over 34484.00 frames. ], tot_loss[loss=0.2058, simple_loss=0.2629, pruned_loss=0.0548, ctc_loss=0.1176, cr_loss=0.3927, over 6766123.87 frames. ], batch size: 90, lr: 3.12e-03, grad_scale: 32.0 2024-09-19 14:59:25,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=686168.0, ans=0.125 2024-09-19 14:59:30,404 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 14:59:31,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=686214.6666666666, ans=0.025 2024-09-19 14:59:33,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=686214.6666666666, ans=0.0 2024-09-19 14:59:47,109 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0 2024-09-19 15:00:08,019 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.119e+02 2.672e+02 3.390e+02 4.550e+02 8.392e+02, threshold=6.780e+02, percent-clipped=9.0 2024-09-19 15:00:08,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=686308.0, ans=0.09899494936611666 2024-09-19 15:00:16,176 INFO [train.py:1198] (1/2) Epoch 38, batch 3650, loss[loss=0.2185, simple_loss=0.2761, pruned_loss=0.05963, ctc_loss=0.1249, cr_loss=0.4161, over 34443.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2622, pruned_loss=0.05453, ctc_loss=0.117, cr_loss=0.3908, over 6769177.51 frames. 
], batch size: 110, lr: 3.12e-03, grad_scale: 32.0 2024-09-19 15:00:16,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=686354.6666666666, ans=0.0 2024-09-19 15:00:17,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.39 vs. limit=15.0 2024-09-19 15:00:44,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686401.3333333334, ans=0.1 2024-09-19 15:00:47,349 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:00:54,028 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0 2024-09-19 15:00:54,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.51 vs. limit=15.0 2024-09-19 15:01:00,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.27 vs. limit=15.0 2024-09-19 15:01:14,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=686494.6666666666, ans=0.125 2024-09-19 15:01:35,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=686588.0, ans=0.125 2024-09-19 15:01:36,909 INFO [train.py:1198] (1/2) Epoch 38, batch 3700, loss[loss=0.2112, simple_loss=0.2693, pruned_loss=0.05689, ctc_loss=0.1176, cr_loss=0.3981, over 34593.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2625, pruned_loss=0.05447, ctc_loss=0.1169, cr_loss=0.3911, over 6783563.12 frames. ], batch size: 102, lr: 3.12e-03, grad_scale: 32.0 2024-09-19 15:01:37,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=1.91 vs. limit=15.0 2024-09-19 15:01:45,927 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=8.52 vs. 
limit=15.0 2024-09-19 15:01:53,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=686634.6666666666, ans=0.0 2024-09-19 15:01:55,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=686634.6666666666, ans=0.0 2024-09-19 15:02:24,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=686728.0, ans=0.2 2024-09-19 15:02:31,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=686728.0, ans=10.0 2024-09-19 15:02:41,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=686774.6666666666, ans=0.1 2024-09-19 15:02:44,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=686774.6666666666, ans=0.0 2024-09-19 15:02:50,771 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.195e+02 2.387e+02 2.578e+02 2.831e+02 6.150e+02, threshold=5.155e+02, percent-clipped=0.0 2024-09-19 15:02:58,848 INFO [train.py:1198] (1/2) Epoch 38, batch 3750, loss[loss=0.2217, simple_loss=0.2793, pruned_loss=0.06063, ctc_loss=0.1301, cr_loss=0.4188, over 34307.00 frames. ], tot_loss[loss=0.2084, simple_loss=0.2657, pruned_loss=0.05573, ctc_loss=0.1193, cr_loss=0.3964, over 6784916.27 frames. ], batch size: 113, lr: 3.12e-03, grad_scale: 32.0 2024-09-19 15:03:03,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=686821.3333333334, ans=0.1 2024-09-19 15:03:08,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=686821.3333333334, ans=0.125 2024-09-19 15:03:23,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=686868.0, ans=0.09899494936611666 2024-09-19 15:03:51,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=686961.3333333334, ans=0.125 2024-09-19 15:03:57,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=686961.3333333334, ans=0.125 2024-09-19 15:04:09,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=687008.0, ans=10.0 2024-09-19 15:04:19,845 INFO [train.py:1198] (1/2) Epoch 38, batch 3800, loss[loss=0.2284, simple_loss=0.2768, pruned_loss=0.06723, ctc_loss=0.1415, cr_loss=0.43, over 30021.00 frames. ], tot_loss[loss=0.2112, simple_loss=0.268, pruned_loss=0.05695, ctc_loss=0.1217, cr_loss=0.4018, over 6673509.52 frames. ], batch size: 175, lr: 3.11e-03, grad_scale: 32.0 2024-09-19 15:04:54,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=687148.0, ans=0.1 2024-09-19 15:05:14,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2024-09-19 15:05:16,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.73 vs. 
limit=22.5 2024-09-19 15:05:22,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=687194.6666666666, ans=0.025 2024-09-19 15:05:22,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=687194.6666666666, ans=0.0 2024-09-19 15:05:35,203 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 2.497e+02 2.732e+02 3.034e+02 5.198e+02, threshold=5.464e+02, percent-clipped=1.0 2024-09-19 15:05:35,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=687241.3333333334, ans=0.125 2024-09-19 15:05:43,421 INFO [train.py:1198] (1/2) Epoch 38, batch 3850, loss[loss=0.2234, simple_loss=0.276, pruned_loss=0.06297, ctc_loss=0.1397, cr_loss=0.4212, over 24077.00 frames. ], tot_loss[loss=0.214, simple_loss=0.2698, pruned_loss=0.0585, ctc_loss=0.1249, cr_loss=0.4051, over 6250636.95 frames. ], batch size: 244, lr: 3.11e-03, grad_scale: 32.0 2024-09-19 15:06:16,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=687381.3333333334, ans=0.0 2024-09-19 15:07:08,237 INFO [train.py:1198] (1/2) Epoch 39, batch 0, loss[loss=0.2025, simple_loss=0.2559, pruned_loss=0.05477, ctc_loss=0.118, cr_loss=0.3998, over 34455.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.2559, pruned_loss=0.05477, ctc_loss=0.118, cr_loss=0.3998, over 34455.00 frames. ], batch size: 85, lr: 3.07e-03, grad_scale: 32.0 2024-09-19 15:07:08,238 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 15:07:11,863 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.4.self_attn_weights, attn_weights_entropy = tensor([3.0856, 2.4241, 2.5427, 2.8659, 2.2171, 2.6495, 2.6497, 2.7123], device='cuda:1') 2024-09-19 15:07:14,705 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.6426, 3.1255, 5.9465, 5.6265], device='cuda:1') 2024-09-19 15:07:24,871 INFO [train.py:1230] (1/2) Epoch 39, validation: loss=0.1491, simple_loss=0.2436, pruned_loss=0.02341, ctc_loss=0.03942, cr_loss=2.239e-14, over 944034.00 frames. 2024-09-19 15:07:24,872 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 15:07:37,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=687409.3333333334, ans=0.125 2024-09-19 15:07:57,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=687502.6666666666, ans=0.0 2024-09-19 15:08:00,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=687502.6666666666, ans=0.125 2024-09-19 15:08:06,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0 2024-09-19 15:08:42,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=687596.0, ans=0.125 2024-09-19 15:08:49,894 INFO [train.py:1198] (1/2) Epoch 39, batch 50, loss[loss=0.1741, simple_loss=0.2324, pruned_loss=0.04211, ctc_loss=0.09385, cr_loss=0.3201, over 34484.00 frames. 
], tot_loss[loss=0.208, simple_loss=0.2645, pruned_loss=0.05583, ctc_loss=0.1195, cr_loss=0.4001, over 1480747.23 frames. ], batch size: 82, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 15:09:22,955 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.100e+02 2.541e+02 2.779e+02 3.177e+02 6.303e+02, threshold=5.559e+02, percent-clipped=2.0 2024-09-19 15:09:23,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=687736.0, ans=0.1 2024-09-19 15:09:56,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=687829.3333333334, ans=0.125 2024-09-19 15:10:05,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=687829.3333333334, ans=0.125 2024-09-19 15:10:07,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=687829.3333333334, ans=0.125 2024-09-19 15:10:12,459 INFO [train.py:1198] (1/2) Epoch 39, batch 100, loss[loss=0.203, simple_loss=0.2574, pruned_loss=0.05511, ctc_loss=0.1146, cr_loss=0.3867, over 34581.00 frames. ], tot_loss[loss=0.2098, simple_loss=0.2663, pruned_loss=0.05657, ctc_loss=0.1209, cr_loss=0.401, over 2629341.92 frames. ], batch size: 89, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 15:10:39,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=6.57 vs. limit=15.0 2024-09-19 15:11:00,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2024-09-19 15:11:07,719 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.29 vs. limit=10.0 2024-09-19 15:11:35,107 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.69 vs. limit=10.0 2024-09-19 15:11:35,627 INFO [train.py:1198] (1/2) Epoch 39, batch 150, loss[loss=0.1818, simple_loss=0.2394, pruned_loss=0.04482, ctc_loss=0.1007, cr_loss=0.3579, over 34465.00 frames. ], tot_loss[loss=0.2068, simple_loss=0.2636, pruned_loss=0.05525, ctc_loss=0.1185, cr_loss=0.3951, over 3557270.20 frames. ], batch size: 82, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 15:12:10,586 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 2.515e+02 2.914e+02 3.637e+02 5.774e+02, threshold=5.829e+02, percent-clipped=2.0 2024-09-19 15:12:14,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688202.6666666666, ans=0.1 2024-09-19 15:12:17,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=688202.6666666666, ans=0.0 2024-09-19 15:12:37,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.19 vs. limit=15.0 2024-09-19 15:12:39,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.80 vs. 
limit=22.5 2024-09-19 15:12:43,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=688296.0, ans=0.07 2024-09-19 15:12:59,742 INFO [train.py:1198] (1/2) Epoch 39, batch 200, loss[loss=0.2226, simple_loss=0.2765, pruned_loss=0.06285, ctc_loss=0.1299, cr_loss=0.4263, over 31988.00 frames. ], tot_loss[loss=0.206, simple_loss=0.2627, pruned_loss=0.05493, ctc_loss=0.1179, cr_loss=0.3946, over 4270262.96 frames. ], batch size: 145, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 15:13:10,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.18 vs. limit=12.0 2024-09-19 15:13:36,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=688436.0, ans=0.125 2024-09-19 15:13:37,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=688436.0, ans=0.0 2024-09-19 15:13:42,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=688436.0, ans=0.0 2024-09-19 15:13:46,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=688436.0, ans=0.0 2024-09-19 15:13:57,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=688482.6666666666, ans=0.125 2024-09-19 15:14:21,848 INFO [train.py:1198] (1/2) Epoch 39, batch 250, loss[loss=0.2248, simple_loss=0.2805, pruned_loss=0.06231, ctc_loss=0.1334, cr_loss=0.4461, over 34302.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2624, pruned_loss=0.0548, ctc_loss=0.1175, cr_loss=0.3932, over 4832591.26 frames. ], batch size: 117, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 15:14:22,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=688576.0, ans=0.1 2024-09-19 15:14:32,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=688576.0, ans=0.125 2024-09-19 15:14:43,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=688622.6666666666, ans=0.0 2024-09-19 15:14:50,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=688622.6666666666, ans=0.0 2024-09-19 15:14:51,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=8.0 2024-09-19 15:14:56,728 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.136e+02 2.713e+02 3.392e+02 4.412e+02 9.541e+02, threshold=6.784e+02, percent-clipped=7.0 2024-09-19 15:14:58,868 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:15:10,807 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.78 vs. 
limit=12.0 2024-09-19 15:15:16,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688716.0, ans=0.1 2024-09-19 15:15:20,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=688716.0, ans=0.125 2024-09-19 15:15:32,504 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.92 vs. limit=22.5 2024-09-19 15:15:48,260 INFO [train.py:1198] (1/2) Epoch 39, batch 300, loss[loss=0.2225, simple_loss=0.2778, pruned_loss=0.06248, ctc_loss=0.1287, cr_loss=0.411, over 34328.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2622, pruned_loss=0.05472, ctc_loss=0.1172, cr_loss=0.3928, over 5261031.94 frames. ], batch size: 107, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 15:16:09,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=688856.0, ans=0.0 2024-09-19 15:16:39,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=688949.3333333334, ans=0.0 2024-09-19 15:17:00,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=688996.0, ans=0.1 2024-09-19 15:17:08,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=689042.6666666666, ans=0.2 2024-09-19 15:17:09,816 INFO [train.py:1198] (1/2) Epoch 39, batch 350, loss[loss=0.1747, simple_loss=0.2346, pruned_loss=0.04139, ctc_loss=0.09331, cr_loss=0.3332, over 34281.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2626, pruned_loss=0.05494, ctc_loss=0.1177, cr_loss=0.394, over 5596721.68 frames. ], batch size: 83, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 15:17:13,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=689042.6666666666, ans=0.0 2024-09-19 15:17:24,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=689089.3333333334, ans=0.125 2024-09-19 15:17:42,235 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.088e+02 2.512e+02 2.871e+02 3.739e+02 6.128e+02, threshold=5.742e+02, percent-clipped=0.0 2024-09-19 15:17:50,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.42 vs. limit=22.5 2024-09-19 15:18:08,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=689182.6666666666, ans=0.07 2024-09-19 15:18:20,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=689229.3333333334, ans=0.125 2024-09-19 15:18:22,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=689229.3333333334, ans=0.2 2024-09-19 15:18:22,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=689229.3333333334, ans=0.125 2024-09-19 15:18:31,425 INFO [train.py:1198] (1/2) Epoch 39, batch 400, loss[loss=0.2167, simple_loss=0.2731, pruned_loss=0.0594, ctc_loss=0.1246, cr_loss=0.414, over 34421.00 frames. 
], tot_loss[loss=0.2056, simple_loss=0.2624, pruned_loss=0.05487, ctc_loss=0.1173, cr_loss=0.3926, over 5864352.50 frames. ], batch size: 95, lr: 3.07e-03, grad_scale: 32.0 2024-09-19 15:18:40,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=12.0 2024-09-19 15:18:49,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=689322.6666666666, ans=0.125 2024-09-19 15:18:54,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=689322.6666666666, ans=0.025 2024-09-19 15:19:10,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=689369.3333333334, ans=0.125 2024-09-19 15:19:15,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=689369.3333333334, ans=0.125 2024-09-19 15:19:20,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=689369.3333333334, ans=0.125 2024-09-19 15:19:21,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=12.0 2024-09-19 15:19:22,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=689369.3333333334, ans=0.2 2024-09-19 15:19:27,510 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.034e-01 2024-09-19 15:19:42,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=689462.6666666666, ans=0.2 2024-09-19 15:19:47,395 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:19:47,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=689462.6666666666, ans=0.95 2024-09-19 15:19:58,262 INFO [train.py:1198] (1/2) Epoch 39, batch 450, loss[loss=0.2201, simple_loss=0.2764, pruned_loss=0.06047, ctc_loss=0.1283, cr_loss=0.4319, over 34685.00 frames. ], tot_loss[loss=0.2053, simple_loss=0.2621, pruned_loss=0.05465, ctc_loss=0.1171, cr_loss=0.3926, over 6053291.85 frames. ], batch size: 97, lr: 3.07e-03, grad_scale: 32.0 2024-09-19 15:20:03,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=689509.3333333334, ans=0.125 2024-09-19 15:20:10,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=689509.3333333334, ans=0.0 2024-09-19 15:20:27,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.46 vs. 
limit=10.0 2024-09-19 15:20:29,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=689602.6666666666, ans=0.125 2024-09-19 15:20:31,395 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.430e+02 2.863e+02 3.661e+02 6.646e+02, threshold=5.726e+02, percent-clipped=6.0 2024-09-19 15:20:36,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=689602.6666666666, ans=0.2 2024-09-19 15:21:21,161 INFO [train.py:1198] (1/2) Epoch 39, batch 500, loss[loss=0.2248, simple_loss=0.2829, pruned_loss=0.06196, ctc_loss=0.1303, cr_loss=0.4204, over 34414.00 frames. ], tot_loss[loss=0.204, simple_loss=0.261, pruned_loss=0.05407, ctc_loss=0.116, cr_loss=0.3899, over 6219510.52 frames. ], batch size: 110, lr: 3.07e-03, grad_scale: 32.0 2024-09-19 15:21:23,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=689742.6666666666, ans=0.1 2024-09-19 15:21:31,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=689742.6666666666, ans=0.0 2024-09-19 15:21:39,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=689789.3333333334, ans=0.0 2024-09-19 15:21:41,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=689789.3333333334, ans=0.0 2024-09-19 15:21:59,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689836.0, ans=0.1 2024-09-19 15:22:15,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=689882.6666666666, ans=0.0 2024-09-19 15:22:20,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=689882.6666666666, ans=0.0 2024-09-19 15:22:29,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=689929.3333333334, ans=0.0 2024-09-19 15:22:34,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=689929.3333333334, ans=0.0 2024-09-19 15:22:45,738 INFO [train.py:1198] (1/2) Epoch 39, batch 550, loss[loss=0.2187, simple_loss=0.2773, pruned_loss=0.05867, ctc_loss=0.1273, cr_loss=0.4341, over 33893.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2612, pruned_loss=0.05407, ctc_loss=0.116, cr_loss=0.3898, over 6330543.29 frames. 
], batch size: 122, lr: 3.07e-03, grad_scale: 32.0 2024-09-19 15:23:04,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=690022.6666666666, ans=0.125 2024-09-19 15:23:14,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=690022.6666666666, ans=0.2 2024-09-19 15:23:20,596 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.452e+02 2.713e+02 3.397e+02 7.487e+02, threshold=5.426e+02, percent-clipped=2.0 2024-09-19 15:23:35,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=690116.0, ans=0.0 2024-09-19 15:23:52,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=690162.6666666666, ans=0.125 2024-09-19 15:24:00,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=690162.6666666666, ans=0.125 2024-09-19 15:24:02,977 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=15.0 2024-09-19 15:24:10,147 INFO [train.py:1198] (1/2) Epoch 39, batch 600, loss[loss=0.2151, simple_loss=0.2786, pruned_loss=0.05593, ctc_loss=0.1207, cr_loss=0.3904, over 34204.00 frames. ], tot_loss[loss=0.2045, simple_loss=0.2616, pruned_loss=0.05424, ctc_loss=0.1163, cr_loss=0.3903, over 6430919.62 frames. ], batch size: 117, lr: 3.07e-03, grad_scale: 32.0 2024-09-19 15:24:12,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2024-09-19 15:24:23,694 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:24:28,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=690256.0, ans=0.125 2024-09-19 15:24:31,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=690256.0, ans=0.125 2024-09-19 15:24:34,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=690256.0, ans=0.1 2024-09-19 15:24:50,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=690302.6666666666, ans=15.0 2024-09-19 15:24:54,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=690302.6666666666, ans=0.125 2024-09-19 15:25:09,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.20 vs. limit=15.0 2024-09-19 15:25:12,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=690349.3333333334, ans=0.0 2024-09-19 15:25:14,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.29 vs. 
limit=15.0 2024-09-19 15:25:15,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=690396.0, ans=0.1 2024-09-19 15:25:31,684 INFO [train.py:1198] (1/2) Epoch 39, batch 650, loss[loss=0.2037, simple_loss=0.2609, pruned_loss=0.05362, ctc_loss=0.1171, cr_loss=0.3942, over 34568.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2609, pruned_loss=0.05397, ctc_loss=0.1159, cr_loss=0.389, over 6521956.39 frames. ], batch size: 94, lr: 3.07e-03, grad_scale: 32.0 2024-09-19 15:25:44,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=690442.6666666666, ans=0.1 2024-09-19 15:25:58,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=690489.3333333334, ans=0.1 2024-09-19 15:26:04,301 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.535e+02 2.919e+02 3.654e+02 6.676e+02, threshold=5.838e+02, percent-clipped=5.0 2024-09-19 15:26:04,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=690536.0, ans=0.125 2024-09-19 15:26:08,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=690536.0, ans=0.125 2024-09-19 15:26:34,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=690582.6666666666, ans=0.125 2024-09-19 15:26:49,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=690629.3333333334, ans=0.0 2024-09-19 15:27:00,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=690629.3333333334, ans=0.2 2024-09-19 15:27:03,531 INFO [train.py:1198] (1/2) Epoch 39, batch 700, loss[loss=0.1948, simple_loss=0.2514, pruned_loss=0.05085, ctc_loss=0.11, cr_loss=0.3662, over 34577.00 frames. ], tot_loss[loss=0.2045, simple_loss=0.2616, pruned_loss=0.05424, ctc_loss=0.1163, cr_loss=0.39, over 6577993.60 frames. ], batch size: 89, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 15:27:05,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=690676.0, ans=0.1 2024-09-19 15:27:19,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.79 vs. 
limit=15.0 2024-09-19 15:27:30,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=690722.6666666666, ans=0.125 2024-09-19 15:27:38,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=690769.3333333334, ans=0.125 2024-09-19 15:27:38,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=690769.3333333334, ans=0.125 2024-09-19 15:27:55,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=690816.0, ans=0.0 2024-09-19 15:28:01,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=690816.0, ans=0.125 2024-09-19 15:28:26,512 INFO [train.py:1198] (1/2) Epoch 39, batch 750, loss[loss=0.2199, simple_loss=0.2774, pruned_loss=0.06043, ctc_loss=0.1267, cr_loss=0.4043, over 34409.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2614, pruned_loss=0.05415, ctc_loss=0.1163, cr_loss=0.3896, over 6620745.39 frames. ], batch size: 95, lr: 3.07e-03, grad_scale: 16.0 2024-09-19 15:28:34,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=690909.3333333334, ans=0.0 2024-09-19 15:28:49,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=690956.0, ans=0.0 2024-09-19 15:28:57,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=691002.6666666666, ans=0.125 2024-09-19 15:29:00,505 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.120e+02 2.568e+02 3.111e+02 3.737e+02 5.481e+02, threshold=6.222e+02, percent-clipped=0.0 2024-09-19 15:29:01,317 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=12.0 2024-09-19 15:29:48,365 INFO [train.py:1198] (1/2) Epoch 39, batch 800, loss[loss=0.1963, simple_loss=0.2509, pruned_loss=0.05169, ctc_loss=0.1133, cr_loss=0.3931, over 34471.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2611, pruned_loss=0.05408, ctc_loss=0.1162, cr_loss=0.3895, over 6658019.59 frames. ], batch size: 85, lr: 3.06e-03, grad_scale: 32.0 2024-09-19 15:29:56,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=691142.6666666666, ans=0.025 2024-09-19 15:30:11,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=691189.3333333334, ans=0.125 2024-09-19 15:30:51,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=691282.6666666666, ans=0.1 2024-09-19 15:30:53,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=691282.6666666666, ans=0.025 2024-09-19 15:30:56,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=691329.3333333334, ans=0.0 2024-09-19 15:31:14,130 INFO [train.py:1198] (1/2) Epoch 39, batch 850, loss[loss=0.2136, simple_loss=0.2702, pruned_loss=0.05798, ctc_loss=0.1243, cr_loss=0.4051, over 34356.00 frames. 
], tot_loss[loss=0.2038, simple_loss=0.261, pruned_loss=0.05391, ctc_loss=0.116, cr_loss=0.389, over 6690731.48 frames. ], batch size: 103, lr: 3.06e-03, grad_scale: 32.0 2024-09-19 15:31:19,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=691376.0, ans=0.2 2024-09-19 15:31:22,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=691376.0, ans=0.125 2024-09-19 15:31:48,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.036e+02 2.510e+02 2.891e+02 3.537e+02 5.553e+02, threshold=5.781e+02, percent-clipped=0.0 2024-09-19 15:32:11,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=691516.0, ans=0.0 2024-09-19 15:32:36,386 INFO [train.py:1198] (1/2) Epoch 39, batch 900, loss[loss=0.1911, simple_loss=0.2474, pruned_loss=0.04945, ctc_loss=0.1058, cr_loss=0.3665, over 34471.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2614, pruned_loss=0.05416, ctc_loss=0.1165, cr_loss=0.3901, over 6697767.57 frames. ], batch size: 85, lr: 3.06e-03, grad_scale: 32.0 2024-09-19 15:33:06,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=691656.0, ans=0.0 2024-09-19 15:33:23,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.88 vs. limit=15.0 2024-09-19 15:33:24,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=691749.3333333334, ans=0.125 2024-09-19 15:33:32,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=691749.3333333334, ans=0.2 2024-09-19 15:33:58,599 INFO [train.py:1198] (1/2) Epoch 39, batch 950, loss[loss=0.1981, simple_loss=0.2547, pruned_loss=0.05195, ctc_loss=0.1117, cr_loss=0.3831, over 34696.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2614, pruned_loss=0.05395, ctc_loss=0.1161, cr_loss=0.3893, over 6702117.42 frames. ], batch size: 87, lr: 3.06e-03, grad_scale: 32.0 2024-09-19 15:34:05,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=691842.6666666666, ans=0.1 2024-09-19 15:34:37,155 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 2.733e+02 3.151e+02 3.985e+02 6.300e+02, threshold=6.303e+02, percent-clipped=2.0 2024-09-19 15:34:42,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0 2024-09-19 15:34:45,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=691936.0, ans=0.025 2024-09-19 15:34:49,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=691936.0, ans=0.125 2024-09-19 15:35:02,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=22.5 2024-09-19 15:35:24,568 INFO [train.py:1198] (1/2) Epoch 39, batch 1000, loss[loss=0.1902, simple_loss=0.2488, pruned_loss=0.04793, ctc_loss=0.1054, cr_loss=0.3648, over 34532.00 frames. 
], tot_loss[loss=0.2045, simple_loss=0.2617, pruned_loss=0.05414, ctc_loss=0.1165, cr_loss=0.3902, over 6694747.30 frames. ], batch size: 90, lr: 3.06e-03, grad_scale: 32.0 2024-09-19 15:36:24,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2024-09-19 15:36:30,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=692262.6666666666, ans=0.2 2024-09-19 15:36:46,991 INFO [train.py:1198] (1/2) Epoch 39, batch 1050, loss[loss=0.2034, simple_loss=0.2636, pruned_loss=0.05256, ctc_loss=0.114, cr_loss=0.3823, over 34553.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.2612, pruned_loss=0.05391, ctc_loss=0.116, cr_loss=0.3893, over 6702975.18 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2024-09-19 15:36:47,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=692309.3333333334, ans=0.125 2024-09-19 15:37:03,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=692356.0, ans=0.125 2024-09-19 15:37:05,410 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=692356.0, ans=0.125 2024-09-19 15:37:21,599 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.055e+02 2.566e+02 2.881e+02 3.729e+02 5.727e+02, threshold=5.763e+02, percent-clipped=0.0 2024-09-19 15:37:21,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=692402.6666666666, ans=0.0 2024-09-19 15:37:48,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.12 vs. limit=15.0 2024-09-19 15:38:02,541 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.71 vs. limit=10.0 2024-09-19 15:38:12,021 INFO [train.py:1198] (1/2) Epoch 39, batch 1100, loss[loss=0.2077, simple_loss=0.2615, pruned_loss=0.05653, ctc_loss=0.1219, cr_loss=0.4092, over 34358.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2608, pruned_loss=0.05375, ctc_loss=0.1157, cr_loss=0.3888, over 6716825.06 frames. ], batch size: 91, lr: 3.06e-03, grad_scale: 32.0 2024-09-19 15:38:23,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=692542.6666666666, ans=0.1 2024-09-19 15:38:38,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=692589.3333333334, ans=0.125 2024-09-19 15:38:45,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=692636.0, ans=0.125 2024-09-19 15:39:34,392 INFO [train.py:1198] (1/2) Epoch 39, batch 1150, loss[loss=0.2167, simple_loss=0.2686, pruned_loss=0.06106, ctc_loss=0.128, cr_loss=0.4259, over 34360.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.2611, pruned_loss=0.05396, ctc_loss=0.116, cr_loss=0.3892, over 6716282.18 frames. ], batch size: 91, lr: 3.06e-03, grad_scale: 32.0 2024-09-19 15:39:41,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.14 vs. 
limit=15.0 2024-09-19 15:39:59,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=692822.6666666666, ans=0.125 2024-09-19 15:40:10,716 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.501e+02 2.864e+02 3.464e+02 5.307e+02, threshold=5.728e+02, percent-clipped=0.0 2024-09-19 15:40:19,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=692869.3333333334, ans=0.0 2024-09-19 15:40:42,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=692962.6666666666, ans=0.1 2024-09-19 15:40:56,971 INFO [train.py:1198] (1/2) Epoch 39, batch 1200, loss[loss=0.2041, simple_loss=0.2658, pruned_loss=0.05179, ctc_loss=0.1135, cr_loss=0.4013, over 34585.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2619, pruned_loss=0.0542, ctc_loss=0.1165, cr_loss=0.3907, over 6708258.58 frames. ], batch size: 99, lr: 3.06e-03, grad_scale: 32.0 2024-09-19 15:41:02,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.76 vs. limit=6.0 2024-09-19 15:41:05,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=693009.3333333334, ans=0.1 2024-09-19 15:41:05,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=693009.3333333334, ans=0.0 2024-09-19 15:41:13,053 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.99 vs. limit=15.0 2024-09-19 15:42:05,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=693196.0, ans=0.2 2024-09-19 15:42:07,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.30 vs. limit=12.0 2024-09-19 15:42:14,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.77 vs. limit=12.0 2024-09-19 15:42:23,138 INFO [train.py:1198] (1/2) Epoch 39, batch 1250, loss[loss=0.2332, simple_loss=0.2851, pruned_loss=0.06725, ctc_loss=0.141, cr_loss=0.4647, over 34350.00 frames. ], tot_loss[loss=0.2051, simple_loss=0.2625, pruned_loss=0.05438, ctc_loss=0.1167, cr_loss=0.3916, over 6741373.50 frames. 
], batch size: 107, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 15:43:01,056 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.443e+02 2.694e+02 3.437e+02 5.582e+02, threshold=5.389e+02, percent-clipped=0.0 2024-09-19 15:43:01,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=693336.0, ans=0.0 2024-09-19 15:43:04,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=693336.0, ans=0.125 2024-09-19 15:43:06,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=693336.0, ans=0.125 2024-09-19 15:43:16,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=693382.6666666666, ans=0.1 2024-09-19 15:43:31,599 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.27 vs. limit=15.0 2024-09-19 15:43:45,886 INFO [train.py:1198] (1/2) Epoch 39, batch 1300, loss[loss=0.2194, simple_loss=0.2783, pruned_loss=0.05936, ctc_loss=0.1275, cr_loss=0.4043, over 33171.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.262, pruned_loss=0.05424, ctc_loss=0.1164, cr_loss=0.3905, over 6745189.78 frames. ], batch size: 130, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 15:43:56,678 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.70 vs. limit=22.5 2024-09-19 15:44:04,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=693522.6666666666, ans=0.0 2024-09-19 15:44:06,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.40 vs. limit=15.0 2024-09-19 15:44:41,040 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.01 vs. limit=15.0 2024-09-19 15:44:47,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=693616.0, ans=0.2 2024-09-19 15:44:50,592 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:44:51,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.38 vs. limit=15.0 2024-09-19 15:44:52,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=693662.6666666666, ans=0.125 2024-09-19 15:45:02,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=693662.6666666666, ans=0.2 2024-09-19 15:45:08,252 INFO [train.py:1198] (1/2) Epoch 39, batch 1350, loss[loss=0.2153, simple_loss=0.268, pruned_loss=0.06038, ctc_loss=0.1276, cr_loss=0.4091, over 34548.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.2621, pruned_loss=0.05432, ctc_loss=0.1166, cr_loss=0.3914, over 6765120.31 frames. 
], batch size: 94, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 15:45:50,845 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.124e+02 2.639e+02 3.340e+02 4.452e+02 6.573e+02, threshold=6.679e+02, percent-clipped=9.0 2024-09-19 15:45:59,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=693849.3333333334, ans=0.125 2024-09-19 15:46:04,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=693849.3333333334, ans=0.125 2024-09-19 15:46:14,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.60 vs. limit=15.0 2024-09-19 15:46:33,511 INFO [train.py:1198] (1/2) Epoch 39, batch 1400, loss[loss=0.1761, simple_loss=0.2342, pruned_loss=0.04328, ctc_loss=0.09182, cr_loss=0.3273, over 34289.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2616, pruned_loss=0.05412, ctc_loss=0.1162, cr_loss=0.3902, over 6777584.16 frames. ], batch size: 80, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 15:46:42,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.03 vs. limit=15.0 2024-09-19 15:46:53,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=693989.3333333334, ans=0.0 2024-09-19 15:47:21,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=694082.6666666666, ans=0.2 2024-09-19 15:47:36,437 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=694082.6666666666, ans=0.0 2024-09-19 15:47:49,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=694129.3333333334, ans=0.125 2024-09-19 15:47:55,986 INFO [train.py:1198] (1/2) Epoch 39, batch 1450, loss[loss=0.2187, simple_loss=0.2792, pruned_loss=0.05808, ctc_loss=0.1241, cr_loss=0.4308, over 34434.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.2621, pruned_loss=0.05419, ctc_loss=0.1164, cr_loss=0.3907, over 6774228.04 frames. 
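[annotation] The many ScheduledFloat records each report the current value (ans=...) of a hyperparameter annealed against batch_count. A plausible minimal implementation is piecewise-linear interpolation over (batch_count, value) breakpoints; the class name matches the log, but the body below is an assumption, not scaling.py's actual code:

```python
class ScheduledFloat:
    """A float whose value is piecewise-linear in the training batch count.

    E.g. ScheduledFloat((0.0, 0.3), (20000.0, 0.1)) starts at 0.3, decays
    linearly to 0.1 by batch 20000, and stays constant outside the breakpoints.
    """

    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs, ascending
        self.batch_count = 0.0        # updated by the training loop

    def __float__(self) -> float:
        x = self.batch_count
        if x <= self.points[0][0]:
            return self.points[0][1]
        if x >= self.points[-1][0]:
            return self.points[-1][1]
        for (x0, y0), (x1, y1) in zip(self.points, self.points[1:]):
            if x0 <= x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        raise RuntimeError("unreachable")
```

At batch_count ≈ 6.9e05, far past any plausible breakpoint, most such schedules have settled at their final constants, which is why the same ans values (0.125, 0.1, 0.2, 0.0, 0.025) recur throughout this section.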
], batch size: 110, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 15:48:20,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=694222.6666666666, ans=0.1 2024-09-19 15:48:22,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=694222.6666666666, ans=0.04949747468305833 2024-09-19 15:48:25,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=694222.6666666666, ans=0.0 2024-09-19 15:48:29,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=694269.3333333334, ans=0.0 2024-09-19 15:48:29,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=694269.3333333334, ans=0.0 2024-09-19 15:48:35,270 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.159e+02 2.444e+02 2.726e+02 3.442e+02 6.304e+02, threshold=5.452e+02, percent-clipped=0.0 2024-09-19 15:48:55,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=694316.0, ans=0.0 2024-09-19 15:49:22,105 INFO [train.py:1198] (1/2) Epoch 39, batch 1500, loss[loss=0.2163, simple_loss=0.2751, pruned_loss=0.05807, ctc_loss=0.1244, cr_loss=0.4137, over 34458.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.2623, pruned_loss=0.05424, ctc_loss=0.1166, cr_loss=0.391, over 6775018.02 frames. ], batch size: 100, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 15:49:29,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=694409.3333333334, ans=0.125 2024-09-19 15:49:37,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.89 vs. limit=6.0 2024-09-19 15:49:40,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=694456.0, ans=0.1 2024-09-19 15:49:55,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=694502.6666666666, ans=0.1 2024-09-19 15:50:01,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=694502.6666666666, ans=0.125 2024-09-19 15:50:09,418 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.24 vs. limit=12.0 2024-09-19 15:50:28,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. 
limit=6.0 2024-09-19 15:50:31,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=694596.0, ans=0.125 2024-09-19 15:50:39,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=694596.0, ans=0.025 2024-09-19 15:50:43,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=694642.6666666666, ans=0.125 2024-09-19 15:50:44,543 INFO [train.py:1198] (1/2) Epoch 39, batch 1550, loss[loss=0.2282, simple_loss=0.2795, pruned_loss=0.06569, ctc_loss=0.1386, cr_loss=0.4437, over 34395.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2623, pruned_loss=0.05447, ctc_loss=0.1169, cr_loss=0.3912, over 6745360.86 frames. ], batch size: 105, lr: 3.06e-03, grad_scale: 8.0 2024-09-19 15:50:51,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=694642.6666666666, ans=0.125 2024-09-19 15:50:51,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694642.6666666666, ans=0.1 2024-09-19 15:50:51,729 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.79 vs. limit=15.0 2024-09-19 15:51:23,587 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.511e+02 2.791e+02 3.666e+02 6.823e+02, threshold=5.582e+02, percent-clipped=4.0 2024-09-19 15:51:47,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=694782.6666666666, ans=0.09899494936611666 2024-09-19 15:51:53,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=694829.3333333334, ans=0.125 2024-09-19 15:51:55,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.97 vs. limit=22.5 2024-09-19 15:51:56,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=694829.3333333334, ans=0.2 2024-09-19 15:52:06,272 INFO [train.py:1198] (1/2) Epoch 39, batch 1600, loss[loss=0.2129, simple_loss=0.2752, pruned_loss=0.05496, ctc_loss=0.1225, cr_loss=0.4024, over 34550.00 frames. ], tot_loss[loss=0.2051, simple_loss=0.2623, pruned_loss=0.05443, ctc_loss=0.1169, cr_loss=0.3903, over 6724240.87 frames. 
], batch size: 99, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 15:52:13,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=694876.0, ans=0.0 2024-09-19 15:52:14,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=694876.0, ans=0.2 2024-09-19 15:52:29,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=694922.6666666666, ans=0.125 2024-09-19 15:52:40,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694969.3333333334, ans=0.1 2024-09-19 15:53:06,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=695016.0, ans=0.025 2024-09-19 15:53:09,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.76 vs. limit=22.5 2024-09-19 15:53:25,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=695062.6666666666, ans=0.025 2024-09-19 15:53:25,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.53 vs. limit=12.0 2024-09-19 15:53:29,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=695062.6666666666, ans=0.125 2024-09-19 15:53:32,522 INFO [train.py:1198] (1/2) Epoch 39, batch 1650, loss[loss=0.2031, simple_loss=0.2669, pruned_loss=0.05075, ctc_loss=0.1123, cr_loss=0.3826, over 34386.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.2622, pruned_loss=0.05435, ctc_loss=0.1166, cr_loss=0.39, over 6717088.55 frames. ], batch size: 103, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 15:53:47,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=695156.0, ans=0.2 2024-09-19 15:54:11,872 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.084e+02 2.516e+02 2.885e+02 3.323e+02 8.322e+02, threshold=5.771e+02, percent-clipped=5.0 2024-09-19 15:54:14,408 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.50 vs. limit=22.5 2024-09-19 15:54:46,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=695296.0, ans=0.125 2024-09-19 15:54:53,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=695342.6666666666, ans=0.0 2024-09-19 15:54:54,834 INFO [train.py:1198] (1/2) Epoch 39, batch 1700, loss[loss=0.1728, simple_loss=0.2293, pruned_loss=0.04219, ctc_loss=0.09277, cr_loss=0.3354, over 34278.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.2617, pruned_loss=0.05413, ctc_loss=0.1162, cr_loss=0.3888, over 6743161.58 frames. 
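[annotation] Each Whitening line reports a per-module whiteness statistic next to the limit above which (in this recipe) a corrective gradient would push the activations back toward a whiter covariance, e.g. "metric=15.76 vs. limit=22.5". One scale-invariant way to get such a metric, shown here as an assumption rather than the exact scaling.py formula, is the ratio of the mean squared eigenvalue of the activation covariance to the squared mean eigenvalue, which equals 1.0 exactly when the covariance is a multiple of the identity:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns >= 1.0; 1.0 means 'white'.

    Illustrative: E[eig^2] / (E[eig])^2 of the per-group covariance of x,
    computed without an explicit eigendecomposition.
    """
    n, c = x.shape
    x = x.reshape(n, num_groups, c // num_groups).transpose(0, 1)
    cov = torch.matmul(x.transpose(1, 2), x) / n          # (groups, d, d)
    d = cov.shape[-1]
    # trace / d is the mean eigenvalue; ||cov||_F^2 / d is the mean squared one
    mean_eig = cov.diagonal(dim1=-2, dim2=-1).sum(-1) / d
    mean_eig_sq = (cov ** 2).sum(dim=(-2, -1)) / d
    return (mean_eig_sq / mean_eig ** 2).mean()
```

Under this reading, the metrics logged here (mostly well below their limits) are routine progress reports rather than violations.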
], batch size: 80, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 15:54:58,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=695342.6666666666, ans=0.125 2024-09-19 15:55:20,403 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 15:55:31,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=695436.0, ans=0.2 2024-09-19 15:55:46,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695482.6666666666, ans=0.1 2024-09-19 15:56:09,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=695529.3333333334, ans=0.2 2024-09-19 15:56:13,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.45 vs. limit=6.0 2024-09-19 15:56:16,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=695576.0, ans=0.125 2024-09-19 15:56:17,577 INFO [train.py:1198] (1/2) Epoch 39, batch 1750, loss[loss=0.1848, simple_loss=0.2364, pruned_loss=0.04883, ctc_loss=0.1067, cr_loss=0.3535, over 34091.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2611, pruned_loss=0.0539, ctc_loss=0.1159, cr_loss=0.3882, over 6753160.01 frames. ], batch size: 78, lr: 3.06e-03, grad_scale: 16.0 2024-09-19 15:56:28,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.58 vs. limit=6.0 2024-09-19 15:56:51,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=695622.6666666666, ans=0.2 2024-09-19 15:56:51,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.25 vs. limit=10.0 2024-09-19 15:56:52,840 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=695669.3333333334, ans=0.0 2024-09-19 15:56:56,870 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=12.0 2024-09-19 15:56:57,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=695669.3333333334, ans=0.0 2024-09-19 15:57:00,741 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.990e+02 2.638e+02 3.041e+02 3.896e+02 7.003e+02, threshold=6.081e+02, percent-clipped=2.0 2024-09-19 15:57:17,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=8.31 vs. limit=15.0 2024-09-19 15:57:31,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=695762.6666666666, ans=0.125 2024-09-19 15:57:43,177 INFO [train.py:1198] (1/2) Epoch 39, batch 1800, loss[loss=0.2132, simple_loss=0.2709, pruned_loss=0.05754, ctc_loss=0.1248, cr_loss=0.3835, over 34709.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2613, pruned_loss=0.054, ctc_loss=0.1161, cr_loss=0.3891, over 6755902.37 frames. 
], batch size: 97, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 15:57:44,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.82 vs. limit=6.0 2024-09-19 15:57:51,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=695809.3333333334, ans=0.0 2024-09-19 15:58:03,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=695856.0, ans=0.125 2024-09-19 15:58:21,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=695902.6666666666, ans=0.125 2024-09-19 15:58:49,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=695996.0, ans=0.0 2024-09-19 15:58:57,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=695996.0, ans=0.125 2024-09-19 15:59:05,512 INFO [train.py:1198] (1/2) Epoch 39, batch 1850, loss[loss=0.1999, simple_loss=0.2643, pruned_loss=0.04962, ctc_loss=0.1086, cr_loss=0.3614, over 34460.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2612, pruned_loss=0.05408, ctc_loss=0.1163, cr_loss=0.3894, over 6762924.41 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 15:59:07,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=696042.6666666666, ans=0.125 2024-09-19 15:59:40,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=696136.0, ans=0.125 2024-09-19 15:59:44,848 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.104e+02 2.620e+02 3.224e+02 4.379e+02 7.363e+02, threshold=6.448e+02, percent-clipped=1.0 2024-09-19 15:59:46,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=696136.0, ans=0.125 2024-09-19 16:00:27,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=696229.3333333334, ans=0.125 2024-09-19 16:00:29,923 INFO [train.py:1198] (1/2) Epoch 39, batch 1900, loss[loss=0.2141, simple_loss=0.2724, pruned_loss=0.05762, ctc_loss=0.1239, cr_loss=0.3962, over 34363.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.2617, pruned_loss=0.05413, ctc_loss=0.1163, cr_loss=0.3898, over 6772043.90 frames. 
], batch size: 103, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 16:00:42,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=696276.0, ans=0.125 2024-09-19 16:00:47,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=696322.6666666666, ans=0.1 2024-09-19 16:00:53,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=696322.6666666666, ans=0.125 2024-09-19 16:00:56,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=696322.6666666666, ans=0.125 2024-09-19 16:01:37,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=696462.6666666666, ans=0.025 2024-09-19 16:01:54,063 INFO [train.py:1198] (1/2) Epoch 39, batch 1950, loss[loss=0.2162, simple_loss=0.268, pruned_loss=0.06077, ctc_loss=0.1281, cr_loss=0.4297, over 34339.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2629, pruned_loss=0.05448, ctc_loss=0.1169, cr_loss=0.3914, over 6789302.57 frames. ], batch size: 91, lr: 3.05e-03, grad_scale: 16.0 2024-09-19 16:02:11,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.10 vs. limit=15.0 2024-09-19 16:02:21,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=696556.0, ans=0.125 2024-09-19 16:02:33,979 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.118e+02 2.517e+02 2.776e+02 3.251e+02 4.322e+02, threshold=5.552e+02, percent-clipped=0.0 2024-09-19 16:02:37,745 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:02:44,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=696649.3333333334, ans=0.125 2024-09-19 16:02:45,957 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:02:59,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=696696.0, ans=0.0 2024-09-19 16:03:16,910 INFO [train.py:1198] (1/2) Epoch 39, batch 2000, loss[loss=0.1844, simple_loss=0.2363, pruned_loss=0.04877, ctc_loss=0.1042, cr_loss=0.3537, over 34225.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2637, pruned_loss=0.05491, ctc_loss=0.1178, cr_loss=0.3929, over 6764850.15 frames. ], batch size: 78, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:04:04,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.69 vs. limit=22.5 2024-09-19 16:04:13,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=696882.6666666666, ans=0.125 2024-09-19 16:04:32,216 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.83 vs. limit=12.0 2024-09-19 16:04:42,757 INFO [train.py:1198] (1/2) Epoch 39, batch 2050, loss[loss=0.1717, simple_loss=0.2336, pruned_loss=0.03978, ctc_loss=0.0887, cr_loss=0.3126, over 34471.00 frames. 
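[annotation] grad_scale in the batch records is AMP's dynamic loss scale: it halves when float16 gradients overflow (32.0 at batch 1200, 16.0 by 1250, 8.0 by 1350) and doubles back after a long enough run of clean steps (32.0 again at batch 2000). The standard PyTorch pattern behind such a trace, sketched with illustrative constants rather than this recipe's actual settings:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0 ** 5,     # would show up in the log as grad_scale: 32.0
    growth_factor=2.0,       # doubles the scale after clean steps...
    backoff_factor=0.5,      # ...and halves it when inf/nan grads appear
    growth_interval=2000,    # number of clean steps before growing
)

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips step on inf/nan
    scaler.update()                 # grow or back off the scale
    return scaler.get_scale()       # the value logged as grad_scale
```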
], tot_loss[loss=0.2058, simple_loss=0.2629, pruned_loss=0.05478, ctc_loss=0.1175, cr_loss=0.3918, over 6755688.00 frames. ], batch size: 82, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:05:12,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=697022.6666666666, ans=0.125 2024-09-19 16:05:17,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=697069.3333333334, ans=0.125 2024-09-19 16:05:19,744 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:05:22,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.538e+02 3.014e+02 3.788e+02 1.760e+03, threshold=6.028e+02, percent-clipped=6.0 2024-09-19 16:05:48,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=697162.6666666666, ans=0.015 2024-09-19 16:06:05,099 INFO [train.py:1198] (1/2) Epoch 39, batch 2100, loss[loss=0.2043, simple_loss=0.2592, pruned_loss=0.0552, ctc_loss=0.117, cr_loss=0.3891, over 34543.00 frames. ], tot_loss[loss=0.2051, simple_loss=0.2623, pruned_loss=0.05448, ctc_loss=0.1169, cr_loss=0.3908, over 6769309.96 frames. ], batch size: 94, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:06:05,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=697209.3333333334, ans=0.0 2024-09-19 16:06:18,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=697209.3333333334, ans=0.125 2024-09-19 16:07:05,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=697349.3333333334, ans=0.0 2024-09-19 16:07:10,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=697396.0, ans=0.2 2024-09-19 16:07:23,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=697396.0, ans=0.0 2024-09-19 16:07:26,656 INFO [train.py:1198] (1/2) Epoch 39, batch 2150, loss[loss=0.2041, simple_loss=0.2583, pruned_loss=0.05574, ctc_loss=0.1163, cr_loss=0.3807, over 34356.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2613, pruned_loss=0.05407, ctc_loss=0.1162, cr_loss=0.3895, over 6788126.36 frames. ], batch size: 91, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:07:26,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=697442.6666666666, ans=0.125 2024-09-19 16:07:30,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.26 vs. 
limit=15.0 2024-09-19 16:07:40,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697442.6666666666, ans=0.1 2024-09-19 16:07:40,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=697442.6666666666, ans=0.125 2024-09-19 16:08:08,336 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.555e+02 3.169e+02 3.993e+02 6.341e+02, threshold=6.338e+02, percent-clipped=4.0 2024-09-19 16:08:48,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=697629.3333333334, ans=0.125 2024-09-19 16:08:51,988 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:08:53,263 INFO [train.py:1198] (1/2) Epoch 39, batch 2200, loss[loss=0.1983, simple_loss=0.2606, pruned_loss=0.04976, ctc_loss=0.1092, cr_loss=0.3664, over 34449.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2617, pruned_loss=0.05427, ctc_loss=0.1166, cr_loss=0.3912, over 6783344.58 frames. ], batch size: 100, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:09:21,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=697722.6666666666, ans=0.1 2024-09-19 16:09:28,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.58 vs. limit=15.0 2024-09-19 16:09:41,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=697816.0, ans=0.1 2024-09-19 16:09:43,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=697816.0, ans=0.125 2024-09-19 16:10:06,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=697862.6666666666, ans=0.125 2024-09-19 16:10:15,967 INFO [train.py:1198] (1/2) Epoch 39, batch 2250, loss[loss=0.2128, simple_loss=0.2731, pruned_loss=0.05568, ctc_loss=0.1223, cr_loss=0.4171, over 34444.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2617, pruned_loss=0.05426, ctc_loss=0.1166, cr_loss=0.3909, over 6779479.71 frames. ], batch size: 95, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:10:54,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=698002.6666666666, ans=0.0 2024-09-19 16:10:55,187 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.012e+02 2.581e+02 3.030e+02 3.671e+02 5.600e+02, threshold=6.060e+02, percent-clipped=0.0 2024-09-19 16:11:08,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=698049.3333333334, ans=0.0 2024-09-19 16:11:27,342 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=10.38 vs. limit=15.0 2024-09-19 16:11:39,881 INFO [train.py:1198] (1/2) Epoch 39, batch 2300, loss[loss=0.1799, simple_loss=0.24, pruned_loss=0.04375, ctc_loss=0.09519, cr_loss=0.3308, over 34299.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2609, pruned_loss=0.05388, ctc_loss=0.1159, cr_loss=0.3889, over 6764846.90 frames. 
], batch size: 83, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:11:40,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=698142.6666666666, ans=0.07 2024-09-19 16:11:57,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=698189.3333333334, ans=0.125 2024-09-19 16:12:03,175 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:12:06,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=698189.3333333334, ans=0.0 2024-09-19 16:12:23,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=698236.0, ans=0.2 2024-09-19 16:12:27,012 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.88 vs. limit=10.0 2024-09-19 16:12:49,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=698329.3333333334, ans=0.125 2024-09-19 16:13:03,660 INFO [train.py:1198] (1/2) Epoch 39, batch 2350, loss[loss=0.2132, simple_loss=0.2727, pruned_loss=0.05618, ctc_loss=0.1227, cr_loss=0.4206, over 34716.00 frames. ], tot_loss[loss=0.2042, simple_loss=0.2614, pruned_loss=0.05406, ctc_loss=0.1162, cr_loss=0.3896, over 6771555.42 frames. ], batch size: 97, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:13:17,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=698376.0, ans=0.125 2024-09-19 16:13:20,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=698422.6666666666, ans=0.0 2024-09-19 16:13:20,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=698422.6666666666, ans=0.2 2024-09-19 16:13:40,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=698469.3333333334, ans=0.0 2024-09-19 16:13:43,004 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.021e+02 2.535e+02 2.855e+02 3.404e+02 5.834e+02, threshold=5.709e+02, percent-clipped=0.0 2024-09-19 16:13:51,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=698516.0, ans=0.0 2024-09-19 16:13:58,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=698516.0, ans=0.0 2024-09-19 16:14:11,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=15.0 2024-09-19 16:14:20,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=698562.6666666666, ans=0.125 2024-09-19 16:14:22,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=698562.6666666666, ans=0.2 2024-09-19 16:14:25,679 INFO [train.py:1198] (1/2) Epoch 39, batch 2400, loss[loss=0.2033, simple_loss=0.257, pruned_loss=0.05554, ctc_loss=0.1171, cr_loss=0.3803, over 34576.00 frames. 
], tot_loss[loss=0.2044, simple_loss=0.2616, pruned_loss=0.05418, ctc_loss=0.1163, cr_loss=0.3901, over 6775258.90 frames. ], batch size: 89, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:14:41,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.80 vs. limit=22.5 2024-09-19 16:14:50,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=698656.0, ans=0.0 2024-09-19 16:15:16,574 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.12 vs. limit=15.0 2024-09-19 16:15:32,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=22.5 2024-09-19 16:15:33,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=698796.0, ans=0.0 2024-09-19 16:15:40,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=698796.0, ans=0.125 2024-09-19 16:15:51,970 INFO [train.py:1198] (1/2) Epoch 39, batch 2450, loss[loss=0.2072, simple_loss=0.2636, pruned_loss=0.05579, ctc_loss=0.1174, cr_loss=0.3926, over 34425.00 frames. ], tot_loss[loss=0.205, simple_loss=0.2624, pruned_loss=0.05432, ctc_loss=0.1167, cr_loss=0.3908, over 6752361.51 frames. ], batch size: 95, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:15:52,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.15 vs. limit=22.5 2024-09-19 16:15:55,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=698842.6666666666, ans=0.0 2024-09-19 16:16:15,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=698889.3333333334, ans=0.125 2024-09-19 16:16:16,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=698889.3333333334, ans=0.5 2024-09-19 16:16:21,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=698889.3333333334, ans=0.125 2024-09-19 16:16:30,988 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.107e+02 2.533e+02 3.118e+02 3.875e+02 5.975e+02, threshold=6.236e+02, percent-clipped=2.0 2024-09-19 16:16:36,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.65 vs. limit=15.0 2024-09-19 16:17:01,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2024-09-19 16:17:13,897 INFO [train.py:1198] (1/2) Epoch 39, batch 2500, loss[loss=0.211, simple_loss=0.2677, pruned_loss=0.05704, ctc_loss=0.1197, cr_loss=0.4044, over 34464.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2623, pruned_loss=0.0545, ctc_loss=0.117, cr_loss=0.3918, over 6763639.44 frames. 
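[annotation] Each batch record pairs a per-batch loss[...] with a running tot_loss[...], and the "over N frames" suffix shows the aggregate is frame-weighted (the window hovers around 6.7M frames while single batches contribute ~34k). A hedged sketch of that bookkeeping, with hypothetical names and an illustrative decay constant:

```python
from collections import defaultdict

class FrameWeightedTracker:
    """Running frame-weighted average of per-batch loss components."""

    def __init__(self, decay: float = 0.995):
        # with ~34k frames/batch, decay=0.995 settles the window near
        # 34k / (1 - 0.995) ~= 6.8M frames, roughly matching the log
        self.decay = decay
        self.sums = defaultdict(float)   # component -> weighted loss sum
        self.frames = 0.0

    def update(self, losses: dict, num_frames: float) -> dict:
        self.frames = self.frames * self.decay + num_frames
        for name, value in losses.items():
            self.sums[name] = self.sums[name] * self.decay + value * num_frames
        return {name: s / self.frames for name, s in self.sums.items()}

# e.g. for the batch-2400 record above:
# tot = tracker.update(
#     {"loss": 0.2033, "simple_loss": 0.257, "pruned_loss": 0.05554,
#      "ctc_loss": 0.1171, "cr_loss": 0.3803},
#     num_frames=34576.0,
# )
```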
], batch size: 100, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:18:22,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=699262.6666666666, ans=0.0 2024-09-19 16:18:37,784 INFO [train.py:1198] (1/2) Epoch 39, batch 2550, loss[loss=0.1735, simple_loss=0.2298, pruned_loss=0.04256, ctc_loss=0.09506, cr_loss=0.3276, over 34130.00 frames. ], tot_loss[loss=0.2048, simple_loss=0.2621, pruned_loss=0.05427, ctc_loss=0.1165, cr_loss=0.3906, over 6765789.14 frames. ], batch size: 78, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:18:53,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699356.0, ans=0.1 2024-09-19 16:19:10,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=699402.6666666666, ans=0.125 2024-09-19 16:19:17,102 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.477e+02 3.209e+02 3.899e+02 8.459e+02, threshold=6.418e+02, percent-clipped=5.0 2024-09-19 16:20:01,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.33 vs. limit=15.0 2024-09-19 16:20:02,032 INFO [train.py:1198] (1/2) Epoch 39, batch 2600, loss[loss=0.2113, simple_loss=0.2638, pruned_loss=0.05868, ctc_loss=0.1244, cr_loss=0.4127, over 34371.00 frames. ], tot_loss[loss=0.2055, simple_loss=0.2629, pruned_loss=0.05452, ctc_loss=0.117, cr_loss=0.3918, over 6760851.26 frames. ], batch size: 91, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:20:15,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=699542.6666666666, ans=15.0 2024-09-19 16:20:26,079 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.35 vs. limit=15.0 2024-09-19 16:20:51,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=699682.6666666666, ans=0.2 2024-09-19 16:20:53,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=699682.6666666666, ans=0.125 2024-09-19 16:20:54,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=699682.6666666666, ans=0.125 2024-09-19 16:21:02,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=699682.6666666666, ans=0.125 2024-09-19 16:21:02,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=699682.6666666666, ans=0.0 2024-09-19 16:21:09,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=699729.3333333334, ans=0.125 2024-09-19 16:21:20,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=699729.3333333334, ans=0.0 2024-09-19 16:21:23,480 INFO [train.py:1198] (1/2) Epoch 39, batch 2650, loss[loss=0.2107, simple_loss=0.2699, pruned_loss=0.05558, ctc_loss=0.1217, cr_loss=0.3991, over 34288.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2629, pruned_loss=0.05446, ctc_loss=0.1169, cr_loss=0.3917, over 6769291.35 frames. 
], batch size: 117, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:22:02,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.420e+02 2.777e+02 3.437e+02 4.959e+02, threshold=5.554e+02, percent-clipped=0.0 2024-09-19 16:22:07,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699869.3333333334, ans=0.1 2024-09-19 16:22:22,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=699916.0, ans=0.125 2024-09-19 16:22:25,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=699916.0, ans=0.125 2024-09-19 16:22:30,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=699962.6666666666, ans=0.125 2024-09-19 16:22:47,168 INFO [train.py:1198] (1/2) Epoch 39, batch 2700, loss[loss=0.2111, simple_loss=0.2705, pruned_loss=0.05562, ctc_loss=0.1225, cr_loss=0.398, over 34620.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.263, pruned_loss=0.05455, ctc_loss=0.1169, cr_loss=0.3914, over 6763706.86 frames. ], batch size: 102, lr: 3.05e-03, grad_scale: 32.0 2024-09-19 16:22:47,696 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 16:22:54,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.66 vs. limit=15.0 2024-09-19 16:23:12,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=700056.0, ans=0.0 2024-09-19 16:23:31,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.68 vs. limit=15.0 2024-09-19 16:24:12,022 INFO [train.py:1198] (1/2) Epoch 39, batch 2750, loss[loss=0.2012, simple_loss=0.2514, pruned_loss=0.05554, ctc_loss=0.1179, cr_loss=0.407, over 34642.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.2617, pruned_loss=0.05411, ctc_loss=0.1161, cr_loss=0.3902, over 6760905.32 frames. ], batch size: 88, lr: 3.04e-03, grad_scale: 32.0 2024-09-19 16:24:24,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=700242.6666666666, ans=0.2 2024-09-19 16:24:31,453 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.42 vs. limit=12.0 2024-09-19 16:24:33,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=700289.3333333334, ans=0.025 2024-09-19 16:24:37,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.38 vs. 
limit=15.0 2024-09-19 16:24:51,304 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.596e+02 2.991e+02 3.825e+02 7.363e+02, threshold=5.982e+02, percent-clipped=5.0 2024-09-19 16:25:06,417 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=700382.6666666666, ans=0.125 2024-09-19 16:25:34,450 INFO [train.py:1198] (1/2) Epoch 39, batch 2800, loss[loss=0.2112, simple_loss=0.2682, pruned_loss=0.05668, ctc_loss=0.1238, cr_loss=0.4017, over 23361.00 frames. ], tot_loss[loss=0.2051, simple_loss=0.2621, pruned_loss=0.05451, ctc_loss=0.1168, cr_loss=0.3914, over 6739899.61 frames. ], batch size: 244, lr: 3.04e-03, grad_scale: 32.0 2024-09-19 16:25:36,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=700476.0, ans=0.125 2024-09-19 16:25:43,660 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.83 vs. limit=10.0 2024-09-19 16:26:12,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=700569.3333333334, ans=0.125 2024-09-19 16:26:16,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=700569.3333333334, ans=0.05 2024-09-19 16:26:58,343 INFO [train.py:1198] (1/2) Epoch 39, batch 2850, loss[loss=0.1974, simple_loss=0.2495, pruned_loss=0.05333, ctc_loss=0.1154, cr_loss=0.3854, over 34451.00 frames. ], tot_loss[loss=0.2058, simple_loss=0.2626, pruned_loss=0.05488, ctc_loss=0.1176, cr_loss=0.393, over 6722513.88 frames. ], batch size: 90, lr: 3.04e-03, grad_scale: 32.0 2024-09-19 16:27:14,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=700709.3333333334, ans=0.025 2024-09-19 16:27:24,562 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.09 vs. limit=15.0 2024-09-19 16:27:27,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=700756.0, ans=0.0 2024-09-19 16:27:41,842 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.175e+02 2.593e+02 2.955e+02 3.841e+02 6.908e+02, threshold=5.911e+02, percent-clipped=3.0 2024-09-19 16:27:53,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=700849.3333333334, ans=0.2 2024-09-19 16:28:08,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.87 vs. limit=15.0 2024-09-19 16:28:23,234 INFO [train.py:1198] (1/2) Epoch 39, batch 2900, loss[loss=0.1988, simple_loss=0.2588, pruned_loss=0.05088, ctc_loss=0.1115, cr_loss=0.3682, over 34503.00 frames. ], tot_loss[loss=0.2065, simple_loss=0.2635, pruned_loss=0.05508, ctc_loss=0.118, cr_loss=0.394, over 6753580.32 frames. 
], batch size: 94, lr: 3.04e-03, grad_scale: 16.0 2024-09-19 16:28:23,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=700942.6666666666, ans=0.1 2024-09-19 16:28:33,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=700942.6666666666, ans=0.125 2024-09-19 16:28:59,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=701036.0, ans=0.125 2024-09-19 16:29:04,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=701036.0, ans=0.125 2024-09-19 16:29:29,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.07 vs. limit=22.5 2024-09-19 16:29:34,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.86 vs. limit=12.0 2024-09-19 16:29:39,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=701129.3333333334, ans=0.125 2024-09-19 16:29:45,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=701176.0, ans=0.025 2024-09-19 16:29:47,013 INFO [train.py:1198] (1/2) Epoch 39, batch 2950, loss[loss=0.1964, simple_loss=0.2475, pruned_loss=0.05327, ctc_loss=0.115, cr_loss=0.3934, over 34638.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2622, pruned_loss=0.05459, ctc_loss=0.117, cr_loss=0.3914, over 6749771.01 frames. ], batch size: 88, lr: 3.04e-03, grad_scale: 16.0 2024-09-19 16:29:47,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=701176.0, ans=0.125 2024-09-19 16:29:53,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=701176.0, ans=0.0 2024-09-19 16:30:02,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.58 vs. limit=15.0 2024-09-19 16:30:31,462 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.037e+02 2.491e+02 2.893e+02 3.717e+02 6.936e+02, threshold=5.786e+02, percent-clipped=4.0 2024-09-19 16:30:39,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=701316.0, ans=0.125 2024-09-19 16:30:59,082 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-19 16:31:11,302 INFO [train.py:1198] (1/2) Epoch 39, batch 3000, loss[loss=0.2014, simple_loss=0.2592, pruned_loss=0.05251, ctc_loss=0.1155, cr_loss=0.3852, over 34528.00 frames. ], tot_loss[loss=0.2045, simple_loss=0.2618, pruned_loss=0.05418, ctc_loss=0.1163, cr_loss=0.39, over 6751115.70 frames. ], batch size: 94, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 16:31:11,302 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 16:31:28,147 INFO [train.py:1230] (1/2) Epoch 39, validation: loss=0.1488, simple_loss=0.2425, pruned_loss=0.02364, ctc_loss=0.03913, cr_loss=2.194e-14, over 944034.00 frames. 
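[annotation] The train.py:1221/1230 pair above is the periodic validation pass: the model is evaluated over the dev sets and the components are averaged with frame weights (hence "over 944034.00 frames"); cr_loss collapses to ~2e-14, presumably because the consistency-regularization term has no second augmented view to compare against in eval mode. A minimal sketch of such a pass, assuming a compute_loss helper that returns a dict of component losses plus a frame count:

```python
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_loader, compute_loss):
    """Frame-weighted average of each loss component over the dev set."""
    model.eval()
    totals, frames = {}, 0.0
    for batch in valid_loader:
        losses, num_frames = compute_loss(model, batch)  # dict, float
        frames += num_frames
        for name, value in losses.items():
            totals[name] = totals.get(name, 0.0) + value * num_frames
    model.train()
    return {name: s / frames for name, s in totals.items()}
```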
2024-09-19 16:31:28,147 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 16:31:56,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=701456.0, ans=0.125 2024-09-19 16:31:58,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=701456.0, ans=0.025 2024-09-19 16:32:29,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=701549.3333333334, ans=0.025 2024-09-19 16:32:29,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.60 vs. limit=15.0 2024-09-19 16:32:43,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=701596.0, ans=0.125 2024-09-19 16:32:49,821 INFO [train.py:1198] (1/2) Epoch 39, batch 3050, loss[loss=0.1958, simple_loss=0.2522, pruned_loss=0.05126, ctc_loss=0.1102, cr_loss=0.3722, over 34621.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2624, pruned_loss=0.05444, ctc_loss=0.1168, cr_loss=0.3913, over 6743784.98 frames. ], batch size: 89, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 16:33:12,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=701689.3333333334, ans=0.1 2024-09-19 16:33:22,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=701736.0, ans=0.125 2024-09-19 16:33:23,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=701736.0, ans=0.1 2024-09-19 16:33:26,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.34 vs. limit=12.0 2024-09-19 16:33:33,322 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.149e+02 2.479e+02 2.733e+02 3.274e+02 7.701e+02, threshold=5.466e+02, percent-clipped=3.0 2024-09-19 16:33:52,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=701829.3333333334, ans=0.125 2024-09-19 16:33:55,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=701829.3333333334, ans=0.0 2024-09-19 16:34:11,782 INFO [train.py:1198] (1/2) Epoch 39, batch 3100, loss[loss=0.2201, simple_loss=0.2779, pruned_loss=0.05983, ctc_loss=0.1281, cr_loss=0.4277, over 34281.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2623, pruned_loss=0.05454, ctc_loss=0.1169, cr_loss=0.3917, over 6743052.83 frames. ], batch size: 117, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 16:34:24,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=701876.0, ans=0.2 2024-09-19 16:34:28,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=701922.6666666666, ans=0.125 2024-09-19 16:34:33,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.99 vs. 
limit=12.0 2024-09-19 16:34:48,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.03 vs. limit=15.0 2024-09-19 16:34:51,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=701969.3333333334, ans=0.0 2024-09-19 16:34:51,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2024-09-19 16:35:09,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=702016.0, ans=0.0 2024-09-19 16:35:32,919 INFO [train.py:1198] (1/2) Epoch 39, batch 3150, loss[loss=0.2247, simple_loss=0.2804, pruned_loss=0.06282, ctc_loss=0.131, cr_loss=0.4288, over 33911.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2623, pruned_loss=0.05453, ctc_loss=0.1169, cr_loss=0.391, over 6747989.18 frames. ], batch size: 122, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 16:35:39,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=702109.3333333334, ans=0.5 2024-09-19 16:35:50,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.96 vs. limit=15.0 2024-09-19 16:36:03,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=702202.6666666666, ans=0.0 2024-09-19 16:36:05,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=702202.6666666666, ans=0.125 2024-09-19 16:36:11,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=702202.6666666666, ans=0.0 2024-09-19 16:36:16,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.116e+02 2.691e+02 3.205e+02 3.988e+02 7.934e+02, threshold=6.409e+02, percent-clipped=10.0 2024-09-19 16:36:37,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=702296.0, ans=0.125 2024-09-19 16:36:40,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=702296.0, ans=0.0 2024-09-19 16:36:47,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=702296.0, ans=0.0 2024-09-19 16:36:55,351 INFO [train.py:1198] (1/2) Epoch 39, batch 3200, loss[loss=0.2003, simple_loss=0.2603, pruned_loss=0.05161, ctc_loss=0.1095, cr_loss=0.3783, over 34545.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2617, pruned_loss=0.05427, ctc_loss=0.1164, cr_loss=0.3903, over 6761788.98 frames. 
], batch size: 94, lr: 3.04e-03, grad_scale: 16.0 2024-09-19 16:36:58,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=702342.6666666666, ans=0.09899494936611666 2024-09-19 16:37:05,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702342.6666666666, ans=0.1 2024-09-19 16:37:07,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=702342.6666666666, ans=0.125 2024-09-19 16:37:10,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=702389.3333333334, ans=0.125 2024-09-19 16:37:33,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=702436.0, ans=0.025 2024-09-19 16:37:34,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=702436.0, ans=0.025 2024-09-19 16:37:38,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=702436.0, ans=0.025 2024-09-19 16:37:52,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=702482.6666666666, ans=0.125 2024-09-19 16:38:08,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=702529.3333333334, ans=0.125 2024-09-19 16:38:16,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.40 vs. limit=10.0 2024-09-19 16:38:16,454 INFO [train.py:1198] (1/2) Epoch 39, batch 3250, loss[loss=0.2068, simple_loss=0.2648, pruned_loss=0.05471, ctc_loss=0.1179, cr_loss=0.3952, over 34639.00 frames. ], tot_loss[loss=0.205, simple_loss=0.2623, pruned_loss=0.05436, ctc_loss=0.1168, cr_loss=0.3909, over 6770335.72 frames. ], batch size: 98, lr: 3.04e-03, grad_scale: 16.0 2024-09-19 16:38:37,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=702622.6666666666, ans=0.125 2024-09-19 16:38:38,095 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.15 vs. limit=12.0 2024-09-19 16:38:43,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=702622.6666666666, ans=0.125 2024-09-19 16:38:46,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=702669.3333333334, ans=0.125 2024-09-19 16:38:50,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=11.44 vs. 
limit=12.0 2024-09-19 16:38:59,642 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.069e+02 2.416e+02 2.666e+02 3.265e+02 5.157e+02, threshold=5.332e+02, percent-clipped=0.0 2024-09-19 16:39:06,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=702716.0, ans=0.125 2024-09-19 16:39:14,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=702716.0, ans=0.0 2024-09-19 16:39:37,145 INFO [train.py:1198] (1/2) Epoch 39, batch 3300, loss[loss=0.2056, simple_loss=0.2695, pruned_loss=0.05204, ctc_loss=0.1135, cr_loss=0.3705, over 32940.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.2611, pruned_loss=0.05395, ctc_loss=0.1159, cr_loss=0.3889, over 6767552.35 frames. ], batch size: 130, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 16:39:44,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702809.3333333334, ans=0.1 2024-09-19 16:40:01,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702856.0, ans=0.1 2024-09-19 16:40:01,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=702856.0, ans=0.125 2024-09-19 16:40:01,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=702856.0, ans=0.0 2024-09-19 16:40:14,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=702902.6666666666, ans=0.125 2024-09-19 16:40:25,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=702949.3333333334, ans=0.0 2024-09-19 16:40:26,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=702949.3333333334, ans=10.0 2024-09-19 16:40:29,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=12.0 2024-09-19 16:40:33,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=702949.3333333334, ans=0.035 2024-09-19 16:40:39,139 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.84 vs. limit=22.5 2024-09-19 16:40:58,556 INFO [train.py:1198] (1/2) Epoch 39, batch 3350, loss[loss=0.2128, simple_loss=0.2714, pruned_loss=0.05671, ctc_loss=0.1231, cr_loss=0.4037, over 33736.00 frames. ], tot_loss[loss=0.2045, simple_loss=0.2617, pruned_loss=0.05418, ctc_loss=0.1163, cr_loss=0.3898, over 6741786.75 frames. 
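[annotation] The lr field decays only gently within the epoch (3.06e-03 around batch 1200 down to 3.04e-03 by roughly batch 2750), consistent with a scheduler that anneals smoothly in both batch count and epoch, in the spirit of icefall's Eden. A symbolic sketch of that shape, without claiming to reproduce this run's exact values (the real scheduler may add warmup or other correction factors):

```python
def eden_style_lr(base_lr: float, batch: int, epoch: int,
                  lr_batches: float, lr_epochs: float) -> float:
    """Smooth ~1/sqrt decay in both batches and epochs (Eden-like shape).

    Nearly flat while batch << lr_batches, then decays; the epoch factor
    layers a slower, coarser decay on top. Constants are assumptions.
    """
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```

Deep into epoch 39 both factors change very slowly per step, which is why thousands of batches pass between visible lr ticks in the log.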
], batch size: 122, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 16:41:10,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=703042.6666666666, ans=0.0 2024-09-19 16:41:29,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=703136.0, ans=0.125 2024-09-19 16:41:43,551 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.479e+02 2.761e+02 3.307e+02 6.725e+02, threshold=5.521e+02, percent-clipped=1.0 2024-09-19 16:41:58,227 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. limit=12.0 2024-09-19 16:42:20,090 INFO [train.py:1198] (1/2) Epoch 39, batch 3400, loss[loss=0.1785, simple_loss=0.2332, pruned_loss=0.0451, ctc_loss=0.09802, cr_loss=0.3491, over 34195.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2616, pruned_loss=0.05436, ctc_loss=0.1168, cr_loss=0.3902, over 6732887.60 frames. ], batch size: 78, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 16:42:23,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=703276.0, ans=0.2 2024-09-19 16:42:36,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=703322.6666666666, ans=0.125 2024-09-19 16:42:42,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=703322.6666666666, ans=0.0 2024-09-19 16:42:47,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=703322.6666666666, ans=0.125 2024-09-19 16:43:34,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=703462.6666666666, ans=0.125 2024-09-19 16:43:40,415 INFO [train.py:1198] (1/2) Epoch 39, batch 3450, loss[loss=0.2069, simple_loss=0.2693, pruned_loss=0.0524, ctc_loss=0.1176, cr_loss=0.406, over 32983.00 frames. ], tot_loss[loss=0.2048, simple_loss=0.262, pruned_loss=0.05431, ctc_loss=0.1167, cr_loss=0.3907, over 6745576.49 frames. ], batch size: 130, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 16:43:47,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=703509.3333333334, ans=0.0 2024-09-19 16:43:52,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=703509.3333333334, ans=0.04949747468305833 2024-09-19 16:44:12,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=703602.6666666666, ans=0.125 2024-09-19 16:44:14,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=703602.6666666666, ans=0.0 2024-09-19 16:44:22,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=703602.6666666666, ans=0.0 2024-09-19 16:44:24,828 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.068e+02 2.571e+02 2.814e+02 3.409e+02 9.542e+02, threshold=5.629e+02, percent-clipped=4.0 2024-09-19 16:44:36,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.59 vs. 
limit=15.0 2024-09-19 16:44:37,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=703649.3333333334, ans=0.125 2024-09-19 16:44:56,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=703696.0, ans=0.125 2024-09-19 16:45:01,546 INFO [train.py:1198] (1/2) Epoch 39, batch 3500, loss[loss=0.1797, simple_loss=0.2384, pruned_loss=0.04395, ctc_loss=0.09705, cr_loss=0.3422, over 34500.00 frames. ], tot_loss[loss=0.2042, simple_loss=0.2615, pruned_loss=0.05402, ctc_loss=0.1163, cr_loss=0.3898, over 6747869.52 frames. ], batch size: 85, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 16:45:05,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=703742.6666666666, ans=0.125 2024-09-19 16:45:18,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0 2024-09-19 16:45:26,280 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.20 vs. limit=15.0 2024-09-19 16:45:40,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0 2024-09-19 16:46:09,718 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.30 vs. limit=15.0 2024-09-19 16:46:22,586 INFO [train.py:1198] (1/2) Epoch 39, batch 3550, loss[loss=0.2093, simple_loss=0.2718, pruned_loss=0.0535, ctc_loss=0.1185, cr_loss=0.4033, over 34363.00 frames. ], tot_loss[loss=0.204, simple_loss=0.2615, pruned_loss=0.05388, ctc_loss=0.116, cr_loss=0.3893, over 6757935.45 frames. ], batch size: 103, lr: 3.04e-03, grad_scale: 8.0 2024-09-19 16:46:25,088 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.04 vs. limit=12.0 2024-09-19 16:46:42,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=704022.6666666666, ans=15.0 2024-09-19 16:47:06,541 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.64 vs. limit=15.0 2024-09-19 16:47:07,191 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.593e+02 3.005e+02 3.957e+02 6.392e+02, threshold=6.010e+02, percent-clipped=3.0 2024-09-19 16:47:09,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=704116.0, ans=0.2 2024-09-19 16:47:19,295 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.88 vs. 
limit=15.0 2024-09-19 16:47:21,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=704116.0, ans=0.125 2024-09-19 16:47:37,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=704162.6666666666, ans=0.125 2024-09-19 16:47:42,532 INFO [train.py:1198] (1/2) Epoch 39, batch 3600, loss[loss=0.1913, simple_loss=0.2468, pruned_loss=0.05018, ctc_loss=0.105, cr_loss=0.3596, over 34481.00 frames. ], tot_loss[loss=0.2042, simple_loss=0.2617, pruned_loss=0.05398, ctc_loss=0.1161, cr_loss=0.3896, over 6767103.96 frames. ], batch size: 90, lr: 3.04e-03, grad_scale: 16.0 2024-09-19 16:47:47,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.min_positive, batch_count=704209.3333333334, ans=0.025 2024-09-19 16:48:11,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=704256.0, ans=0.125 2024-09-19 16:48:28,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=704302.6666666666, ans=0.0 2024-09-19 16:48:29,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=15.0 2024-09-19 16:48:39,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=704349.3333333334, ans=0.125 2024-09-19 16:48:44,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=704349.3333333334, ans=0.035 2024-09-19 16:49:03,227 INFO [train.py:1198] (1/2) Epoch 39, batch 3650, loss[loss=0.2151, simple_loss=0.2747, pruned_loss=0.05699, ctc_loss=0.1234, cr_loss=0.4216, over 34476.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.261, pruned_loss=0.05378, ctc_loss=0.1157, cr_loss=0.3886, over 6769436.57 frames. ], batch size: 110, lr: 3.04e-03, grad_scale: 16.0 2024-09-19 16:49:13,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=704442.6666666666, ans=0.0 2024-09-19 16:49:29,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=704489.3333333334, ans=0.125 2024-09-19 16:49:29,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=704489.3333333334, ans=0.0 2024-09-19 16:49:45,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=704536.0, ans=0.0 2024-09-19 16:49:47,907 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.178e+02 2.552e+02 3.076e+02 3.852e+02 1.652e+03, threshold=6.152e+02, percent-clipped=4.0 2024-09-19 16:50:24,257 INFO [train.py:1198] (1/2) Epoch 39, batch 3700, loss[loss=0.206, simple_loss=0.2709, pruned_loss=0.05191, ctc_loss=0.1117, cr_loss=0.3731, over 34609.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.261, pruned_loss=0.05352, ctc_loss=0.1153, cr_loss=0.3874, over 6784860.59 frames. 
], batch size: 102, lr: 3.04e-03, grad_scale: 16.0 2024-09-19 16:50:27,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704676.0, ans=0.1 2024-09-19 16:50:36,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704676.0, ans=0.1 2024-09-19 16:50:56,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=704769.3333333334, ans=0.0 2024-09-19 16:50:58,584 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=704769.3333333334, ans=0.0 2024-09-19 16:51:42,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=704862.6666666666, ans=0.0 2024-09-19 16:51:45,376 INFO [train.py:1198] (1/2) Epoch 39, batch 3750, loss[loss=0.221, simple_loss=0.2778, pruned_loss=0.06045, ctc_loss=0.1308, cr_loss=0.43, over 34342.00 frames. ], tot_loss[loss=0.2064, simple_loss=0.2643, pruned_loss=0.05468, ctc_loss=0.1175, cr_loss=0.3934, over 6786108.72 frames. ], batch size: 113, lr: 3.03e-03, grad_scale: 16.0 2024-09-19 16:52:06,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=704956.0, ans=0.125 2024-09-19 16:52:26,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=705002.6666666666, ans=0.2 2024-09-19 16:52:30,502 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.403e+02 2.602e+02 3.182e+02 5.853e+02, threshold=5.205e+02, percent-clipped=0.0 2024-09-19 16:53:06,836 INFO [train.py:1198] (1/2) Epoch 39, batch 3800, loss[loss=0.2202, simple_loss=0.2735, pruned_loss=0.06217, ctc_loss=0.1319, cr_loss=0.4023, over 29808.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2665, pruned_loss=0.05594, ctc_loss=0.1199, cr_loss=0.3983, over 6674313.42 frames. ], batch size: 175, lr: 3.03e-03, grad_scale: 16.0 2024-09-19 16:54:00,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=705282.6666666666, ans=0.0 2024-09-19 16:54:02,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=705282.6666666666, ans=0.125 2024-09-19 16:54:08,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=705282.6666666666, ans=0.125 2024-09-19 16:54:28,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=705376.0, ans=0.125 2024-09-19 16:54:30,106 INFO [train.py:1198] (1/2) Epoch 39, batch 3850, loss[loss=0.2239, simple_loss=0.2768, pruned_loss=0.06432, ctc_loss=0.1342, cr_loss=0.3869, over 23154.00 frames. ], tot_loss[loss=0.2124, simple_loss=0.2686, pruned_loss=0.0577, ctc_loss=0.1234, cr_loss=0.4026, over 6251753.41 frames. ], batch size: 244, lr: 3.03e-03, grad_scale: 16.0 2024-09-19 16:54:41,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. 
limit=15.0 2024-09-19 16:54:43,981 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=705376.0, ans=0.025 2024-09-19 16:54:49,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=705422.6666666666, ans=0.1 2024-09-19 16:54:50,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=705422.6666666666, ans=0.025 2024-09-19 16:55:00,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=705422.6666666666, ans=0.0 2024-09-19 16:55:05,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0 2024-09-19 16:56:00,198 INFO [train.py:1198] (1/2) Epoch 40, batch 0, loss[loss=0.1867, simple_loss=0.2443, pruned_loss=0.04688, ctc_loss=0.1026, cr_loss=0.3683, over 34475.00 frames. ], tot_loss[loss=0.1867, simple_loss=0.2443, pruned_loss=0.04688, ctc_loss=0.1026, cr_loss=0.3683, over 34475.00 frames. ], batch size: 85, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 16:56:00,199 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 16:56:07,799 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.2294, 4.0418, 4.0378, 3.9320], device='cuda:1') 2024-09-19 16:56:16,964 INFO [train.py:1230] (1/2) Epoch 40, validation: loss=0.1487, simple_loss=0.2434, pruned_loss=0.02306, ctc_loss=0.03907, cr_loss=2.295e-14, over 944034.00 frames. 2024-09-19 16:56:16,964 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 16:56:18,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=705497.3333333334, ans=0.1 2024-09-19 16:56:20,160 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.153e+02 2.627e+02 2.792e+02 3.081e+02 3.991e+02, threshold=5.584e+02, percent-clipped=0.0 2024-09-19 16:56:57,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=705590.6666666666, ans=0.0 2024-09-19 16:57:12,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705637.3333333334, ans=0.1 2024-09-19 16:57:35,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=705684.0, ans=0.0 2024-09-19 16:57:37,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=705684.0, ans=0.125 2024-09-19 16:57:40,008 INFO [train.py:1198] (1/2) Epoch 40, batch 50, loss[loss=0.1812, simple_loss=0.2368, pruned_loss=0.04559, ctc_loss=0.102, cr_loss=0.351, over 34486.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2617, pruned_loss=0.05358, ctc_loss=0.1153, cr_loss=0.3906, over 1482033.04 frames. ], batch size: 82, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 16:57:50,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.06 vs. limit=15.0 2024-09-19 16:57:55,873 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. 
limit=22.5 2024-09-19 16:58:13,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=705824.0, ans=0.0 2024-09-19 16:58:14,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=705824.0, ans=0.125 2024-09-19 16:58:26,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=705824.0, ans=0.125 2024-09-19 16:58:39,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=705870.6666666666, ans=0.125 2024-09-19 16:58:44,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=705870.6666666666, ans=0.125 2024-09-19 16:58:47,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=705917.3333333334, ans=0.0 2024-09-19 16:58:49,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=705917.3333333334, ans=0.125 2024-09-19 16:59:02,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=705917.3333333334, ans=0.125 2024-09-19 16:59:02,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=705917.3333333334, ans=0.125 2024-09-19 16:59:05,214 INFO [train.py:1198] (1/2) Epoch 40, batch 100, loss[loss=0.201, simple_loss=0.252, pruned_loss=0.05557, ctc_loss=0.1167, cr_loss=0.3853, over 34580.00 frames. ], tot_loss[loss=0.2072, simple_loss=0.2644, pruned_loss=0.05527, ctc_loss=0.1183, cr_loss=0.3963, over 2628968.41 frames. ], batch size: 89, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 16:59:08,452 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.073e+02 2.494e+02 2.770e+02 3.389e+02 5.508e+02, threshold=5.539e+02, percent-clipped=0.0 2024-09-19 16:59:26,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=706010.6666666666, ans=0.125 2024-09-19 17:00:09,061 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=1.033e-02 2024-09-19 17:00:12,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=706150.6666666666, ans=0.0 2024-09-19 17:00:12,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.30 vs. limit=15.0 2024-09-19 17:00:26,299 INFO [train.py:1198] (1/2) Epoch 40, batch 150, loss[loss=0.1747, simple_loss=0.2305, pruned_loss=0.04341, ctc_loss=0.09475, cr_loss=0.3261, over 34478.00 frames. ], tot_loss[loss=0.2053, simple_loss=0.2627, pruned_loss=0.05445, ctc_loss=0.1168, cr_loss=0.392, over 3556964.41 frames. 
], batch size: 82, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:00:31,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=706197.3333333334, ans=0.125 2024-09-19 17:00:31,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=706197.3333333334, ans=0.125 2024-09-19 17:00:39,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=706197.3333333334, ans=0.2 2024-09-19 17:00:40,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=706197.3333333334, ans=10.0 2024-09-19 17:00:48,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=706244.0, ans=0.0 2024-09-19 17:00:53,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=706244.0, ans=0.125 2024-09-19 17:00:54,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=706244.0, ans=0.2 2024-09-19 17:00:55,270 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.53 vs. limit=15.0 2024-09-19 17:01:12,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=706290.6666666666, ans=0.125 2024-09-19 17:01:37,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=706384.0, ans=0.0 2024-09-19 17:01:45,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=706384.0, ans=0.0 2024-09-19 17:01:48,010 INFO [train.py:1198] (1/2) Epoch 40, batch 200, loss[loss=0.2275, simple_loss=0.2764, pruned_loss=0.06691, ctc_loss=0.1376, cr_loss=0.4302, over 32124.00 frames. ], tot_loss[loss=0.2048, simple_loss=0.262, pruned_loss=0.05432, ctc_loss=0.1165, cr_loss=0.3909, over 4271077.65 frames. ], batch size: 145, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:01:51,221 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.508e+02 3.118e+02 4.341e+02 7.339e+02, threshold=6.236e+02, percent-clipped=10.0 2024-09-19 17:01:55,193 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2024-09-19 17:02:47,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706570.6666666666, ans=0.1 2024-09-19 17:02:54,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=706570.6666666666, ans=0.125 2024-09-19 17:02:54,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=706570.6666666666, ans=0.05 2024-09-19 17:03:06,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. 
limit=15.0 2024-09-19 17:03:13,825 INFO [train.py:1198] (1/2) Epoch 40, batch 250, loss[loss=0.2192, simple_loss=0.2749, pruned_loss=0.06052, ctc_loss=0.1278, cr_loss=0.4217, over 34255.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2615, pruned_loss=0.05413, ctc_loss=0.1161, cr_loss=0.3897, over 4832911.92 frames. ], batch size: 117, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:03:14,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=706664.0, ans=0.0 2024-09-19 17:03:15,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=706664.0, ans=0.0 2024-09-19 17:03:29,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706710.6666666666, ans=0.1 2024-09-19 17:03:29,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=706710.6666666666, ans=0.125 2024-09-19 17:03:33,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.17 vs. limit=15.0 2024-09-19 17:03:35,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=706710.6666666666, ans=0.2 2024-09-19 17:03:55,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=706757.3333333334, ans=0.125 2024-09-19 17:04:11,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=706804.0, ans=0.0 2024-09-19 17:04:33,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=706850.6666666666, ans=0.07 2024-09-19 17:04:36,005 INFO [train.py:1198] (1/2) Epoch 40, batch 300, loss[loss=0.227, simple_loss=0.2783, pruned_loss=0.06507, ctc_loss=0.1367, cr_loss=0.454, over 34335.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2615, pruned_loss=0.0541, ctc_loss=0.1162, cr_loss=0.39, over 5261040.28 frames. 
], batch size: 107, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:04:37,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=706897.3333333334, ans=0.125 2024-09-19 17:04:39,254 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.222e+02 2.492e+02 3.054e+02 4.206e+02 8.531e+02, threshold=6.107e+02, percent-clipped=5.0 2024-09-19 17:04:55,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=706944.0, ans=0.125 2024-09-19 17:04:57,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=706944.0, ans=0.125 2024-09-19 17:04:59,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=706944.0, ans=0.125 2024-09-19 17:05:02,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=706944.0, ans=0.125 2024-09-19 17:05:18,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=706990.6666666666, ans=0.025 2024-09-19 17:05:20,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=706990.6666666666, ans=0.0 2024-09-19 17:05:28,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=707037.3333333334, ans=0.0 2024-09-19 17:05:30,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=707037.3333333334, ans=0.1 2024-09-19 17:05:40,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=707084.0, ans=0.125 2024-09-19 17:05:46,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=707084.0, ans=0.125 2024-09-19 17:06:00,107 INFO [train.py:1198] (1/2) Epoch 40, batch 350, loss[loss=0.1796, simple_loss=0.2357, pruned_loss=0.04527, ctc_loss=0.09695, cr_loss=0.3413, over 34251.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.2618, pruned_loss=0.05432, ctc_loss=0.1165, cr_loss=0.3909, over 5595949.25 frames. ], batch size: 83, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:06:26,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=707177.3333333334, ans=0.125 2024-09-19 17:06:30,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=707177.3333333334, ans=0.125 2024-09-19 17:06:49,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=707270.6666666666, ans=0.125 2024-09-19 17:06:56,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=707270.6666666666, ans=0.125 2024-09-19 17:07:23,574 INFO [train.py:1198] (1/2) Epoch 40, batch 400, loss[loss=0.2152, simple_loss=0.2705, pruned_loss=0.05922, ctc_loss=0.1245, cr_loss=0.4148, over 34415.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2613, pruned_loss=0.05404, ctc_loss=0.116, cr_loss=0.3893, over 5862213.30 frames. 
], batch size: 95, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:07:28,462 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.151e+02 2.494e+02 2.777e+02 3.541e+02 7.449e+02, threshold=5.555e+02, percent-clipped=1.0 2024-09-19 17:07:37,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.21 vs. limit=15.0 2024-09-19 17:07:40,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=707410.6666666666, ans=0.125 2024-09-19 17:07:45,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=707410.6666666666, ans=0.2 2024-09-19 17:07:53,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=707410.6666666666, ans=0.125 2024-09-19 17:08:08,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=707457.3333333334, ans=0.0 2024-09-19 17:08:11,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=707504.0, ans=0.125 2024-09-19 17:08:46,003 INFO [train.py:1198] (1/2) Epoch 40, batch 450, loss[loss=0.2148, simple_loss=0.2741, pruned_loss=0.05756, ctc_loss=0.1224, cr_loss=0.3934, over 34679.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.2612, pruned_loss=0.05392, ctc_loss=0.1159, cr_loss=0.3892, over 6052223.02 frames. ], batch size: 97, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:09:07,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=707644.0, ans=0.0 2024-09-19 17:09:35,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=707737.3333333334, ans=0.2 2024-09-19 17:10:12,282 INFO [train.py:1198] (1/2) Epoch 40, batch 500, loss[loss=0.2235, simple_loss=0.2792, pruned_loss=0.06234, ctc_loss=0.1316, cr_loss=0.423, over 34391.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2606, pruned_loss=0.05372, ctc_loss=0.1155, cr_loss=0.3884, over 6219196.11 frames. ], batch size: 110, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:10:17,147 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.189e+02 2.447e+02 2.736e+02 3.528e+02 5.495e+02, threshold=5.473e+02, percent-clipped=0.0 2024-09-19 17:10:45,877 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.96 vs. limit=15.0 2024-09-19 17:11:15,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=707970.6666666666, ans=0.0 2024-09-19 17:11:23,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=708017.3333333334, ans=0.0 2024-09-19 17:11:25,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=708017.3333333334, ans=0.125 2024-09-19 17:11:34,545 INFO [train.py:1198] (1/2) Epoch 40, batch 550, loss[loss=0.219, simple_loss=0.2781, pruned_loss=0.05909, ctc_loss=0.1264, cr_loss=0.4128, over 33904.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.2606, pruned_loss=0.05364, ctc_loss=0.1154, cr_loss=0.3879, over 6327721.39 frames. 
], batch size: 122, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:11:59,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=708110.6666666666, ans=0.0 2024-09-19 17:12:19,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=708157.3333333334, ans=0.025 2024-09-19 17:12:37,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=708204.0, ans=0.125 2024-09-19 17:12:57,053 INFO [train.py:1198] (1/2) Epoch 40, batch 600, loss[loss=0.2115, simple_loss=0.2709, pruned_loss=0.05593, ctc_loss=0.1199, cr_loss=0.4055, over 34297.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2609, pruned_loss=0.05386, ctc_loss=0.1158, cr_loss=0.3894, over 6430426.20 frames. ], batch size: 117, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:13:01,971 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.065e+02 2.548e+02 2.995e+02 3.513e+02 7.459e+02, threshold=5.990e+02, percent-clipped=4.0 2024-09-19 17:13:04,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=708297.3333333334, ans=0.125 2024-09-19 17:13:13,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=708344.0, ans=0.125 2024-09-19 17:13:50,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708437.3333333334, ans=0.1 2024-09-19 17:14:02,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.30 vs. limit=22.5 2024-09-19 17:14:06,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.43 vs. limit=15.0 2024-09-19 17:14:22,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2024-09-19 17:14:23,188 INFO [train.py:1198] (1/2) Epoch 40, batch 650, loss[loss=0.2149, simple_loss=0.2657, pruned_loss=0.0608, ctc_loss=0.1269, cr_loss=0.4288, over 34565.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2603, pruned_loss=0.05357, ctc_loss=0.1152, cr_loss=0.3885, over 6521715.79 frames. ], batch size: 94, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:15:05,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.34 vs. limit=22.5 2024-09-19 17:15:39,528 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=708717.3333333334, ans=0.2 2024-09-19 17:15:41,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=708717.3333333334, ans=0.125 2024-09-19 17:15:45,724 INFO [train.py:1198] (1/2) Epoch 40, batch 700, loss[loss=0.201, simple_loss=0.255, pruned_loss=0.05409, ctc_loss=0.1169, cr_loss=0.3864, over 34584.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.2612, pruned_loss=0.05393, ctc_loss=0.116, cr_loss=0.3902, over 6579717.71 frames. 
], batch size: 89, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:15:50,658 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 2.468e+02 2.803e+02 3.666e+02 7.905e+02, threshold=5.606e+02, percent-clipped=2.0 2024-09-19 17:15:52,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=708764.0, ans=0.125 2024-09-19 17:15:58,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.37 vs. limit=15.0 2024-09-19 17:16:03,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=708810.6666666666, ans=0.035 2024-09-19 17:16:04,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0 2024-09-19 17:16:14,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.30 vs. limit=22.5 2024-09-19 17:16:22,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=708857.3333333334, ans=0.125 2024-09-19 17:16:30,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=708857.3333333334, ans=0.125 2024-09-19 17:16:47,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=708904.0, ans=0.0 2024-09-19 17:17:08,022 INFO [train.py:1198] (1/2) Epoch 40, batch 750, loss[loss=0.2036, simple_loss=0.2617, pruned_loss=0.05359, ctc_loss=0.1157, cr_loss=0.3772, over 34398.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2608, pruned_loss=0.0538, ctc_loss=0.1157, cr_loss=0.3895, over 6622191.03 frames. ], batch size: 95, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:17:21,332 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=708997.3333333334, ans=0.0 2024-09-19 17:17:26,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709044.0, ans=0.1 2024-09-19 17:17:45,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=709090.6666666666, ans=0.0 2024-09-19 17:17:55,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=709090.6666666666, ans=0.025 2024-09-19 17:18:00,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=709137.3333333334, ans=0.0 2024-09-19 17:18:11,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=709137.3333333334, ans=0.1 2024-09-19 17:18:19,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=709184.0, ans=0.125 2024-09-19 17:18:20,356 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.88 vs. 
limit=15.0 2024-09-19 17:18:29,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=709184.0, ans=0.0 2024-09-19 17:18:34,161 INFO [train.py:1198] (1/2) Epoch 40, batch 800, loss[loss=0.1799, simple_loss=0.2404, pruned_loss=0.04313, ctc_loss=0.09627, cr_loss=0.3459, over 34482.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2609, pruned_loss=0.05391, ctc_loss=0.1159, cr_loss=0.3901, over 6657313.65 frames. ], batch size: 85, lr: 2.99e-03, grad_scale: 32.0 2024-09-19 17:18:36,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2024-09-19 17:18:39,073 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.123e+02 2.541e+02 2.942e+02 3.492e+02 7.820e+02, threshold=5.885e+02, percent-clipped=1.0 2024-09-19 17:18:52,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=709277.3333333334, ans=0.125 2024-09-19 17:18:59,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=709277.3333333334, ans=0.125 2024-09-19 17:19:20,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=709324.0, ans=0.1 2024-09-19 17:19:33,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=709370.6666666666, ans=0.025 2024-09-19 17:19:38,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=709370.6666666666, ans=0.2 2024-09-19 17:20:02,239 INFO [train.py:1198] (1/2) Epoch 40, batch 850, loss[loss=0.2023, simple_loss=0.2648, pruned_loss=0.0514, ctc_loss=0.1114, cr_loss=0.3711, over 34370.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.2603, pruned_loss=0.05364, ctc_loss=0.1155, cr_loss=0.3889, over 6690882.00 frames. ], batch size: 103, lr: 2.99e-03, grad_scale: 16.0 2024-09-19 17:20:12,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=709464.0, ans=0.125 2024-09-19 17:20:33,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=709557.3333333334, ans=0.0 2024-09-19 17:20:35,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=709557.3333333334, ans=0.125 2024-09-19 17:20:45,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2024-09-19 17:20:48,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=709557.3333333334, ans=0.0 2024-09-19 17:21:13,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=16.02 vs. limit=22.5 2024-09-19 17:21:26,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.56 vs. 
limit=15.0 2024-09-19 17:21:26,757 INFO [train.py:1198] (1/2) Epoch 40, batch 900, loss[loss=0.1762, simple_loss=0.2339, pruned_loss=0.04294, ctc_loss=0.09501, cr_loss=0.3385, over 34490.00 frames. ], tot_loss[loss=0.2034, simple_loss=0.2606, pruned_loss=0.05371, ctc_loss=0.1156, cr_loss=0.389, over 6698050.54 frames. ], batch size: 85, lr: 2.99e-03, grad_scale: 16.0 2024-09-19 17:21:34,962 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.000e+02 2.564e+02 2.810e+02 3.550e+02 5.722e+02, threshold=5.620e+02, percent-clipped=0.0 2024-09-19 17:21:53,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=709744.0, ans=0.125 2024-09-19 17:21:56,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=709744.0, ans=0.0 2024-09-19 17:22:16,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=709837.3333333334, ans=0.2 2024-09-19 17:22:19,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=709837.3333333334, ans=0.125 2024-09-19 17:22:23,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.71 vs. limit=15.0 2024-09-19 17:22:33,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=709884.0, ans=0.1 2024-09-19 17:22:34,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=709884.0, ans=0.2 2024-09-19 17:22:42,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=709884.0, ans=0.1 2024-09-19 17:22:50,609 INFO [train.py:1198] (1/2) Epoch 40, batch 950, loss[loss=0.1836, simple_loss=0.2434, pruned_loss=0.04504, ctc_loss=0.09942, cr_loss=0.3463, over 34719.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2611, pruned_loss=0.05389, ctc_loss=0.1159, cr_loss=0.3897, over 6700853.58 frames. ], batch size: 87, lr: 2.99e-03, grad_scale: 16.0 2024-09-19 17:22:57,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=709930.6666666666, ans=0.07 2024-09-19 17:23:05,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=709977.3333333334, ans=0.125 2024-09-19 17:23:05,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=709977.3333333334, ans=0.04949747468305833 2024-09-19 17:23:38,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=710070.6666666666, ans=0.125 2024-09-19 17:23:56,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=710117.3333333334, ans=0.125 2024-09-19 17:24:12,496 INFO [train.py:1198] (1/2) Epoch 40, batch 1000, loss[loss=0.1979, simple_loss=0.2535, pruned_loss=0.05246, ctc_loss=0.1123, cr_loss=0.372, over 34496.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.2616, pruned_loss=0.05409, ctc_loss=0.1164, cr_loss=0.3909, over 6694898.92 frames. 
], batch size: 90, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 17:24:12,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=710164.0, ans=0.125 2024-09-19 17:24:19,050 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.200e+02 2.772e+02 3.113e+02 4.135e+02 7.332e+02, threshold=6.226e+02, percent-clipped=3.0 2024-09-19 17:24:29,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=710210.6666666666, ans=0.0 2024-09-19 17:24:31,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=710210.6666666666, ans=0.0 2024-09-19 17:24:42,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=710210.6666666666, ans=0.125 2024-09-19 17:24:42,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=710210.6666666666, ans=0.125 2024-09-19 17:24:44,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=710257.3333333334, ans=0.07 2024-09-19 17:25:03,580 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.27 vs. limit=10.0 2024-09-19 17:25:09,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2024-09-19 17:25:13,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710304.0, ans=0.1 2024-09-19 17:25:16,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=710304.0, ans=0.125 2024-09-19 17:25:18,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=710304.0, ans=0.0 2024-09-19 17:25:34,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=710350.6666666666, ans=0.0 2024-09-19 17:25:37,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=710397.3333333334, ans=0.0 2024-09-19 17:25:39,152 INFO [train.py:1198] (1/2) Epoch 40, batch 1050, loss[loss=0.208, simple_loss=0.267, pruned_loss=0.05486, ctc_loss=0.1162, cr_loss=0.4006, over 34573.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2612, pruned_loss=0.05405, ctc_loss=0.1162, cr_loss=0.3905, over 6704358.85 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 17:25:46,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0 2024-09-19 17:26:09,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=710444.0, ans=0.07 2024-09-19 17:26:13,464 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.24 vs. 
limit=15.0 2024-09-19 17:26:21,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=710490.6666666666, ans=0.2 2024-09-19 17:26:25,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.50 vs. limit=15.0 2024-09-19 17:26:29,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=710537.3333333334, ans=0.125 2024-09-19 17:27:00,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=710630.6666666666, ans=0.0 2024-09-19 17:27:01,622 INFO [train.py:1198] (1/2) Epoch 40, batch 1100, loss[loss=0.1965, simple_loss=0.2504, pruned_loss=0.05308, ctc_loss=0.1094, cr_loss=0.3648, over 34727.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2611, pruned_loss=0.05406, ctc_loss=0.1161, cr_loss=0.3905, over 6718050.19 frames. ], batch size: 92, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 17:27:08,282 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.484e+02 2.876e+02 3.654e+02 5.612e+02, threshold=5.752e+02, percent-clipped=0.0 2024-09-19 17:27:10,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=710630.6666666666, ans=0.0 2024-09-19 17:27:35,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710724.0, ans=0.1 2024-09-19 17:27:40,210 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:27:41,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=710724.0, ans=0.125 2024-09-19 17:27:47,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.60 vs. limit=15.0 2024-09-19 17:27:51,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710770.6666666666, ans=0.1 2024-09-19 17:27:53,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_na.min_abs, batch_count=710770.6666666666, ans=0.02 2024-09-19 17:28:23,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=710864.0, ans=0.025 2024-09-19 17:28:24,471 INFO [train.py:1198] (1/2) Epoch 40, batch 1150, loss[loss=0.1969, simple_loss=0.254, pruned_loss=0.05168, ctc_loss=0.1086, cr_loss=0.3666, over 34374.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2609, pruned_loss=0.05396, ctc_loss=0.1159, cr_loss=0.3893, over 6717455.15 frames. ], batch size: 91, lr: 2.98e-03, grad_scale: 16.0 2024-09-19 17:28:33,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=710864.0, ans=0.125 2024-09-19 17:28:35,595 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.98 vs. 
limit=6.0 2024-09-19 17:28:47,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=710910.6666666666, ans=0.0 2024-09-19 17:28:54,957 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.66 vs. limit=10.0 2024-09-19 17:29:21,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=711004.0, ans=0.125 2024-09-19 17:29:37,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.84 vs. limit=22.5 2024-09-19 17:29:37,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=711050.6666666666, ans=0.1 2024-09-19 17:29:42,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=711050.6666666666, ans=0.125 2024-09-19 17:29:50,533 INFO [train.py:1198] (1/2) Epoch 40, batch 1200, loss[loss=0.2072, simple_loss=0.2667, pruned_loss=0.05398, ctc_loss=0.1189, cr_loss=0.4009, over 34582.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2617, pruned_loss=0.05408, ctc_loss=0.1161, cr_loss=0.3898, over 6710328.81 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:29:52,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=711097.3333333334, ans=0.125 2024-09-19 17:29:57,057 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.504e+02 2.798e+02 3.559e+02 5.513e+02, threshold=5.596e+02, percent-clipped=0.0 2024-09-19 17:30:02,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=711097.3333333334, ans=0.025 2024-09-19 17:30:05,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=711144.0, ans=0.125 2024-09-19 17:30:05,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=711144.0, ans=0.125 2024-09-19 17:30:18,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=711144.0, ans=0.125 2024-09-19 17:30:23,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=711190.6666666666, ans=0.125 2024-09-19 17:30:37,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=711190.6666666666, ans=0.0 2024-09-19 17:30:56,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.39 vs. limit=15.0 2024-09-19 17:31:13,270 INFO [train.py:1198] (1/2) Epoch 40, batch 1250, loss[loss=0.2137, simple_loss=0.2671, pruned_loss=0.05918, ctc_loss=0.1252, cr_loss=0.4238, over 34321.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.2622, pruned_loss=0.05433, ctc_loss=0.1166, cr_loss=0.3905, over 6744306.95 frames. 
], batch size: 107, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:31:25,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.18 vs. limit=22.5 2024-09-19 17:31:35,998 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.79 vs. limit=10.0 2024-09-19 17:31:40,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=711377.3333333334, ans=0.125 2024-09-19 17:31:42,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.87 vs. limit=15.0 2024-09-19 17:31:50,727 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=15.0 2024-09-19 17:32:07,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.07 vs. limit=15.0 2024-09-19 17:32:10,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=711470.6666666666, ans=0.1 2024-09-19 17:32:26,975 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:32:28,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=711517.3333333334, ans=0.0 2024-09-19 17:32:31,761 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:32:38,198 INFO [train.py:1198] (1/2) Epoch 40, batch 1300, loss[loss=0.2098, simple_loss=0.2696, pruned_loss=0.05502, ctc_loss=0.12, cr_loss=0.3956, over 33143.00 frames. ], tot_loss[loss=0.2045, simple_loss=0.2618, pruned_loss=0.05416, ctc_loss=0.1164, cr_loss=0.3902, over 6745065.98 frames. ], batch size: 130, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:32:44,846 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.140e+02 2.554e+02 2.894e+02 3.921e+02 1.168e+03, threshold=5.789e+02, percent-clipped=3.0 2024-09-19 17:33:05,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=711610.6666666666, ans=0.1 2024-09-19 17:33:41,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=711704.0, ans=0.0 2024-09-19 17:33:56,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.68 vs. limit=15.0 2024-09-19 17:34:02,602 INFO [train.py:1198] (1/2) Epoch 40, batch 1350, loss[loss=0.2019, simple_loss=0.2593, pruned_loss=0.05321, ctc_loss=0.113, cr_loss=0.3877, over 34519.00 frames. ], tot_loss[loss=0.204, simple_loss=0.2613, pruned_loss=0.05393, ctc_loss=0.1161, cr_loss=0.3894, over 6766474.79 frames. ], batch size: 94, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:34:13,534 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.02 vs. 
limit=15.0 2024-09-19 17:34:15,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=711797.3333333334, ans=0.125 2024-09-19 17:34:17,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=711844.0, ans=0.125 2024-09-19 17:34:31,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-09-19 17:34:37,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=711890.6666666666, ans=0.0 2024-09-19 17:34:42,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=711890.6666666666, ans=0.125 2024-09-19 17:34:42,644 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.63 vs. limit=15.0 2024-09-19 17:35:00,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=711937.3333333334, ans=0.025 2024-09-19 17:35:21,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=711984.0, ans=0.0 2024-09-19 17:35:24,832 INFO [train.py:1198] (1/2) Epoch 40, batch 1400, loss[loss=0.1721, simple_loss=0.2314, pruned_loss=0.04073, ctc_loss=0.09052, cr_loss=0.3337, over 34287.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.2606, pruned_loss=0.05356, ctc_loss=0.1153, cr_loss=0.3876, over 6777441.57 frames. ], batch size: 80, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:35:25,323 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:35:28,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=712030.6666666666, ans=0.125 2024-09-19 17:35:31,394 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.051e+02 2.567e+02 3.178e+02 3.662e+02 7.079e+02, threshold=6.357e+02, percent-clipped=1.0 2024-09-19 17:35:42,192 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2024-09-19 17:35:45,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.96 vs. limit=15.0 2024-09-19 17:36:08,587 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.42 vs. 
limit=22.5 2024-09-19 17:36:18,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=712170.6666666666, ans=0.2 2024-09-19 17:36:28,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=712170.6666666666, ans=0.04949747468305833 2024-09-19 17:36:40,185 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:36:45,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=712217.3333333334, ans=0.2 2024-09-19 17:36:46,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=712217.3333333334, ans=0.125 2024-09-19 17:36:51,280 INFO [train.py:1198] (1/2) Epoch 40, batch 1450, loss[loss=0.2026, simple_loss=0.2635, pruned_loss=0.05165, ctc_loss=0.1146, cr_loss=0.388, over 34467.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2613, pruned_loss=0.05361, ctc_loss=0.1155, cr_loss=0.3887, over 6774766.38 frames. ], batch size: 110, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:37:01,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=712264.0, ans=0.2 2024-09-19 17:37:12,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=712310.6666666666, ans=0.125 2024-09-19 17:37:37,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=712357.3333333334, ans=0.125 2024-09-19 17:37:54,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=712404.0, ans=0.125 2024-09-19 17:38:07,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=712450.6666666666, ans=0.0 2024-09-19 17:38:12,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=712497.3333333334, ans=0.0 2024-09-19 17:38:13,596 INFO [train.py:1198] (1/2) Epoch 40, batch 1500, loss[loss=0.2123, simple_loss=0.2739, pruned_loss=0.05582, ctc_loss=0.1172, cr_loss=0.3903, over 34461.00 frames. ], tot_loss[loss=0.204, simple_loss=0.2618, pruned_loss=0.05371, ctc_loss=0.1159, cr_loss=0.3896, over 6773539.00 frames. 
], batch size: 100, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:38:20,283 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 2.484e+02 2.787e+02 3.340e+02 5.523e+02, threshold=5.575e+02, percent-clipped=0.0 2024-09-19 17:38:27,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=712497.3333333334, ans=0.125 2024-09-19 17:38:55,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=712590.6666666666, ans=0.125 2024-09-19 17:38:57,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712590.6666666666, ans=0.1 2024-09-19 17:39:03,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=712637.3333333334, ans=0.125 2024-09-19 17:39:21,891 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=5.045e-03 2024-09-19 17:39:26,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712684.0, ans=0.1 2024-09-19 17:39:26,949 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:39:28,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=712684.0, ans=0.2 2024-09-19 17:39:30,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.13 vs. limit=15.0 2024-09-19 17:39:35,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=712730.6666666666, ans=0.125 2024-09-19 17:39:36,328 INFO [train.py:1198] (1/2) Epoch 40, batch 1550, loss[loss=0.2335, simple_loss=0.2896, pruned_loss=0.06631, ctc_loss=0.1372, cr_loss=0.4313, over 34428.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.262, pruned_loss=0.05394, ctc_loss=0.1163, cr_loss=0.3904, over 6745938.93 frames. 
], batch size: 105, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:39:36,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=712730.6666666666, ans=0.2 2024-09-19 17:39:43,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=712730.6666666666, ans=0.1 2024-09-19 17:39:53,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=712777.3333333334, ans=0.04949747468305833 2024-09-19 17:39:58,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=712777.3333333334, ans=0.125 2024-09-19 17:40:14,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712824.0, ans=0.1 2024-09-19 17:40:21,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=712824.0, ans=0.125 2024-09-19 17:41:01,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=712964.0, ans=0.1 2024-09-19 17:41:02,435 INFO [train.py:1198] (1/2) Epoch 40, batch 1600, loss[loss=0.2031, simple_loss=0.2684, pruned_loss=0.05047, ctc_loss=0.1092, cr_loss=0.3772, over 34555.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.262, pruned_loss=0.0541, ctc_loss=0.1166, cr_loss=0.3912, over 6724027.71 frames. ], batch size: 99, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:41:08,969 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.123e+02 2.518e+02 2.979e+02 3.589e+02 5.736e+02, threshold=5.957e+02, percent-clipped=3.0 2024-09-19 17:41:30,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=713010.6666666666, ans=0.125 2024-09-19 17:41:37,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=713057.3333333334, ans=0.0 2024-09-19 17:41:42,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=713057.3333333334, ans=0.1 2024-09-19 17:41:49,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=713057.3333333334, ans=0.125 2024-09-19 17:42:15,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=713150.6666666666, ans=0.0 2024-09-19 17:42:25,378 INFO [train.py:1198] (1/2) Epoch 40, batch 1650, loss[loss=0.205, simple_loss=0.2672, pruned_loss=0.05232, ctc_loss=0.1128, cr_loss=0.3891, over 34350.00 frames. ], tot_loss[loss=0.2042, simple_loss=0.2618, pruned_loss=0.05389, ctc_loss=0.1161, cr_loss=0.3904, over 6717011.54 frames. 
], batch size: 103, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:42:35,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=713197.3333333334, ans=0.125 2024-09-19 17:42:48,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=713244.0, ans=0.125 2024-09-19 17:42:50,321 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=713244.0, ans=0.0 2024-09-19 17:42:58,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=8.0 2024-09-19 17:43:46,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=713384.0, ans=0.125 2024-09-19 17:43:49,102 INFO [train.py:1198] (1/2) Epoch 40, batch 1700, loss[loss=0.1768, simple_loss=0.2338, pruned_loss=0.04364, ctc_loss=0.09508, cr_loss=0.3357, over 34300.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2614, pruned_loss=0.0537, ctc_loss=0.1157, cr_loss=0.3893, over 6743440.55 frames. ], batch size: 80, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:43:55,642 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.054e+02 2.482e+02 2.804e+02 3.630e+02 7.660e+02, threshold=5.608e+02, percent-clipped=2.0 2024-09-19 17:44:11,657 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2024-09-19 17:44:29,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.44 vs. limit=15.0 2024-09-19 17:45:00,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5 2024-09-19 17:45:05,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=713617.3333333334, ans=0.2 2024-09-19 17:45:13,166 INFO [train.py:1198] (1/2) Epoch 40, batch 1750, loss[loss=0.1832, simple_loss=0.2363, pruned_loss=0.04799, ctc_loss=0.1008, cr_loss=0.3518, over 34167.00 frames. ], tot_loss[loss=0.2034, simple_loss=0.2611, pruned_loss=0.05351, ctc_loss=0.1154, cr_loss=0.3882, over 6753154.50 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:45:33,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=713710.6666666666, ans=0.125 2024-09-19 17:45:36,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=713710.6666666666, ans=0.025 2024-09-19 17:46:08,012 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=713804.0, ans=0.125 2024-09-19 17:46:08,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=713804.0, ans=0.0 2024-09-19 17:46:35,386 INFO [train.py:1198] (1/2) Epoch 40, batch 1800, loss[loss=0.2021, simple_loss=0.2612, pruned_loss=0.05236, ctc_loss=0.1149, cr_loss=0.3825, over 34691.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2614, pruned_loss=0.05363, ctc_loss=0.1155, cr_loss=0.3881, over 6757186.74 frames. 
], batch size: 97, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:46:41,985 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.056e+02 2.511e+02 2.951e+02 3.865e+02 6.627e+02, threshold=5.901e+02, percent-clipped=3.0 2024-09-19 17:48:02,147 INFO [train.py:1198] (1/2) Epoch 40, batch 1850, loss[loss=0.2089, simple_loss=0.2687, pruned_loss=0.05481, ctc_loss=0.1188, cr_loss=0.3932, over 34456.00 frames. ], tot_loss[loss=0.2034, simple_loss=0.2611, pruned_loss=0.05358, ctc_loss=0.1154, cr_loss=0.3882, over 6764526.35 frames. ], batch size: 100, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:48:42,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=714224.0, ans=0.025 2024-09-19 17:48:51,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=714270.6666666666, ans=0.125 2024-09-19 17:48:54,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten.whitening_limit, batch_count=714270.6666666666, ans=22.5 2024-09-19 17:48:55,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.67 vs. limit=15.0 2024-09-19 17:49:08,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=714317.3333333334, ans=0.0 2024-09-19 17:49:10,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=714317.3333333334, ans=0.125 2024-09-19 17:49:23,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=714364.0, ans=0.0 2024-09-19 17:49:24,324 INFO [train.py:1198] (1/2) Epoch 40, batch 1900, loss[loss=0.2127, simple_loss=0.2708, pruned_loss=0.05656, ctc_loss=0.1253, cr_loss=0.4083, over 34367.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2618, pruned_loss=0.05383, ctc_loss=0.1158, cr_loss=0.3894, over 6773304.71 frames. ], batch size: 103, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:49:27,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=714364.0, ans=0.125 2024-09-19 17:49:30,769 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.224e+02 2.567e+02 3.087e+02 3.971e+02 8.068e+02, threshold=6.175e+02, percent-clipped=5.0 2024-09-19 17:49:34,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=714364.0, ans=0.1 2024-09-19 17:49:57,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=714457.3333333334, ans=0.025 2024-09-19 17:50:04,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.97 vs. limit=15.0 2024-09-19 17:50:30,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=714550.6666666666, ans=0.125 2024-09-19 17:50:46,468 INFO [train.py:1198] (1/2) Epoch 40, batch 1950, loss[loss=0.2153, simple_loss=0.2669, pruned_loss=0.06097, ctc_loss=0.1256, cr_loss=0.4172, over 34345.00 frames. ], tot_loss[loss=0.2053, simple_loss=0.2631, pruned_loss=0.05426, ctc_loss=0.1165, cr_loss=0.3918, over 6790339.72 frames. 
], batch size: 91, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:50:46,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=714597.3333333334, ans=0.125 2024-09-19 17:51:01,912 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:51:08,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=714644.0, ans=0.0 2024-09-19 17:51:15,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=8.24 vs. limit=15.0 2024-09-19 17:51:17,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.35 vs. limit=22.5 2024-09-19 17:51:21,723 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 17:51:24,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=714690.6666666666, ans=0.125 2024-09-19 17:52:11,729 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=714830.6666666666, ans=0.0 2024-09-19 17:52:13,025 INFO [train.py:1198] (1/2) Epoch 40, batch 2000, loss[loss=0.1796, simple_loss=0.2373, pruned_loss=0.04461, ctc_loss=0.09649, cr_loss=0.3335, over 34157.00 frames. ], tot_loss[loss=0.2061, simple_loss=0.2638, pruned_loss=0.05457, ctc_loss=0.1172, cr_loss=0.393, over 6765227.42 frames. ], batch size: 78, lr: 2.98e-03, grad_scale: 32.0 2024-09-19 17:52:19,676 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.100e+02 2.583e+02 2.887e+02 3.311e+02 5.813e+02, threshold=5.773e+02, percent-clipped=0.0 2024-09-19 17:52:30,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=714877.3333333334, ans=0.0 2024-09-19 17:52:41,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=714877.3333333334, ans=0.2 2024-09-19 17:52:43,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=714877.3333333334, ans=0.0 2024-09-19 17:52:45,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.08 vs. limit=22.5 2024-09-19 17:52:58,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=714924.0, ans=0.0 2024-09-19 17:53:01,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=714970.6666666666, ans=0.0 2024-09-19 17:53:12,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.62 vs. 
limit=22.5 2024-09-19 17:53:21,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=715017.3333333334, ans=10.0 2024-09-19 17:53:31,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=715017.3333333334, ans=0.1 2024-09-19 17:53:35,782 INFO [train.py:1198] (1/2) Epoch 40, batch 2050, loss[loss=0.1813, simple_loss=0.2389, pruned_loss=0.04471, ctc_loss=0.1015, cr_loss=0.3496, over 34471.00 frames. ], tot_loss[loss=0.2055, simple_loss=0.263, pruned_loss=0.05448, ctc_loss=0.117, cr_loss=0.3924, over 6757247.59 frames. ], batch size: 82, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 17:53:50,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=715110.6666666666, ans=0.0 2024-09-19 17:53:55,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=715110.6666666666, ans=0.0 2024-09-19 17:54:23,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=715204.0, ans=0.0 2024-09-19 17:54:40,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=715250.6666666666, ans=0.125 2024-09-19 17:54:45,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.81 vs. limit=15.0 2024-09-19 17:55:00,009 INFO [train.py:1198] (1/2) Epoch 40, batch 2100, loss[loss=0.2041, simple_loss=0.2635, pruned_loss=0.05299, ctc_loss=0.1163, cr_loss=0.386, over 34537.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.2624, pruned_loss=0.05421, ctc_loss=0.1164, cr_loss=0.3909, over 6772560.51 frames. ], batch size: 94, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 17:55:06,497 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 2.482e+02 3.245e+02 3.933e+02 7.721e+02, threshold=6.490e+02, percent-clipped=5.0 2024-09-19 17:55:40,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=715390.6666666666, ans=0.125 2024-09-19 17:55:48,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=715390.6666666666, ans=0.0 2024-09-19 17:56:03,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=715437.3333333334, ans=0.1 2024-09-19 17:56:24,145 INFO [train.py:1198] (1/2) Epoch 40, batch 2150, loss[loss=0.2067, simple_loss=0.2632, pruned_loss=0.05575, ctc_loss=0.1152, cr_loss=0.3919, over 34347.00 frames. ], tot_loss[loss=0.204, simple_loss=0.2615, pruned_loss=0.05386, ctc_loss=0.1158, cr_loss=0.3891, over 6790330.06 frames. 
], batch size: 91, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 17:56:32,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=715530.6666666666, ans=0.1 2024-09-19 17:56:37,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=715530.6666666666, ans=0.2 2024-09-19 17:56:41,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=715577.3333333334, ans=0.2 2024-09-19 17:56:57,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=715624.0, ans=0.125 2024-09-19 17:57:08,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=715624.0, ans=0.04949747468305833 2024-09-19 17:57:36,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=715717.3333333334, ans=0.0 2024-09-19 17:57:46,428 INFO [train.py:1198] (1/2) Epoch 40, batch 2200, loss[loss=0.2079, simple_loss=0.2657, pruned_loss=0.05543, ctc_loss=0.116, cr_loss=0.3995, over 34457.00 frames. ], tot_loss[loss=0.204, simple_loss=0.2614, pruned_loss=0.05394, ctc_loss=0.1159, cr_loss=0.3894, over 6784821.82 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 17:57:52,857 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 2.504e+02 2.872e+02 3.730e+02 6.558e+02, threshold=5.745e+02, percent-clipped=1.0 2024-09-19 17:58:01,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=715810.6666666666, ans=0.2 2024-09-19 17:58:53,394 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.48 vs. limit=15.0 2024-09-19 17:58:58,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2024-09-19 17:59:10,982 INFO [train.py:1198] (1/2) Epoch 40, batch 2250, loss[loss=0.2161, simple_loss=0.2725, pruned_loss=0.05919, ctc_loss=0.1233, cr_loss=0.4164, over 34409.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.261, pruned_loss=0.05373, ctc_loss=0.1155, cr_loss=0.3888, over 6782034.30 frames. ], batch size: 95, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 17:59:41,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=716044.0, ans=0.09899494936611666 2024-09-19 17:59:47,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=716090.6666666666, ans=0.125 2024-09-19 18:00:09,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=716137.3333333334, ans=0.1 2024-09-19 18:00:28,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=716184.0, ans=0.025 2024-09-19 18:00:35,220 INFO [train.py:1198] (1/2) Epoch 40, batch 2300, loss[loss=0.1927, simple_loss=0.2441, pruned_loss=0.05183, ctc_loss=0.1115, cr_loss=0.3838, over 34294.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2599, pruned_loss=0.05326, ctc_loss=0.1147, cr_loss=0.3869, over 6768199.56 frames. 
], batch size: 83, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 18:00:41,757 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 2.629e+02 3.058e+02 3.784e+02 5.960e+02, threshold=6.117e+02, percent-clipped=3.0 2024-09-19 18:00:42,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.12 vs. limit=15.0 2024-09-19 18:01:10,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.61 vs. limit=15.0 2024-09-19 18:01:23,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=716370.6666666666, ans=10.0 2024-09-19 18:01:28,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=716370.6666666666, ans=0.1 2024-09-19 18:01:28,074 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=716370.6666666666, ans=0.0 2024-09-19 18:01:44,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=716417.3333333334, ans=0.125 2024-09-19 18:01:57,285 INFO [train.py:1198] (1/2) Epoch 40, batch 2350, loss[loss=0.2149, simple_loss=0.2728, pruned_loss=0.05784, ctc_loss=0.123, cr_loss=0.4183, over 34708.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2604, pruned_loss=0.05354, ctc_loss=0.1152, cr_loss=0.3885, over 6775648.08 frames. ], batch size: 97, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 18:02:13,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=716510.6666666666, ans=0.125 2024-09-19 18:02:22,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=716510.6666666666, ans=0.125 2024-09-19 18:02:37,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=716557.3333333334, ans=0.125 2024-09-19 18:02:43,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=716557.3333333334, ans=0.2 2024-09-19 18:03:23,434 INFO [train.py:1198] (1/2) Epoch 40, batch 2400, loss[loss=0.1959, simple_loss=0.2505, pruned_loss=0.0513, ctc_loss=0.1138, cr_loss=0.3993, over 34606.00 frames. ], tot_loss[loss=0.2034, simple_loss=0.2607, pruned_loss=0.05372, ctc_loss=0.1156, cr_loss=0.3892, over 6779187.38 frames. 
], batch size: 89, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 18:03:29,910 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.162e+02 2.558e+02 2.834e+02 3.684e+02 6.203e+02, threshold=5.668e+02, percent-clipped=1.0 2024-09-19 18:04:13,796 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716837.3333333334, ans=0.1 2024-09-19 18:04:21,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=716837.3333333334, ans=0.025 2024-09-19 18:04:28,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=716884.0, ans=0.125 2024-09-19 18:04:45,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=716930.6666666666, ans=0.1 2024-09-19 18:04:46,335 INFO [train.py:1198] (1/2) Epoch 40, batch 2450, loss[loss=0.198, simple_loss=0.2593, pruned_loss=0.05054, ctc_loss=0.108, cr_loss=0.3523, over 34394.00 frames. ], tot_loss[loss=0.204, simple_loss=0.2614, pruned_loss=0.05387, ctc_loss=0.1159, cr_loss=0.3902, over 6751754.11 frames. ], batch size: 95, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 18:05:09,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716977.3333333334, ans=0.1 2024-09-19 18:05:21,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=717024.0, ans=0.125 2024-09-19 18:05:30,090 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.05 vs. limit=22.5 2024-09-19 18:05:54,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=717117.3333333334, ans=0.0 2024-09-19 18:06:10,417 INFO [train.py:1198] (1/2) Epoch 40, batch 2500, loss[loss=0.2019, simple_loss=0.2678, pruned_loss=0.04934, ctc_loss=0.1108, cr_loss=0.3775, over 34456.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.2621, pruned_loss=0.05423, ctc_loss=0.1165, cr_loss=0.3913, over 6762150.39 frames. ], batch size: 100, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 18:06:16,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.575e+02 3.017e+02 4.007e+02 5.627e+02, threshold=6.034e+02, percent-clipped=0.0 2024-09-19 18:06:22,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=717164.0, ans=0.1 2024-09-19 18:06:27,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2024-09-19 18:06:50,922 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5 2024-09-19 18:07:28,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=717350.6666666666, ans=6.0 2024-09-19 18:07:34,429 INFO [train.py:1198] (1/2) Epoch 40, batch 2550, loss[loss=0.1751, simple_loss=0.2295, pruned_loss=0.044, ctc_loss=0.09659, cr_loss=0.3341, over 34147.00 frames. 
], tot_loss[loss=0.2045, simple_loss=0.2619, pruned_loss=0.05408, ctc_loss=0.1163, cr_loss=0.3911, over 6766779.09 frames. ], batch size: 78, lr: 2.97e-03, grad_scale: 32.0 2024-09-19 18:07:51,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=717444.0, ans=0.125 2024-09-19 18:08:11,365 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0 2024-09-19 18:08:15,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=717490.6666666666, ans=0.1 2024-09-19 18:08:28,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=717537.3333333334, ans=0.125 2024-09-19 18:08:46,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=717584.0, ans=0.2 2024-09-19 18:08:56,335 INFO [train.py:1198] (1/2) Epoch 40, batch 2600, loss[loss=0.198, simple_loss=0.2532, pruned_loss=0.05288, ctc_loss=0.1106, cr_loss=0.3715, over 34347.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2622, pruned_loss=0.05408, ctc_loss=0.1163, cr_loss=0.3906, over 6761538.87 frames. ], batch size: 91, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 18:09:04,577 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.574e+02 2.959e+02 3.723e+02 6.492e+02, threshold=5.919e+02, percent-clipped=4.0 2024-09-19 18:09:10,400 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.14 vs. limit=15.0 2024-09-19 18:09:31,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0 2024-09-19 18:09:58,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=717770.6666666666, ans=0.2 2024-09-19 18:09:59,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=717770.6666666666, ans=0.125 2024-09-19 18:10:09,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=717817.3333333334, ans=0.0 2024-09-19 18:10:20,669 INFO [train.py:1198] (1/2) Epoch 40, batch 2650, loss[loss=0.213, simple_loss=0.2731, pruned_loss=0.05677, ctc_loss=0.1205, cr_loss=0.3828, over 34225.00 frames. ], tot_loss[loss=0.2048, simple_loss=0.2625, pruned_loss=0.05407, ctc_loss=0.1163, cr_loss=0.3911, over 6769036.31 frames. ], batch size: 117, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 18:10:30,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=717864.0, ans=0.125 2024-09-19 18:10:46,068 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.07 vs. 
limit=10.0 2024-09-19 18:10:50,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=717910.6666666666, ans=0.125 2024-09-19 18:10:55,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=717957.3333333334, ans=0.125 2024-09-19 18:11:05,670 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:11:44,765 INFO [train.py:1198] (1/2) Epoch 40, batch 2700, loss[loss=0.2122, simple_loss=0.2754, pruned_loss=0.05437, ctc_loss=0.1207, cr_loss=0.4014, over 34631.00 frames. ], tot_loss[loss=0.2051, simple_loss=0.2627, pruned_loss=0.05423, ctc_loss=0.1167, cr_loss=0.392, over 6764986.03 frames. ], batch size: 102, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 18:11:54,655 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.468e+02 2.774e+02 3.361e+02 5.837e+02, threshold=5.549e+02, percent-clipped=0.0 2024-09-19 18:12:01,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=718144.0, ans=0.035 2024-09-19 18:12:36,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=718237.3333333334, ans=0.0 2024-09-19 18:12:44,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=718237.3333333334, ans=0.025 2024-09-19 18:12:50,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=718284.0, ans=0.04949747468305833 2024-09-19 18:12:52,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=718284.0, ans=0.125 2024-09-19 18:12:59,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=718284.0, ans=0.0 2024-09-19 18:13:07,072 INFO [train.py:1198] (1/2) Epoch 40, batch 2750, loss[loss=0.1902, simple_loss=0.2505, pruned_loss=0.04737, ctc_loss=0.1043, cr_loss=0.3582, over 34621.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2612, pruned_loss=0.05364, ctc_loss=0.1157, cr_loss=0.3895, over 6763164.80 frames. ], batch size: 88, lr: 2.97e-03, grad_scale: 8.0 2024-09-19 18:13:09,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=718330.6666666666, ans=0.09899494936611666 2024-09-19 18:13:50,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=718424.0, ans=0.125 2024-09-19 18:13:53,484 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=718424.0, ans=0.2 2024-09-19 18:14:05,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=718470.6666666666, ans=0.0 2024-09-19 18:14:33,331 INFO [train.py:1198] (1/2) Epoch 40, batch 2800, loss[loss=0.2351, simple_loss=0.2846, pruned_loss=0.06949, ctc_loss=0.1469, cr_loss=0.4346, over 23793.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2614, pruned_loss=0.05375, ctc_loss=0.1159, cr_loss=0.3895, over 6742360.56 frames. 
], batch size: 244, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 18:14:33,585 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:14:43,310 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.665e+02 3.113e+02 3.826e+02 6.936e+02, threshold=6.226e+02, percent-clipped=2.0 2024-09-19 18:15:01,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=718610.6666666666, ans=0.0 2024-09-19 18:15:09,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0 2024-09-19 18:15:12,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=718657.3333333334, ans=0.125 2024-09-19 18:15:17,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=718657.3333333334, ans=0.09899494936611666 2024-09-19 18:15:56,077 INFO [train.py:1198] (1/2) Epoch 40, batch 2850, loss[loss=0.207, simple_loss=0.2624, pruned_loss=0.05611, ctc_loss=0.1192, cr_loss=0.3872, over 34492.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.2619, pruned_loss=0.05404, ctc_loss=0.1164, cr_loss=0.3905, over 6725199.00 frames. ], batch size: 90, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 18:16:08,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=718797.3333333334, ans=0.2 2024-09-19 18:16:49,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=718937.3333333334, ans=0.125 2024-09-19 18:17:15,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=718984.0, ans=0.125 2024-09-19 18:17:20,012 INFO [train.py:1198] (1/2) Epoch 40, batch 2900, loss[loss=0.1984, simple_loss=0.2596, pruned_loss=0.05028, ctc_loss=0.1092, cr_loss=0.3701, over 34543.00 frames. ], tot_loss[loss=0.2052, simple_loss=0.2627, pruned_loss=0.05432, ctc_loss=0.1169, cr_loss=0.3921, over 6755519.92 frames. ], batch size: 94, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 18:17:29,958 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.107e+02 2.563e+02 3.047e+02 4.260e+02 6.159e+02, threshold=6.093e+02, percent-clipped=0.0 2024-09-19 18:17:32,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=719030.6666666666, ans=0.0 2024-09-19 18:17:40,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=719077.3333333334, ans=0.125 2024-09-19 18:17:50,443 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:18:44,435 INFO [train.py:1198] (1/2) Epoch 40, batch 2950, loss[loss=0.2046, simple_loss=0.2594, pruned_loss=0.05516, ctc_loss=0.1189, cr_loss=0.3946, over 34647.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.2614, pruned_loss=0.05384, ctc_loss=0.116, cr_loss=0.3897, over 6749687.29 frames. 
], batch size: 88, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 18:18:54,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=719264.0, ans=0.035 2024-09-19 18:18:57,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=719264.0, ans=0.125 2024-09-19 18:19:01,158 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:19:14,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=719310.6666666666, ans=0.0 2024-09-19 18:19:17,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=719357.3333333334, ans=0.125 2024-09-19 18:19:21,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.57 vs. limit=15.0 2024-09-19 18:19:32,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=719404.0, ans=0.125 2024-09-19 18:19:35,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=719404.0, ans=0.125 2024-09-19 18:19:40,700 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0 2024-09-19 18:19:55,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.62 vs. limit=15.0 2024-09-19 18:20:06,373 INFO [train.py:1198] (1/2) Epoch 40, batch 3000, loss[loss=0.2057, simple_loss=0.2648, pruned_loss=0.05374, ctc_loss=0.1146, cr_loss=0.404, over 34528.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2612, pruned_loss=0.05372, ctc_loss=0.1158, cr_loss=0.3901, over 6749439.79 frames. ], batch size: 94, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 18:20:06,373 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 18:20:23,161 INFO [train.py:1230] (1/2) Epoch 40, validation: loss=0.1486, simple_loss=0.2423, pruned_loss=0.02352, ctc_loss=0.03908, cr_loss=2.2e-14, over 944034.00 frames. 2024-09-19 18:20:23,161 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 18:20:29,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.17 vs. 
limit=15.0 2024-09-19 18:20:34,819 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.063e+02 2.533e+02 2.994e+02 3.741e+02 5.520e+02, threshold=5.988e+02, percent-clipped=0.0 2024-09-19 18:20:39,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=719544.0, ans=0.125 2024-09-19 18:20:40,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=719544.0, ans=0.025 2024-09-19 18:20:41,761 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:20:57,809 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:21:01,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=719590.6666666666, ans=0.125 2024-09-19 18:21:06,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=719590.6666666666, ans=0.02 2024-09-19 18:21:09,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=22.5 2024-09-19 18:21:20,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=719637.3333333334, ans=0.0 2024-09-19 18:21:46,374 INFO [train.py:1198] (1/2) Epoch 40, batch 3050, loss[loss=0.1956, simple_loss=0.247, pruned_loss=0.05282, ctc_loss=0.1152, cr_loss=0.386, over 34593.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.262, pruned_loss=0.05409, ctc_loss=0.1164, cr_loss=0.392, over 6742117.44 frames. ], batch size: 89, lr: 2.97e-03, grad_scale: 16.0 2024-09-19 18:21:56,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=719730.6666666666, ans=0.125 2024-09-19 18:22:00,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=719730.6666666666, ans=0.07 2024-09-19 18:22:12,876 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:22:16,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=719777.3333333334, ans=0.0 2024-09-19 18:22:20,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=719824.0, ans=0.2 2024-09-19 18:22:20,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=719824.0, ans=0.125 2024-09-19 18:22:30,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=719824.0, ans=0.025 2024-09-19 18:23:08,617 INFO [train.py:1198] (1/2) Epoch 40, batch 3100, loss[loss=0.217, simple_loss=0.2752, pruned_loss=0.05861, ctc_loss=0.1265, cr_loss=0.4079, over 34269.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.2621, pruned_loss=0.05417, ctc_loss=0.1165, cr_loss=0.3918, over 6741763.30 frames. 
], batch size: 117, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 18:23:13,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=719964.0, ans=0.2 2024-09-19 18:23:18,415 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.163e+02 2.484e+02 2.822e+02 3.409e+02 8.613e+02, threshold=5.645e+02, percent-clipped=2.0 2024-09-19 18:23:41,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.70 vs. limit=15.0 2024-09-19 18:23:47,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=720057.3333333334, ans=0.125 2024-09-19 18:24:03,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=720104.0, ans=0.125 2024-09-19 18:24:10,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=720104.0, ans=0.95 2024-09-19 18:24:12,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=720150.6666666666, ans=0.2 2024-09-19 18:24:19,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=720150.6666666666, ans=0.125 2024-09-19 18:24:23,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=720150.6666666666, ans=0.0 2024-09-19 18:24:29,564 INFO [train.py:1198] (1/2) Epoch 40, batch 3150, loss[loss=0.214, simple_loss=0.2735, pruned_loss=0.0566, ctc_loss=0.1239, cr_loss=0.4153, over 33815.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2619, pruned_loss=0.05413, ctc_loss=0.1165, cr_loss=0.3918, over 6747692.32 frames. ], batch size: 122, lr: 2.96e-03, grad_scale: 16.0 2024-09-19 18:24:51,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=720244.0, ans=0.2 2024-09-19 18:25:10,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=720290.6666666666, ans=0.025 2024-09-19 18:25:15,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=720290.6666666666, ans=0.125 2024-09-19 18:25:20,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=720337.3333333334, ans=0.125 2024-09-19 18:25:28,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=720337.3333333334, ans=0.125 2024-09-19 18:25:30,588 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.84 vs. limit=10.0 2024-09-19 18:25:34,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=720384.0, ans=0.1 2024-09-19 18:25:50,277 INFO [train.py:1198] (1/2) Epoch 40, batch 3200, loss[loss=0.1957, simple_loss=0.256, pruned_loss=0.04931, ctc_loss=0.1089, cr_loss=0.3759, over 34534.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2615, pruned_loss=0.05397, ctc_loss=0.1161, cr_loss=0.3906, over 6760690.06 frames. 
], batch size: 94, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:25:56,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=15.0 2024-09-19 18:26:00,079 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.081e+02 2.569e+02 3.020e+02 3.542e+02 1.540e+03, threshold=6.041e+02, percent-clipped=1.0 2024-09-19 18:26:42,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=720570.6666666666, ans=0.125 2024-09-19 18:26:46,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=720570.6666666666, ans=0.05 2024-09-19 18:27:11,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720664.0, ans=0.1 2024-09-19 18:27:12,695 INFO [train.py:1198] (1/2) Epoch 40, batch 3250, loss[loss=0.222, simple_loss=0.2804, pruned_loss=0.06058, ctc_loss=0.1294, cr_loss=0.4135, over 34664.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.2619, pruned_loss=0.0542, ctc_loss=0.1165, cr_loss=0.3916, over 6770087.28 frames. ], batch size: 98, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:28:01,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=720804.0, ans=0.025 2024-09-19 18:28:10,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=720804.0, ans=0.125 2024-09-19 18:28:20,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=720850.6666666666, ans=0.5 2024-09-19 18:28:32,902 INFO [train.py:1198] (1/2) Epoch 40, batch 3300, loss[loss=0.212, simple_loss=0.2708, pruned_loss=0.05623, ctc_loss=0.1237, cr_loss=0.3974, over 33011.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.2605, pruned_loss=0.05366, ctc_loss=0.1155, cr_loss=0.389, over 6768113.65 frames. ], batch size: 130, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:28:44,225 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.080e+02 2.442e+02 2.793e+02 3.464e+02 5.213e+02, threshold=5.586e+02, percent-clipped=0.0 2024-09-19 18:28:49,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720944.0, ans=0.1 2024-09-19 18:29:05,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=720990.6666666666, ans=0.2 2024-09-19 18:29:07,475 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. 
limit=6.0 2024-09-19 18:29:08,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=720990.6666666666, ans=0.0 2024-09-19 18:29:10,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=720990.6666666666, ans=0.2 2024-09-19 18:29:27,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=721037.3333333334, ans=0.125 2024-09-19 18:29:43,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=721084.0, ans=0.07 2024-09-19 18:29:54,817 INFO [train.py:1198] (1/2) Epoch 40, batch 3350, loss[loss=0.2096, simple_loss=0.2673, pruned_loss=0.05566, ctc_loss=0.1222, cr_loss=0.4055, over 33769.00 frames. ], tot_loss[loss=0.2042, simple_loss=0.2614, pruned_loss=0.05404, ctc_loss=0.1163, cr_loss=0.3905, over 6742754.57 frames. ], batch size: 122, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:30:01,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=721130.6666666666, ans=0.0 2024-09-19 18:30:04,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=721130.6666666666, ans=0.125 2024-09-19 18:30:28,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.58 vs. limit=22.5 2024-09-19 18:30:33,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=721224.0, ans=0.0 2024-09-19 18:30:44,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=721270.6666666666, ans=0.0 2024-09-19 18:31:09,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721317.3333333334, ans=0.1 2024-09-19 18:31:14,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=721364.0, ans=0.0 2024-09-19 18:31:15,286 INFO [train.py:1198] (1/2) Epoch 40, batch 3400, loss[loss=0.1836, simple_loss=0.2357, pruned_loss=0.04834, ctc_loss=0.1023, cr_loss=0.3582, over 34122.00 frames. ], tot_loss[loss=0.2042, simple_loss=0.2612, pruned_loss=0.05413, ctc_loss=0.1164, cr_loss=0.391, over 6733462.77 frames. ], batch size: 78, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:31:24,861 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.184e+02 2.487e+02 2.803e+02 3.392e+02 6.685e+02, threshold=5.606e+02, percent-clipped=1.0 2024-09-19 18:31:45,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=721410.6666666666, ans=0.0 2024-09-19 18:31:46,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.14 vs. 
limit=12.0 2024-09-19 18:31:52,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=721457.3333333334, ans=0.1 2024-09-19 18:32:00,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=721457.3333333334, ans=0.0 2024-09-19 18:32:19,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=721550.6666666666, ans=0.0 2024-09-19 18:32:31,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2024-09-19 18:32:36,701 INFO [train.py:1198] (1/2) Epoch 40, batch 3450, loss[loss=0.2147, simple_loss=0.2738, pruned_loss=0.05729, ctc_loss=0.1238, cr_loss=0.4085, over 33008.00 frames. ], tot_loss[loss=0.2047, simple_loss=0.2618, pruned_loss=0.05431, ctc_loss=0.1168, cr_loss=0.3921, over 6745622.85 frames. ], batch size: 130, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:32:43,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=721597.3333333334, ans=0.0 2024-09-19 18:32:53,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=721644.0, ans=0.5 2024-09-19 18:33:04,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=721644.0, ans=0.125 2024-09-19 18:33:14,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.90 vs. limit=15.0 2024-09-19 18:33:17,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=721690.6666666666, ans=0.0 2024-09-19 18:33:38,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.70 vs. limit=10.0 2024-09-19 18:33:43,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=721784.0, ans=0.5 2024-09-19 18:33:57,666 INFO [train.py:1198] (1/2) Epoch 40, batch 3500, loss[loss=0.1671, simple_loss=0.2317, pruned_loss=0.03701, ctc_loss=0.08284, cr_loss=0.2995, over 34447.00 frames. ], tot_loss[loss=0.204, simple_loss=0.2611, pruned_loss=0.05402, ctc_loss=0.1162, cr_loss=0.3906, over 6747934.54 frames. ], batch size: 85, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:34:04,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=721830.6666666666, ans=0.125 2024-09-19 18:34:07,214 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.980e+02 2.512e+02 2.875e+02 3.556e+02 5.860e+02, threshold=5.749e+02, percent-clipped=2.0 2024-09-19 18:34:07,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=721830.6666666666, ans=0.0 2024-09-19 18:34:07,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=721830.6666666666, ans=0.07 2024-09-19 18:34:14,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.83 vs. 
limit=10.0 2024-09-19 18:34:31,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=721924.0, ans=0.125 2024-09-19 18:34:32,237 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.60 vs. limit=10.0 2024-09-19 18:34:34,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=721924.0, ans=0.1 2024-09-19 18:34:44,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=721970.6666666666, ans=0.2 2024-09-19 18:35:17,287 INFO [train.py:1198] (1/2) Epoch 40, batch 3550, loss[loss=0.2028, simple_loss=0.2683, pruned_loss=0.04973, ctc_loss=0.1126, cr_loss=0.3851, over 34399.00 frames. ], tot_loss[loss=0.2042, simple_loss=0.2614, pruned_loss=0.05407, ctc_loss=0.1163, cr_loss=0.391, over 6757276.72 frames. ], batch size: 103, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:35:45,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=722110.6666666666, ans=0.0 2024-09-19 18:35:56,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.37 vs. limit=15.0 2024-09-19 18:36:14,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=15.0 2024-09-19 18:36:24,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=722250.6666666666, ans=0.0 2024-09-19 18:36:38,541 INFO [train.py:1198] (1/2) Epoch 40, batch 3600, loss[loss=0.2067, simple_loss=0.261, pruned_loss=0.05619, ctc_loss=0.1198, cr_loss=0.3978, over 34457.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.2615, pruned_loss=0.05416, ctc_loss=0.1165, cr_loss=0.3914, over 6766706.75 frames. 
], batch size: 90, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:36:38,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=722297.3333333334, ans=0.04949747468305833 2024-09-19 18:36:43,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=722297.3333333334, ans=0.0 2024-09-19 18:36:48,076 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.184e+02 2.605e+02 3.074e+02 4.049e+02 7.682e+02, threshold=6.147e+02, percent-clipped=5.0 2024-09-19 18:36:48,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=722297.3333333334, ans=0.125 2024-09-19 18:37:04,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=722344.0, ans=0.125 2024-09-19 18:37:12,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=722390.6666666666, ans=0.125 2024-09-19 18:37:14,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=722390.6666666666, ans=0.125 2024-09-19 18:37:34,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722437.3333333334, ans=0.1 2024-09-19 18:37:50,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.44 vs. limit=15.0 2024-09-19 18:37:59,367 INFO [train.py:1198] (1/2) Epoch 40, batch 3650, loss[loss=0.2179, simple_loss=0.2798, pruned_loss=0.05736, ctc_loss=0.1221, cr_loss=0.4195, over 34463.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2609, pruned_loss=0.05378, ctc_loss=0.1158, cr_loss=0.3899, over 6768738.78 frames. ], batch size: 110, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:37:59,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=722530.6666666666, ans=0.05 2024-09-19 18:38:03,484 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=15.0 2024-09-19 18:38:14,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=722577.3333333334, ans=0.125 2024-09-19 18:38:39,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=722624.0, ans=0.125 2024-09-19 18:38:54,844 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.33 vs. limit=6.0 2024-09-19 18:39:03,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722717.3333333334, ans=0.1 2024-09-19 18:39:18,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722764.0, ans=0.1 2024-09-19 18:39:20,086 INFO [train.py:1198] (1/2) Epoch 40, batch 3700, loss[loss=0.2226, simple_loss=0.2786, pruned_loss=0.06186, ctc_loss=0.1283, cr_loss=0.4302, over 34618.00 frames. 
], tot_loss[loss=0.2034, simple_loss=0.261, pruned_loss=0.05355, ctc_loss=0.1153, cr_loss=0.3887, over 6784049.16 frames. ], batch size: 102, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:39:29,786 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.108e+02 2.444e+02 2.865e+02 3.707e+02 6.702e+02, threshold=5.729e+02, percent-clipped=3.0 2024-09-19 18:39:30,638 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0 2024-09-19 18:39:35,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.61 vs. limit=15.0 2024-09-19 18:39:42,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.85 vs. limit=15.0 2024-09-19 18:39:54,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=722857.3333333334, ans=0.0 2024-09-19 18:40:02,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=15.0 2024-09-19 18:40:04,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=722857.3333333334, ans=0.125 2024-09-19 18:40:12,344 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=722904.0, ans=0.025 2024-09-19 18:40:15,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=722904.0, ans=0.125 2024-09-19 18:40:16,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=722904.0, ans=0.125 2024-09-19 18:40:41,496 INFO [train.py:1198] (1/2) Epoch 40, batch 3750, loss[loss=0.2096, simple_loss=0.2693, pruned_loss=0.05527, ctc_loss=0.1176, cr_loss=0.3982, over 34384.00 frames. ], tot_loss[loss=0.2068, simple_loss=0.2643, pruned_loss=0.05493, ctc_loss=0.1179, cr_loss=0.395, over 6786009.06 frames. ], batch size: 113, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:41:04,932 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.22 vs. limit=15.0 2024-09-19 18:41:14,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=723090.6666666666, ans=0.0 2024-09-19 18:41:38,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=723137.3333333334, ans=0.2 2024-09-19 18:41:47,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.87 vs. limit=15.0 2024-09-19 18:41:56,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=7.19 vs. limit=12.0 2024-09-19 18:42:02,491 INFO [train.py:1198] (1/2) Epoch 40, batch 3800, loss[loss=0.2303, simple_loss=0.2793, pruned_loss=0.06801, ctc_loss=0.1389, cr_loss=0.4342, over 30110.00 frames. ], tot_loss[loss=0.2095, simple_loss=0.2666, pruned_loss=0.05621, ctc_loss=0.1203, cr_loss=0.4001, over 6674156.02 frames. 
], batch size: 175, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:42:04,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=723230.6666666666, ans=0.125 2024-09-19 18:42:07,875 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:42:07,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=723230.6666666666, ans=0.125 2024-09-19 18:42:10,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=723230.6666666666, ans=0.125 2024-09-19 18:42:12,387 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.174e+02 2.402e+02 2.533e+02 2.742e+02 4.465e+02, threshold=5.066e+02, percent-clipped=0.0 2024-09-19 18:42:20,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.max_abs, batch_count=723277.3333333334, ans=10.0 2024-09-19 18:42:22,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=723277.3333333334, ans=0.125 2024-09-19 18:43:06,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=723370.6666666666, ans=0.025 2024-09-19 18:43:11,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=723417.3333333334, ans=0.025 2024-09-19 18:43:11,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=723417.3333333334, ans=0.0 2024-09-19 18:43:18,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=723417.3333333334, ans=0.0 2024-09-19 18:43:25,951 INFO [train.py:1198] (1/2) Epoch 40, batch 3850, loss[loss=0.2272, simple_loss=0.2764, pruned_loss=0.06641, ctc_loss=0.1423, cr_loss=0.4173, over 24094.00 frames. ], tot_loss[loss=0.2126, simple_loss=0.2686, pruned_loss=0.05785, ctc_loss=0.1238, cr_loss=0.4042, over 6251615.43 frames. ], batch size: 244, lr: 2.96e-03, grad_scale: 32.0 2024-09-19 18:43:36,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=723464.0, ans=0.0 2024-09-19 18:44:51,169 INFO [train.py:1198] (1/2) Epoch 41, batch 0, loss[loss=0.1848, simple_loss=0.2442, pruned_loss=0.04599, ctc_loss=0.09819, cr_loss=0.3436, over 34445.00 frames. ], tot_loss[loss=0.1848, simple_loss=0.2442, pruned_loss=0.04599, ctc_loss=0.09819, cr_loss=0.3436, over 34445.00 frames. ], batch size: 85, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:44:51,169 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 18:45:07,953 INFO [train.py:1230] (1/2) Epoch 41, validation: loss=0.1494, simple_loss=0.2436, pruned_loss=0.02369, ctc_loss=0.03933, cr_loss=2.312e-14, over 944034.00 frames. 
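The train.py records above pair a per-batch loss[...] with a running tot_loss[... over N frames. ], where N grows across the epoch. A minimal sketch of how such frame-weighted running averages of the loss components (simple_loss, pruned_loss, ctc_loss, cr_loss) could be maintained follows; the RunningLoss class and its method names are hypothetical illustrations, not the actual train.py bookkeeping.

# Hedged sketch: frame-weighted running averages of per-batch loss
# components, mirroring the tot_loss[... over N frames. ] fields above.
# RunningLoss is a hypothetical helper, not icefall's implementation.
from collections import defaultdict

class RunningLoss:
    def __init__(self):
        self.sums = defaultdict(float)  # component name -> sum of value * frames
        self.frames = 0.0               # total frames accumulated so far

    def update(self, losses, num_frames):
        # losses: per-frame component values for one batch,
        # e.g. {"ctc_loss": 0.1294, "cr_loss": 0.4135}
        for name, value in losses.items():
            self.sums[name] += value * num_frames
        self.frames += num_frames

    def averages(self):
        # Frame-weighted averages, analogous to the logged tot_loss values.
        return {name: s / self.frames for name, s in self.sums.items()}

tracker = RunningLoss()
tracker.update({"ctc_loss": 0.1294, "cr_loss": 0.4135}, num_frames=34664.0)
print(tracker.averages())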
2024-09-19 18:45:07,953 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 18:45:32,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=723632.0, ans=0.0 2024-09-19 18:45:39,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=723632.0, ans=0.2 2024-09-19 18:45:53,040 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:45:59,280 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.206e+02 2.707e+02 2.874e+02 3.170e+02 6.380e+02, threshold=5.749e+02, percent-clipped=3.0 2024-09-19 18:46:01,706 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.73 vs. limit=15.0 2024-09-19 18:46:32,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=723818.6666666666, ans=0.025 2024-09-19 18:46:33,331 INFO [train.py:1198] (1/2) Epoch 41, batch 50, loss[loss=0.1813, simple_loss=0.2319, pruned_loss=0.048, ctc_loss=0.1035, cr_loss=0.3505, over 34526.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2626, pruned_loss=0.05489, ctc_loss=0.1175, cr_loss=0.3966, over 1480266.73 frames. ], batch size: 82, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:46:40,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=723818.6666666666, ans=0.025 2024-09-19 18:46:43,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=723818.6666666666, ans=0.1 2024-09-19 18:46:46,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=723818.6666666666, ans=0.0 2024-09-19 18:46:48,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=723865.3333333334, ans=0.125 2024-09-19 18:47:07,520 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.48 vs. limit=12.0 2024-09-19 18:47:08,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=723912.0, ans=0.0 2024-09-19 18:47:15,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=723912.0, ans=0.2 2024-09-19 18:47:21,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=723958.6666666666, ans=0.0 2024-09-19 18:47:42,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=724005.3333333334, ans=0.125 2024-09-19 18:47:55,687 INFO [train.py:1198] (1/2) Epoch 41, batch 100, loss[loss=0.2017, simple_loss=0.2539, pruned_loss=0.05517, ctc_loss=0.1182, cr_loss=0.3881, over 34573.00 frames. ], tot_loss[loss=0.2065, simple_loss=0.2638, pruned_loss=0.05494, ctc_loss=0.1178, cr_loss=0.3948, over 2627716.38 frames. ], batch size: 89, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:48:32,398 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.45 vs. 
limit=15.0 2024-09-19 18:48:44,714 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.483e+02 2.783e+02 3.518e+02 8.112e+02, threshold=5.566e+02, percent-clipped=3.0 2024-09-19 18:49:01,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=724238.6666666666, ans=0.025 2024-09-19 18:49:07,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=724238.6666666666, ans=0.0 2024-09-19 18:49:19,955 INFO [train.py:1198] (1/2) Epoch 41, batch 150, loss[loss=0.1794, simple_loss=0.2358, pruned_loss=0.04425, ctc_loss=0.1002, cr_loss=0.3637, over 34504.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2616, pruned_loss=0.05387, ctc_loss=0.1158, cr_loss=0.3903, over 3555831.56 frames. ], batch size: 82, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:49:20,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=724285.3333333334, ans=0.125 2024-09-19 18:49:23,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2024-09-19 18:50:21,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=724425.3333333334, ans=0.1 2024-09-19 18:50:39,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=724472.0, ans=0.125 2024-09-19 18:50:43,672 INFO [train.py:1198] (1/2) Epoch 41, batch 200, loss[loss=0.218, simple_loss=0.2731, pruned_loss=0.06044, ctc_loss=0.1282, cr_loss=0.4112, over 31914.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.2605, pruned_loss=0.05365, ctc_loss=0.1155, cr_loss=0.3893, over 4271515.82 frames. ], batch size: 145, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:50:59,471 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.85 vs. limit=10.0 2024-09-19 18:51:11,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=724565.3333333334, ans=0.2 2024-09-19 18:51:16,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0 2024-09-19 18:51:23,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=724612.0, ans=0.5 2024-09-19 18:51:33,175 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.160e+02 2.735e+02 3.452e+02 4.808e+02 7.986e+02, threshold=6.905e+02, percent-clipped=17.0 2024-09-19 18:51:43,631 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.408e-02 2024-09-19 18:51:48,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=724705.3333333334, ans=0.0 2024-09-19 18:51:50,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. 
limit=22.5 2024-09-19 18:51:53,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=724705.3333333334, ans=0.025 2024-09-19 18:52:01,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=724705.3333333334, ans=0.125 2024-09-19 18:52:06,360 INFO [train.py:1198] (1/2) Epoch 41, batch 250, loss[loss=0.2089, simple_loss=0.2682, pruned_loss=0.055, ctc_loss=0.1169, cr_loss=0.4057, over 34213.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.261, pruned_loss=0.05362, ctc_loss=0.1156, cr_loss=0.3896, over 4832803.85 frames. ], batch size: 117, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:52:07,689 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0 2024-09-19 18:52:16,764 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:52:31,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=724798.6666666666, ans=0.125 2024-09-19 18:52:33,470 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.08 vs. limit=15.0 2024-09-19 18:52:34,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=724798.6666666666, ans=0.1 2024-09-19 18:52:34,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=724798.6666666666, ans=10.0 2024-09-19 18:52:52,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=724845.3333333334, ans=0.125 2024-09-19 18:52:54,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=724892.0, ans=0.0 2024-09-19 18:53:07,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=724892.0, ans=0.125 2024-09-19 18:53:24,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=724938.6666666666, ans=0.025 2024-09-19 18:53:32,538 INFO [train.py:1198] (1/2) Epoch 41, batch 300, loss[loss=0.2292, simple_loss=0.2817, pruned_loss=0.06587, ctc_loss=0.1379, cr_loss=0.4376, over 34333.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.2606, pruned_loss=0.0536, ctc_loss=0.1153, cr_loss=0.3892, over 5260518.35 frames. 
], batch size: 107, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:54:04,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=725078.6666666666, ans=0.0 2024-09-19 18:54:21,713 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.128e+02 2.555e+02 2.898e+02 4.014e+02 6.250e+02, threshold=5.797e+02, percent-clipped=0.0 2024-09-19 18:54:33,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=725125.3333333334, ans=0.0 2024-09-19 18:54:41,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=725172.0, ans=0.0 2024-09-19 18:54:50,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=725172.0, ans=0.125 2024-09-19 18:54:53,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=725218.6666666666, ans=0.0 2024-09-19 18:54:54,952 INFO [train.py:1198] (1/2) Epoch 41, batch 350, loss[loss=0.1805, simple_loss=0.2386, pruned_loss=0.04452, ctc_loss=0.09815, cr_loss=0.343, over 34270.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.261, pruned_loss=0.05366, ctc_loss=0.1154, cr_loss=0.3896, over 5597192.70 frames. ], batch size: 83, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:55:19,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=725265.3333333334, ans=0.025 2024-09-19 18:55:42,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=725358.6666666666, ans=0.125 2024-09-19 18:55:59,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=725405.3333333334, ans=0.2 2024-09-19 18:56:07,582 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:56:16,751 INFO [train.py:1198] (1/2) Epoch 41, batch 400, loss[loss=0.2042, simple_loss=0.2618, pruned_loss=0.05392, ctc_loss=0.1156, cr_loss=0.3918, over 34411.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2606, pruned_loss=0.05338, ctc_loss=0.1151, cr_loss=0.3882, over 5864958.43 frames. ], batch size: 95, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:56:17,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=725452.0, ans=0.09899494936611666 2024-09-19 18:56:25,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=725452.0, ans=0.2 2024-09-19 18:56:25,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=725452.0, ans=0.2 2024-09-19 18:56:27,966 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.14 vs. 
limit=15.0 2024-09-19 18:56:35,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=725498.6666666666, ans=0.0 2024-09-19 18:56:36,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=725498.6666666666, ans=0.0 2024-09-19 18:56:41,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=725498.6666666666, ans=0.125 2024-09-19 18:56:57,148 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.34 vs. limit=15.0 2024-09-19 18:57:08,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.174e+02 2.508e+02 2.791e+02 3.502e+02 5.859e+02, threshold=5.581e+02, percent-clipped=1.0 2024-09-19 18:57:18,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.49 vs. limit=15.0 2024-09-19 18:57:26,318 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.77 vs. limit=6.0 2024-09-19 18:57:43,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.54 vs. limit=22.5 2024-09-19 18:57:44,012 INFO [train.py:1198] (1/2) Epoch 41, batch 450, loss[loss=0.2152, simple_loss=0.2737, pruned_loss=0.05746, ctc_loss=0.1245, cr_loss=0.4219, over 34683.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.2608, pruned_loss=0.05345, ctc_loss=0.1152, cr_loss=0.3886, over 6055379.46 frames. ], batch size: 97, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:57:57,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=725685.3333333334, ans=0.125 2024-09-19 18:58:06,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=725732.0, ans=0.125 2024-09-19 18:58:11,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=725732.0, ans=0.125 2024-09-19 18:58:14,910 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.85 vs. limit=22.5 2024-09-19 18:58:19,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=725778.6666666666, ans=0.015 2024-09-19 18:58:23,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.18 vs. limit=22.5 2024-09-19 18:58:32,797 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 18:58:34,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2024-09-19 18:59:06,920 INFO [train.py:1198] (1/2) Epoch 41, batch 500, loss[loss=0.2155, simple_loss=0.2751, pruned_loss=0.05759, ctc_loss=0.1216, cr_loss=0.4095, over 34459.00 frames. ], tot_loss[loss=0.2022, simple_loss=0.2599, pruned_loss=0.05306, ctc_loss=0.1145, cr_loss=0.3873, over 6222674.43 frames. 
], batch size: 110, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 18:59:12,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=725918.6666666666, ans=0.09899494936611666 2024-09-19 18:59:23,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=725965.3333333334, ans=0.0 2024-09-19 18:59:25,958 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.51 vs. limit=22.5 2024-09-19 18:59:37,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=725965.3333333334, ans=0.125 2024-09-19 18:59:56,757 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.546e+02 3.017e+02 3.756e+02 6.155e+02, threshold=6.035e+02, percent-clipped=4.0 2024-09-19 19:00:30,280 INFO [train.py:1198] (1/2) Epoch 41, batch 550, loss[loss=0.2084, simple_loss=0.2702, pruned_loss=0.05336, ctc_loss=0.1199, cr_loss=0.3986, over 33861.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2601, pruned_loss=0.05304, ctc_loss=0.1145, cr_loss=0.3871, over 6331926.85 frames. ], batch size: 122, lr: 2.92e-03, grad_scale: 32.0 2024-09-19 19:00:54,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=726198.6666666666, ans=0.125 2024-09-19 19:00:57,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=726198.6666666666, ans=0.0 2024-09-19 19:01:02,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=726198.6666666666, ans=0.0 2024-09-19 19:01:39,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=726338.6666666666, ans=0.0 2024-09-19 19:01:39,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=726338.6666666666, ans=0.1 2024-09-19 19:01:56,764 INFO [train.py:1198] (1/2) Epoch 41, batch 600, loss[loss=0.2015, simple_loss=0.2621, pruned_loss=0.05172, ctc_loss=0.1115, cr_loss=0.3815, over 34216.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.2599, pruned_loss=0.05301, ctc_loss=0.1145, cr_loss=0.3872, over 6433311.60 frames. ], batch size: 117, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:02:15,119 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=726432.0, ans=0.125 2024-09-19 19:02:20,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=726432.0, ans=0.125 2024-09-19 19:02:20,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=726432.0, ans=0.125 2024-09-19 19:02:28,029 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=726478.6666666666, ans=0.125 2024-09-19 19:02:29,031 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.08 vs. 
limit=15.0 2024-09-19 19:02:32,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=726478.6666666666, ans=0.125 2024-09-19 19:02:36,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=726478.6666666666, ans=0.125 2024-09-19 19:02:45,525 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.126e+02 2.507e+02 2.996e+02 3.757e+02 8.792e+02, threshold=5.992e+02, percent-clipped=3.0 2024-09-19 19:02:49,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=726525.3333333334, ans=0.125 2024-09-19 19:03:18,299 INFO [train.py:1198] (1/2) Epoch 41, batch 650, loss[loss=0.2074, simple_loss=0.2683, pruned_loss=0.05373, ctc_loss=0.1172, cr_loss=0.3906, over 34529.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2591, pruned_loss=0.0526, ctc_loss=0.1138, cr_loss=0.3857, over 6523948.34 frames. ], batch size: 94, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:03:26,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=726618.6666666666, ans=0.125 2024-09-19 19:03:31,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=726618.6666666666, ans=0.1 2024-09-19 19:03:50,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=726712.0, ans=0.2 2024-09-19 19:03:57,194 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.28 vs. limit=15.0 2024-09-19 19:04:22,315 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=22.5 2024-09-19 19:04:23,146 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=726805.3333333334, ans=0.0 2024-09-19 19:04:24,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=726805.3333333334, ans=0.125 2024-09-19 19:04:26,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=726805.3333333334, ans=0.2 2024-09-19 19:04:40,926 INFO [train.py:1198] (1/2) Epoch 41, batch 700, loss[loss=0.1964, simple_loss=0.2543, pruned_loss=0.05096, ctc_loss=0.11, cr_loss=0.3631, over 34571.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2602, pruned_loss=0.053, ctc_loss=0.1144, cr_loss=0.3873, over 6579640.70 frames. 
], batch size: 89, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:05:18,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=726945.3333333334, ans=0.0 2024-09-19 19:05:19,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=726945.3333333334, ans=0.125 2024-09-19 19:05:19,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=726945.3333333334, ans=0.125 2024-09-19 19:05:34,527 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.565e+02 3.005e+02 3.807e+02 6.211e+02, threshold=6.010e+02, percent-clipped=2.0 2024-09-19 19:05:49,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=727038.6666666666, ans=0.125 2024-09-19 19:05:54,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=727038.6666666666, ans=0.125 2024-09-19 19:06:01,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=727038.6666666666, ans=0.0 2024-09-19 19:06:07,744 INFO [train.py:1198] (1/2) Epoch 41, batch 750, loss[loss=0.211, simple_loss=0.2699, pruned_loss=0.05557, ctc_loss=0.1236, cr_loss=0.4057, over 34416.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2598, pruned_loss=0.05295, ctc_loss=0.1142, cr_loss=0.3869, over 6624842.59 frames. ], batch size: 95, lr: 2.91e-03, grad_scale: 64.0 2024-09-19 19:06:29,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=727132.0, ans=0.0 2024-09-19 19:07:00,574 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:07:23,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=727272.0, ans=0.125 2024-09-19 19:07:29,818 INFO [train.py:1198] (1/2) Epoch 41, batch 800, loss[loss=0.1766, simple_loss=0.2344, pruned_loss=0.04334, ctc_loss=0.09492, cr_loss=0.3291, over 34484.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.2598, pruned_loss=0.05302, ctc_loss=0.1144, cr_loss=0.3868, over 6661253.75 frames. ], batch size: 85, lr: 2.91e-03, grad_scale: 64.0 2024-09-19 19:07:38,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=727318.6666666666, ans=0.125 2024-09-19 19:07:52,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=727365.3333333334, ans=0.2 2024-09-19 19:08:01,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727412.0, ans=0.1 2024-09-19 19:08:18,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.165e+02 2.510e+02 2.808e+02 3.273e+02 5.562e+02, threshold=5.617e+02, percent-clipped=0.0 2024-09-19 19:08:25,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=727458.6666666666, ans=0.125 2024-09-19 19:08:31,325 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.20 vs. 
limit=10.0 2024-09-19 19:08:55,175 INFO [train.py:1198] (1/2) Epoch 41, batch 850, loss[loss=0.2139, simple_loss=0.2718, pruned_loss=0.05746, ctc_loss=0.1216, cr_loss=0.4157, over 34390.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2598, pruned_loss=0.05296, ctc_loss=0.1142, cr_loss=0.3865, over 6694175.68 frames. ], batch size: 103, lr: 2.91e-03, grad_scale: 64.0 2024-09-19 19:09:02,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.16 vs. limit=10.0 2024-09-19 19:09:03,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=727552.0, ans=0.125 2024-09-19 19:09:05,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=727552.0, ans=10.0 2024-09-19 19:09:16,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=727598.6666666666, ans=0.0 2024-09-19 19:09:41,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=727645.3333333334, ans=0.125 2024-09-19 19:09:41,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=727645.3333333334, ans=0.1 2024-09-19 19:09:54,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=727692.0, ans=0.125 2024-09-19 19:09:58,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=727692.0, ans=0.125 2024-09-19 19:10:17,626 INFO [train.py:1198] (1/2) Epoch 41, batch 900, loss[loss=0.1892, simple_loss=0.246, pruned_loss=0.04833, ctc_loss=0.105, cr_loss=0.3707, over 34457.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.2604, pruned_loss=0.05331, ctc_loss=0.1149, cr_loss=0.3882, over 6700692.13 frames. ], batch size: 85, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:10:47,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=727832.0, ans=0.125 2024-09-19 19:11:07,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=727925.3333333334, ans=0.125 2024-09-19 19:11:08,420 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.510e+02 2.867e+02 3.234e+02 5.079e+02, threshold=5.734e+02, percent-clipped=0.0 2024-09-19 19:11:23,672 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:11:30,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=727972.0, ans=0.0 2024-09-19 19:11:39,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=727972.0, ans=0.1 2024-09-19 19:11:46,010 INFO [train.py:1198] (1/2) Epoch 41, batch 950, loss[loss=0.1895, simple_loss=0.2496, pruned_loss=0.04705, ctc_loss=0.1023, cr_loss=0.3708, over 34684.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2608, pruned_loss=0.05339, ctc_loss=0.115, cr_loss=0.3883, over 6703983.86 frames. 
], batch size: 87, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:11:52,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=728018.6666666666, ans=0.125 2024-09-19 19:11:59,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=728018.6666666666, ans=0.2 2024-09-19 19:12:15,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=728065.3333333334, ans=0.2 2024-09-19 19:12:21,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.68 vs. limit=12.0 2024-09-19 19:13:05,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=728205.3333333334, ans=0.1 2024-09-19 19:13:05,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=728205.3333333334, ans=0.125 2024-09-19 19:13:11,694 INFO [train.py:1198] (1/2) Epoch 41, batch 1000, loss[loss=0.1913, simple_loss=0.252, pruned_loss=0.04738, ctc_loss=0.1056, cr_loss=0.3697, over 34487.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.2615, pruned_loss=0.05377, ctc_loss=0.1157, cr_loss=0.3897, over 6695612.20 frames. ], batch size: 90, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:13:16,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=728252.0, ans=0.125 2024-09-19 19:13:20,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=728252.0, ans=0.125 2024-09-19 19:13:20,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.34 vs. limit=12.0 2024-09-19 19:13:24,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=728252.0, ans=0.125 2024-09-19 19:13:35,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=728298.6666666666, ans=0.125 2024-09-19 19:13:44,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=728345.3333333334, ans=0.1 2024-09-19 19:14:01,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=728392.0, ans=0.1 2024-09-19 19:14:02,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.577e+02 3.166e+02 3.967e+02 6.183e+02, threshold=6.331e+02, percent-clipped=1.0 2024-09-19 19:14:03,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=728392.0, ans=15.0 2024-09-19 19:14:09,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=728392.0, ans=0.0 2024-09-19 19:14:14,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.86 vs. 
limit=15.0 2024-09-19 19:14:22,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=728438.6666666666, ans=0.0 2024-09-19 19:14:24,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.58 vs. limit=15.0 2024-09-19 19:14:33,632 INFO [train.py:1198] (1/2) Epoch 41, batch 1050, loss[loss=0.2132, simple_loss=0.2756, pruned_loss=0.0557, ctc_loss=0.1186, cr_loss=0.3929, over 34574.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2608, pruned_loss=0.05359, ctc_loss=0.1152, cr_loss=0.3882, over 6704983.75 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:14:41,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=728485.3333333334, ans=0.0 2024-09-19 19:14:43,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=728485.3333333334, ans=0.125 2024-09-19 19:14:45,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.53 vs. limit=15.0 2024-09-19 19:15:31,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=728625.3333333334, ans=0.125 2024-09-19 19:15:47,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=728672.0, ans=0.0 2024-09-19 19:15:55,892 INFO [train.py:1198] (1/2) Epoch 41, batch 1100, loss[loss=0.1786, simple_loss=0.2406, pruned_loss=0.04239, ctc_loss=0.09376, cr_loss=0.3281, over 34368.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2607, pruned_loss=0.05342, ctc_loss=0.1149, cr_loss=0.3874, over 6717247.38 frames. ], batch size: 91, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:16:24,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=728765.3333333334, ans=0.125 2024-09-19 19:16:31,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=728812.0, ans=0.1 2024-09-19 19:16:44,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=728812.0, ans=0.125 2024-09-19 19:16:50,546 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 2.488e+02 2.835e+02 3.343e+02 5.455e+02, threshold=5.669e+02, percent-clipped=0.0 2024-09-19 19:16:59,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=728858.6666666666, ans=0.125 2024-09-19 19:17:14,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=728905.3333333334, ans=0.125 2024-09-19 19:17:20,333 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0 2024-09-19 19:17:22,546 INFO [train.py:1198] (1/2) Epoch 41, batch 1150, loss[loss=0.1958, simple_loss=0.2537, pruned_loss=0.05068, ctc_loss=0.1091, cr_loss=0.3692, over 34346.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2609, pruned_loss=0.0537, ctc_loss=0.1155, cr_loss=0.3892, over 6714958.75 frames. 
], batch size: 91, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:17:54,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=729045.3333333334, ans=0.125 2024-09-19 19:18:05,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=729045.3333333334, ans=0.0 2024-09-19 19:18:08,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.80 vs. limit=15.0 2024-09-19 19:18:11,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.94 vs. limit=15.0 2024-09-19 19:18:25,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=729092.0, ans=0.125 2024-09-19 19:18:33,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=729138.6666666666, ans=0.125 2024-09-19 19:18:36,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=729138.6666666666, ans=0.125 2024-09-19 19:18:44,862 INFO [train.py:1198] (1/2) Epoch 41, batch 1200, loss[loss=0.2128, simple_loss=0.2775, pruned_loss=0.05437, ctc_loss=0.117, cr_loss=0.3973, over 34568.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2613, pruned_loss=0.05361, ctc_loss=0.1154, cr_loss=0.3889, over 6707547.49 frames. ], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:19:00,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2024-09-19 19:19:00,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.89 vs. limit=22.5 2024-09-19 19:19:18,401 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:19:22,793 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=15.0 2024-09-19 19:19:26,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=729278.6666666666, ans=0.1 2024-09-19 19:19:26,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=729278.6666666666, ans=0.04949747468305833 2024-09-19 19:19:36,325 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.159e+02 2.491e+02 2.773e+02 3.282e+02 8.797e+02, threshold=5.546e+02, percent-clipped=3.0 2024-09-19 19:19:53,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=729372.0, ans=0.125 2024-09-19 19:19:54,149 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. limit=22.5 2024-09-19 19:20:11,370 INFO [train.py:1198] (1/2) Epoch 41, batch 1250, loss[loss=0.2257, simple_loss=0.2821, pruned_loss=0.06211, ctc_loss=0.1355, cr_loss=0.45, over 34293.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2619, pruned_loss=0.05377, ctc_loss=0.1157, cr_loss=0.3899, over 6740829.91 frames. 
], batch size: 107, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:20:17,171 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=22.5 2024-09-19 19:20:24,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=729418.6666666666, ans=0.0 2024-09-19 19:20:44,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=729512.0, ans=0.125 2024-09-19 19:20:56,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=729512.0, ans=0.125 2024-09-19 19:20:58,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=729512.0, ans=0.0 2024-09-19 19:21:34,038 INFO [train.py:1198] (1/2) Epoch 41, batch 1300, loss[loss=0.2143, simple_loss=0.2767, pruned_loss=0.05605, ctc_loss=0.1184, cr_loss=0.4054, over 33210.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2615, pruned_loss=0.05355, ctc_loss=0.1153, cr_loss=0.3888, over 6745283.32 frames. ], batch size: 130, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:21:34,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=729652.0, ans=0.125 2024-09-19 19:21:52,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=729698.6666666666, ans=0.2 2024-09-19 19:21:54,027 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=729698.6666666666, ans=0.125 2024-09-19 19:22:19,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=729745.3333333334, ans=0.125 2024-09-19 19:22:25,368 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.116e+02 2.551e+02 2.956e+02 3.561e+02 6.019e+02, threshold=5.913e+02, percent-clipped=2.0 2024-09-19 19:22:57,027 INFO [train.py:1198] (1/2) Epoch 41, batch 1350, loss[loss=0.2003, simple_loss=0.2624, pruned_loss=0.05028, ctc_loss=0.1116, cr_loss=0.3836, over 34517.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2611, pruned_loss=0.05347, ctc_loss=0.1152, cr_loss=0.3888, over 6766676.02 frames. ], batch size: 94, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:23:00,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=729885.3333333334, ans=0.0 2024-09-19 19:23:08,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=729885.3333333334, ans=0.125 2024-09-19 19:23:14,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.88 vs. 
limit=15.0 2024-09-19 19:23:16,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=729932.0, ans=0.0 2024-09-19 19:23:41,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=729978.6666666666, ans=0.125 2024-09-19 19:23:42,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=729978.6666666666, ans=0.125 2024-09-19 19:24:22,751 INFO [train.py:1198] (1/2) Epoch 41, batch 1400, loss[loss=0.1915, simple_loss=0.2432, pruned_loss=0.05134, ctc_loss=0.1096, cr_loss=0.3807, over 34255.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.2611, pruned_loss=0.05341, ctc_loss=0.1151, cr_loss=0.3888, over 6778076.05 frames. ], batch size: 80, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:24:26,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730118.6666666666, ans=0.1 2024-09-19 19:24:41,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=730165.3333333334, ans=0.125 2024-09-19 19:25:00,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=730212.0, ans=0.125 2024-09-19 19:25:00,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=730212.0, ans=0.04949747468305833 2024-09-19 19:25:12,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=730258.6666666666, ans=0.0 2024-09-19 19:25:13,496 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.666e+02 3.069e+02 3.759e+02 6.598e+02, threshold=6.139e+02, percent-clipped=1.0 2024-09-19 19:25:44,604 INFO [train.py:1198] (1/2) Epoch 41, batch 1450, loss[loss=0.2164, simple_loss=0.276, pruned_loss=0.05763, ctc_loss=0.1248, cr_loss=0.4149, over 34443.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2612, pruned_loss=0.05335, ctc_loss=0.1151, cr_loss=0.3889, over 6774698.81 frames. ], batch size: 110, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:25:56,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=730352.0, ans=0.025 2024-09-19 19:25:58,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=730352.0, ans=0.125 2024-09-19 19:26:20,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=730445.3333333334, ans=0.125 2024-09-19 19:26:56,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=730538.6666666666, ans=0.0 2024-09-19 19:27:06,178 INFO [train.py:1198] (1/2) Epoch 41, batch 1500, loss[loss=0.2234, simple_loss=0.2843, pruned_loss=0.05986, ctc_loss=0.1286, cr_loss=0.4274, over 34442.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2613, pruned_loss=0.05332, ctc_loss=0.1151, cr_loss=0.3892, over 6774220.88 frames. 
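The Whitening lines report a whiteness metric for a module's activations against a configured limit; the metric measures how far the channel covariance is from having a flat (white) eigenvalue spectrum. As a hedged sketch of one standard such statistic (an assumption about the flavor of the measure, not a claim about the exact formula in scaling.py): the mean squared eigenvalue divided by the squared mean eigenvalue, which can be computed from traces without an eigendecomposition.

```python
import torch

def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations.

    Returns mean(eig^2) / mean(eig)^2 for the eigenvalues of the
    channel covariance, via trace identities rather than an
    eigendecomposition: trace(C) = sum(eig) and ||C||_F^2 = sum(eig^2)
    for symmetric C. The value is >= 1.0, with equality only for a
    perfectly flat (white) eigenvalue spectrum.
    """
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.t() @ x) / x.shape[0]      # (C, C) channel covariance
    n = c.shape[0]
    sum_eig = torch.diagonal(c).sum()
    sum_eig2 = (c * c).sum()          # Frobenius norm squared
    return (sum_eig2 / n) / (sum_eig / n) ** 2

x = torch.randn(1000, 512)
print(whiteness_metric(x))  # modestly above 1.0 from sampling noise;
                            # a collapsed representation scores far higher
```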
], batch size: 100, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:27:08,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=730585.3333333334, ans=0.0 2024-09-19 19:27:08,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=730585.3333333334, ans=0.125 2024-09-19 19:27:27,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.91 vs. limit=6.0 2024-09-19 19:27:28,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=730632.0, ans=0.125 2024-09-19 19:28:01,584 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.197e+02 2.473e+02 2.749e+02 3.177e+02 5.601e+02, threshold=5.497e+02, percent-clipped=0.0 2024-09-19 19:28:33,121 INFO [train.py:1198] (1/2) Epoch 41, batch 1550, loss[loss=0.2261, simple_loss=0.2816, pruned_loss=0.06349, ctc_loss=0.1306, cr_loss=0.4386, over 34448.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2613, pruned_loss=0.05364, ctc_loss=0.1157, cr_loss=0.3902, over 6746318.41 frames. ], batch size: 105, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:28:40,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=730818.6666666666, ans=6.0 2024-09-19 19:29:02,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=730865.3333333334, ans=0.0 2024-09-19 19:29:06,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=730912.0, ans=0.125 2024-09-19 19:29:22,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=730958.6666666666, ans=0.09899494936611666 2024-09-19 19:29:35,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=730958.6666666666, ans=0.125 2024-09-19 19:29:35,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=730958.6666666666, ans=0.07 2024-09-19 19:29:54,868 INFO [train.py:1198] (1/2) Epoch 41, batch 1600, loss[loss=0.2121, simple_loss=0.2718, pruned_loss=0.05623, ctc_loss=0.12, cr_loss=0.3973, over 34567.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2617, pruned_loss=0.05397, ctc_loss=0.1162, cr_loss=0.3913, over 6725559.35 frames. 
], batch size: 99, lr: 2.91e-03, grad_scale: 32.0 2024-09-19 19:30:09,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=731098.6666666666, ans=0.125 2024-09-19 19:30:11,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=731098.6666666666, ans=0.125 2024-09-19 19:30:48,066 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.177e+02 2.507e+02 2.897e+02 3.741e+02 6.461e+02, threshold=5.794e+02, percent-clipped=5.0 2024-09-19 19:30:51,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=731192.0, ans=0.125 2024-09-19 19:30:59,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.59 vs. limit=15.0 2024-09-19 19:31:19,023 INFO [train.py:1198] (1/2) Epoch 41, batch 1650, loss[loss=0.2017, simple_loss=0.2625, pruned_loss=0.05121, ctc_loss=0.113, cr_loss=0.3971, over 34402.00 frames. ], tot_loss[loss=0.2042, simple_loss=0.2616, pruned_loss=0.05394, ctc_loss=0.1162, cr_loss=0.3908, over 6719071.38 frames. ], batch size: 103, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:31:32,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=731285.3333333334, ans=0.0 2024-09-19 19:32:05,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=731378.6666666666, ans=0.125 2024-09-19 19:32:12,396 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=22.5 2024-09-19 19:32:19,170 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=13.11 vs. limit=22.5 2024-09-19 19:32:23,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=731425.3333333334, ans=0.125 2024-09-19 19:32:42,838 INFO [train.py:1198] (1/2) Epoch 41, batch 1700, loss[loss=0.181, simple_loss=0.2363, pruned_loss=0.04544, ctc_loss=0.1015, cr_loss=0.3648, over 34317.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2614, pruned_loss=0.05372, ctc_loss=0.1157, cr_loss=0.39, over 6744936.91 frames. ], batch size: 80, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:32:49,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=731518.6666666666, ans=0.0 2024-09-19 19:32:53,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=731518.6666666666, ans=0.125 2024-09-19 19:33:35,418 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.563e+02 2.913e+02 3.490e+02 7.482e+02, threshold=5.825e+02, percent-clipped=3.0 2024-09-19 19:33:38,049 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=15.0 2024-09-19 19:33:47,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.90 vs. 
limit=15.0 2024-09-19 19:33:58,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=731705.3333333334, ans=0.04949747468305833 2024-09-19 19:34:05,254 INFO [train.py:1198] (1/2) Epoch 41, batch 1750, loss[loss=0.1786, simple_loss=0.2338, pruned_loss=0.0453, ctc_loss=0.09719, cr_loss=0.3351, over 34190.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.2608, pruned_loss=0.05347, ctc_loss=0.1152, cr_loss=0.3892, over 6752473.60 frames. ], batch size: 78, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:34:21,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=731798.6666666666, ans=0.05 2024-09-19 19:34:31,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=731798.6666666666, ans=0.025 2024-09-19 19:34:49,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=731845.3333333334, ans=0.0 2024-09-19 19:34:53,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.62 vs. limit=22.5 2024-09-19 19:35:12,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=731938.6666666666, ans=0.1 2024-09-19 19:35:30,440 INFO [train.py:1198] (1/2) Epoch 41, batch 1800, loss[loss=0.2067, simple_loss=0.2646, pruned_loss=0.05457, ctc_loss=0.1187, cr_loss=0.3996, over 34712.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2612, pruned_loss=0.05369, ctc_loss=0.1157, cr_loss=0.3898, over 6755730.30 frames. ], batch size: 97, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:36:23,215 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.140e+02 2.577e+02 3.013e+02 4.021e+02 5.994e+02, threshold=6.025e+02, percent-clipped=2.0 2024-09-19 19:36:26,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=732125.3333333334, ans=0.125 2024-09-19 19:36:30,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=732125.3333333334, ans=0.125 2024-09-19 19:36:45,028 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=732172.0, ans=0.2 2024-09-19 19:36:48,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=732172.0, ans=0.2 2024-09-19 19:36:48,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=732172.0, ans=0.0 2024-09-19 19:36:52,977 INFO [train.py:1198] (1/2) Epoch 41, batch 1850, loss[loss=0.2151, simple_loss=0.2735, pruned_loss=0.05771, ctc_loss=0.1248, cr_loss=0.4092, over 34456.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.2607, pruned_loss=0.05349, ctc_loss=0.1152, cr_loss=0.3894, over 6763456.88 frames. 
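In the WARNING lines from optim.py, the clipping threshold is not a fixed constant: it tracks Clipping_scale times the middle of the reported grad-norm quartiles (for example, 2.773e+02 x 2.0 = 5.546e+02 earlier in this epoch), so clipping adapts to the recent distribution of gradient norms, and percent-clipped reports how often it actually fired. Below is a rough sketch of median-relative clipping; the window size and bookkeeping are illustrative, not the recipe's optimizer code.

```python
from collections import deque

import torch

class MedianClipper:
    """Clip gradients to clipping_scale * median of recent grad norms.

    Sketch only: keeps a window of recent total gradient norms and
    rescales the current step's gradients when the new norm exceeds
    the adaptive threshold.
    """

    def __init__(self, clipping_scale: float = 2.0, window: int = 200):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(
            torch.stack([p.grad.norm() for p in params])
        ).item()
        self.norms.append(norm)
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.clipping_scale * median
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm

# after loss.backward() and before optimizer.step():
#   clipper.clip_(model.parameters())
```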
], batch size: 100, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:36:58,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=732218.6666666666, ans=0.025 2024-09-19 19:37:17,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=732265.3333333334, ans=0.2 2024-09-19 19:37:21,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.16 vs. limit=22.5 2024-09-19 19:37:22,956 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:37:22,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=732265.3333333334, ans=0.0 2024-09-19 19:37:58,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=732405.3333333334, ans=0.0 2024-09-19 19:38:03,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=732405.3333333334, ans=0.025 2024-09-19 19:38:11,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=732405.3333333334, ans=0.0 2024-09-19 19:38:14,732 INFO [train.py:1198] (1/2) Epoch 41, batch 1900, loss[loss=0.222, simple_loss=0.284, pruned_loss=0.05971, ctc_loss=0.1246, cr_loss=0.3922, over 34388.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2616, pruned_loss=0.05367, ctc_loss=0.1156, cr_loss=0.3901, over 6772813.46 frames. ], batch size: 103, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:38:20,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=732452.0, ans=0.125 2024-09-19 19:38:43,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2024-09-19 19:38:56,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=732545.3333333334, ans=0.0 2024-09-19 19:39:04,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=732592.0, ans=0.0 2024-09-19 19:39:11,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=732592.0, ans=0.125 2024-09-19 19:39:12,906 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.514e+02 2.871e+02 3.647e+02 7.935e+02, threshold=5.743e+02, percent-clipped=1.0 2024-09-19 19:39:13,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=732592.0, ans=0.1 2024-09-19 19:39:19,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=732592.0, ans=0.125 2024-09-19 19:39:41,216 INFO [train.py:1198] (1/2) Epoch 41, batch 1950, loss[loss=0.1921, simple_loss=0.2505, pruned_loss=0.04884, ctc_loss=0.1067, cr_loss=0.3683, over 34374.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2625, pruned_loss=0.05389, ctc_loss=0.116, cr_loss=0.3914, over 6789555.97 frames. 
], batch size: 91, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 19:39:59,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=732732.0, ans=0.0 2024-09-19 19:40:11,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732732.0, ans=0.1 2024-09-19 19:40:59,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=4.83 vs. limit=12.0 2024-09-19 19:41:00,765 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-09-19 19:41:03,385 INFO [train.py:1198] (1/2) Epoch 41, batch 2000, loss[loss=0.1758, simple_loss=0.2312, pruned_loss=0.04414, ctc_loss=0.09502, cr_loss=0.3311, over 34196.00 frames. ], tot_loss[loss=0.2048, simple_loss=0.2629, pruned_loss=0.05389, ctc_loss=0.1162, cr_loss=0.3918, over 6763743.91 frames. ], batch size: 78, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:41:05,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=732918.6666666666, ans=0.0 2024-09-19 19:41:05,931 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.60 vs. limit=15.0 2024-09-19 19:41:31,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=732965.3333333334, ans=0.125 2024-09-19 19:41:40,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=733012.0, ans=0.0 2024-09-19 19:41:44,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=733012.0, ans=0.125 2024-09-19 19:41:54,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=733058.6666666666, ans=0.05 2024-09-19 19:41:59,258 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.485e+02 2.845e+02 3.645e+02 7.371e+02, threshold=5.689e+02, percent-clipped=4.0 2024-09-19 19:42:19,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=733105.3333333334, ans=0.125 2024-09-19 19:42:27,466 INFO [train.py:1198] (1/2) Epoch 41, batch 2050, loss[loss=0.1782, simple_loss=0.2333, pruned_loss=0.04493, ctc_loss=0.0997, cr_loss=0.3352, over 34488.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2618, pruned_loss=0.05354, ctc_loss=0.1155, cr_loss=0.39, over 6754132.46 frames. 
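grad_scale in the per-batch lines is the automatic-mixed-precision loss scale: it halves (32.0 to 16.0 here, at batch 1950) when a scaled float16 gradient overflows, and is grown back (16.0 to 32.0 by batch 2000) after a stretch of finite steps. This is standard dynamic loss scaling; here is a minimal sketch with torch.cuda.amp, using PyTorch's stock GradScaler knobs rather than whatever constants this recipe actually configures.

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0**5,    # starting loss scale (32.0, as in the log)
    growth_factor=2.0,    # doubled after growth_interval clean steps
    backoff_factor=0.5,   # halved whenever an inf/nan gradient appears
    growth_interval=2000,
)

def train_step(model, optimizer, batch):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)            # forward in float16
    scaler.scale(loss).backward()      # backward on the scaled loss
    scaler.step(optimizer)             # step is skipped on inf/nan grads
    scaler.update()                    # grow or back off the scale
    return loss.detach(), scaler.get_scale()
```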
], batch size: 82, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:42:27,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=733152.0, ans=0.125 2024-09-19 19:42:27,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=733152.0, ans=0.0 2024-09-19 19:42:29,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=733152.0, ans=0.025 2024-09-19 19:42:45,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=733198.6666666666, ans=0.125 2024-09-19 19:43:17,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=733292.0, ans=0.025 2024-09-19 19:43:19,732 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.09 vs. limit=15.0 2024-09-19 19:43:20,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=733292.0, ans=0.125 2024-09-19 19:43:43,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=733338.6666666666, ans=0.125 2024-09-19 19:43:50,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=733385.3333333334, ans=0.125 2024-09-19 19:43:51,532 INFO [train.py:1198] (1/2) Epoch 41, batch 2100, loss[loss=0.2054, simple_loss=0.2583, pruned_loss=0.05644, ctc_loss=0.1206, cr_loss=0.3885, over 34558.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2611, pruned_loss=0.05342, ctc_loss=0.1152, cr_loss=0.3888, over 6766978.40 frames. ], batch size: 94, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:43:55,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=733385.3333333334, ans=0.0 2024-09-19 19:44:16,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=733432.0, ans=0.125 2024-09-19 19:44:38,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=733525.3333333334, ans=0.0 2024-09-19 19:44:42,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=733525.3333333334, ans=0.125 2024-09-19 19:44:45,071 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.500e+02 3.064e+02 3.976e+02 6.316e+02, threshold=6.127e+02, percent-clipped=1.0 2024-09-19 19:45:04,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=733572.0, ans=0.125 2024-09-19 19:45:13,153 INFO [train.py:1198] (1/2) Epoch 41, batch 2150, loss[loss=0.1916, simple_loss=0.2478, pruned_loss=0.04997, ctc_loss=0.1065, cr_loss=0.3575, over 34365.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.2601, pruned_loss=0.05294, ctc_loss=0.1143, cr_loss=0.3861, over 6787253.01 frames. 
], batch size: 91, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:45:16,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=733618.6666666666, ans=0.0 2024-09-19 19:45:44,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=733712.0, ans=0.1 2024-09-19 19:45:50,676 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=20.33 vs. limit=22.5 2024-09-19 19:46:17,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=733758.6666666666, ans=0.125 2024-09-19 19:46:21,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.12 vs. limit=22.5 2024-09-19 19:46:37,247 INFO [train.py:1198] (1/2) Epoch 41, batch 2200, loss[loss=0.2057, simple_loss=0.2707, pruned_loss=0.05157, ctc_loss=0.1103, cr_loss=0.3901, over 34460.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.2603, pruned_loss=0.05324, ctc_loss=0.1147, cr_loss=0.3869, over 6781497.71 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:46:39,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=733852.0, ans=0.0 2024-09-19 19:46:51,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=733852.0, ans=0.125 2024-09-19 19:46:58,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.41 vs. limit=15.0 2024-09-19 19:47:10,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=733945.3333333334, ans=0.125 2024-09-19 19:47:19,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0 2024-09-19 19:47:28,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=733992.0, ans=0.125 2024-09-19 19:47:34,862 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.123e+02 2.618e+02 2.924e+02 3.690e+02 5.190e+02, threshold=5.847e+02, percent-clipped=0.0 2024-09-19 19:48:01,103 INFO [train.py:1198] (1/2) Epoch 41, batch 2250, loss[loss=0.2131, simple_loss=0.2704, pruned_loss=0.05766, ctc_loss=0.1205, cr_loss=0.4099, over 34437.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2605, pruned_loss=0.05327, ctc_loss=0.1148, cr_loss=0.3875, over 6779870.56 frames. ], batch size: 95, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 19:48:08,999 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.42 vs. limit=10.0 2024-09-19 19:48:23,658 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.47 vs. 
limit=22.5 2024-09-19 19:48:27,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=734132.0, ans=0.125 2024-09-19 19:48:34,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=734178.6666666666, ans=0.0 2024-09-19 19:48:37,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=734178.6666666666, ans=0.0 2024-09-19 19:48:42,330 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:49:22,787 INFO [train.py:1198] (1/2) Epoch 41, batch 2300, loss[loss=0.1779, simple_loss=0.2316, pruned_loss=0.04513, ctc_loss=0.09921, cr_loss=0.3526, over 34242.00 frames. ], tot_loss[loss=0.2018, simple_loss=0.2595, pruned_loss=0.05294, ctc_loss=0.114, cr_loss=0.386, over 6767328.30 frames. ], batch size: 83, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 19:49:23,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=734318.6666666666, ans=0.2 2024-09-19 19:49:31,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=734318.6666666666, ans=0.2 2024-09-19 19:49:37,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=734318.6666666666, ans=0.0 2024-09-19 19:49:39,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=734365.3333333334, ans=0.1 2024-09-19 19:49:49,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=734365.3333333334, ans=0.125 2024-09-19 19:50:06,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=734412.0, ans=0.1 2024-09-19 19:50:09,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=734412.0, ans=0.95 2024-09-19 19:50:20,606 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 2.508e+02 2.923e+02 3.691e+02 5.716e+02, threshold=5.845e+02, percent-clipped=0.0 2024-09-19 19:50:37,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=734505.3333333334, ans=0.125 2024-09-19 19:50:49,356 INFO [train.py:1198] (1/2) Epoch 41, batch 2350, loss[loss=0.218, simple_loss=0.2712, pruned_loss=0.06126, ctc_loss=0.1274, cr_loss=0.4178, over 34701.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2598, pruned_loss=0.05292, ctc_loss=0.1141, cr_loss=0.3862, over 6774330.14 frames. 
], batch size: 97, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 19:51:05,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=734598.6666666666, ans=0.125 2024-09-19 19:51:07,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=734598.6666666666, ans=0.0 2024-09-19 19:51:14,535 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 19:51:20,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734645.3333333334, ans=0.1 2024-09-19 19:51:32,647 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2024-09-19 19:51:33,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=734645.3333333334, ans=0.2 2024-09-19 19:51:41,314 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=22.5 2024-09-19 19:51:46,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=734692.0, ans=0.2 2024-09-19 19:51:47,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.52 vs. limit=22.5 2024-09-19 19:51:51,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=734692.0, ans=0.125 2024-09-19 19:52:11,340 INFO [train.py:1198] (1/2) Epoch 41, batch 2400, loss[loss=0.1892, simple_loss=0.2468, pruned_loss=0.04788, ctc_loss=0.1044, cr_loss=0.3745, over 34576.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2603, pruned_loss=0.05302, ctc_loss=0.1143, cr_loss=0.3873, over 6778499.72 frames. ], batch size: 89, lr: 2.90e-03, grad_scale: 32.0 2024-09-19 19:52:38,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734832.0, ans=0.1 2024-09-19 19:52:45,454 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.74 vs. limit=10.0 2024-09-19 19:52:53,445 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.70 vs. limit=6.0 2024-09-19 19:53:10,670 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.179e+02 2.559e+02 3.039e+02 3.964e+02 7.072e+02, threshold=6.078e+02, percent-clipped=2.0 2024-09-19 19:53:11,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=734925.3333333334, ans=0.125 2024-09-19 19:53:29,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=734972.0, ans=0.125 2024-09-19 19:53:35,574 INFO [train.py:1198] (1/2) Epoch 41, batch 2450, loss[loss=0.205, simple_loss=0.2653, pruned_loss=0.05313, ctc_loss=0.1149, cr_loss=0.3873, over 34437.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2613, pruned_loss=0.05339, ctc_loss=0.1151, cr_loss=0.3886, over 6752429.28 frames. 
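Every loss[...] breakdown carries a cr_loss term next to the transducer (simple/pruned) and CTC losses: a consistency-regularization term that pushes the CTC posteriors of two differently time-masked views of the same utterance toward each other. The following is a hedged sketch of that idea using a symmetric KL divergence; the exact masking, detaching, and scaling used by the recipe may differ.

```python
import torch.nn.functional as F

def consistency_loss(log_probs_a, log_probs_b):
    """Symmetric consistency term between two augmented views.

    log_probs_a / log_probs_b: (T, N, V) log-posteriors from the CTC
    head for two independently time-masked copies of the same batch.
    Each view is pulled toward the other's detached distribution.
    """
    kl_ab = F.kl_div(log_probs_a, log_probs_b.detach(),
                     reduction="batchmean", log_target=True)
    kl_ba = F.kl_div(log_probs_b, log_probs_a.detach(),
                     reduction="batchmean", log_target=True)
    return 0.5 * (kl_ab + kl_ba)

# total = simple + pruned + ctc_scale * ctc + cr_scale * consistency_loss(...)
```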
], batch size: 95, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 19:53:44,853 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.93 vs. limit=12.0 2024-09-19 19:53:46,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.93 vs. limit=12.0 2024-09-19 19:53:52,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=735065.3333333334, ans=0.0 2024-09-19 19:53:57,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=735065.3333333334, ans=0.125 2024-09-19 19:54:02,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735065.3333333334, ans=0.1 2024-09-19 19:54:09,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.98 vs. limit=15.0 2024-09-19 19:54:33,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=735158.6666666666, ans=0.125 2024-09-19 19:54:51,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=735205.3333333334, ans=0.1 2024-09-19 19:54:59,580 INFO [train.py:1198] (1/2) Epoch 41, batch 2500, loss[loss=0.2139, simple_loss=0.2729, pruned_loss=0.05693, ctc_loss=0.1212, cr_loss=0.4197, over 34452.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.2612, pruned_loss=0.05336, ctc_loss=0.115, cr_loss=0.3885, over 6763913.61 frames. ], batch size: 100, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 19:55:09,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=735252.0, ans=0.2 2024-09-19 19:55:18,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=735298.6666666666, ans=0.125 2024-09-19 19:55:36,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=735345.3333333334, ans=0.95 2024-09-19 19:55:41,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=735345.3333333334, ans=0.125 2024-09-19 19:55:41,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735345.3333333334, ans=0.1 2024-09-19 19:55:42,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=735345.3333333334, ans=0.125 2024-09-19 19:55:43,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.99 vs. 
limit=12.0 2024-09-19 19:55:57,524 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.503e+02 2.794e+02 3.368e+02 6.650e+02, threshold=5.587e+02, percent-clipped=2.0 2024-09-19 19:56:18,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=735438.6666666666, ans=0.0 2024-09-19 19:56:22,948 INFO [train.py:1198] (1/2) Epoch 41, batch 2550, loss[loss=0.1789, simple_loss=0.2319, pruned_loss=0.04576, ctc_loss=0.1022, cr_loss=0.3505, over 34141.00 frames. ], tot_loss[loss=0.203, simple_loss=0.261, pruned_loss=0.05329, ctc_loss=0.115, cr_loss=0.3884, over 6767559.43 frames. ], batch size: 78, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 19:56:23,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=735485.3333333334, ans=0.0 2024-09-19 19:56:31,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=735485.3333333334, ans=0.0 2024-09-19 19:56:46,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=735532.0, ans=0.125 2024-09-19 19:57:01,659 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.00 vs. limit=15.0 2024-09-19 19:57:04,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=735578.6666666666, ans=0.125 2024-09-19 19:57:16,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0 2024-09-19 19:57:27,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=735625.3333333334, ans=0.125 2024-09-19 19:57:32,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=735672.0, ans=0.125 2024-09-19 19:57:46,981 INFO [train.py:1198] (1/2) Epoch 41, batch 2600, loss[loss=0.1976, simple_loss=0.257, pruned_loss=0.05076, ctc_loss=0.1087, cr_loss=0.371, over 34390.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2616, pruned_loss=0.05355, ctc_loss=0.1155, cr_loss=0.3896, over 6761844.74 frames. ], batch size: 91, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 19:57:57,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=7.06 vs. 
limit=10.0 2024-09-19 19:58:25,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=735812.0, ans=0.125 2024-09-19 19:58:28,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735812.0, ans=0.1 2024-09-19 19:58:32,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=735812.0, ans=0.0 2024-09-19 19:58:46,618 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.547e+02 3.017e+02 3.783e+02 6.658e+02, threshold=6.035e+02, percent-clipped=4.0 2024-09-19 19:59:04,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=735905.3333333334, ans=0.2 2024-09-19 19:59:08,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=735905.3333333334, ans=0.04949747468305833 2024-09-19 19:59:10,807 INFO [train.py:1198] (1/2) Epoch 41, batch 2650, loss[loss=0.2325, simple_loss=0.2889, pruned_loss=0.06544, ctc_loss=0.1384, cr_loss=0.4379, over 34240.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.2618, pruned_loss=0.05361, ctc_loss=0.1157, cr_loss=0.3898, over 6770576.39 frames. ], batch size: 117, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 19:59:16,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=735952.0, ans=0.2 2024-09-19 20:00:09,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736092.0, ans=0.125 2024-09-19 20:00:25,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.59 vs. limit=6.0 2024-09-19 20:00:33,998 INFO [train.py:1198] (1/2) Epoch 41, batch 2700, loss[loss=0.2032, simple_loss=0.2689, pruned_loss=0.05033, ctc_loss=0.108, cr_loss=0.3783, over 34637.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.2619, pruned_loss=0.05363, ctc_loss=0.1156, cr_loss=0.3894, over 6765470.67 frames. ], batch size: 102, lr: 2.90e-03, grad_scale: 16.0 2024-09-19 20:00:37,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=736185.3333333334, ans=0.2 2024-09-19 20:00:49,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736232.0, ans=0.1 2024-09-19 20:00:55,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=736232.0, ans=0.125 2024-09-19 20:01:00,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.72 vs. 
limit=15.0 2024-09-19 20:01:20,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=736278.6666666666, ans=0.0 2024-09-19 20:01:21,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736325.3333333334, ans=0.1 2024-09-19 20:01:31,356 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 2.740e+02 3.287e+02 3.931e+02 1.335e+03, threshold=6.574e+02, percent-clipped=1.0 2024-09-19 20:01:36,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736325.3333333334, ans=0.125 2024-09-19 20:01:39,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=736372.0, ans=0.125 2024-09-19 20:01:46,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=736372.0, ans=0.125 2024-09-19 20:01:58,420 INFO [train.py:1198] (1/2) Epoch 41, batch 2750, loss[loss=0.2066, simple_loss=0.2558, pruned_loss=0.05783, ctc_loss=0.1258, cr_loss=0.4131, over 34628.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2609, pruned_loss=0.0532, ctc_loss=0.1149, cr_loss=0.388, over 6761829.24 frames. ], batch size: 88, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 20:01:58,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=736418.6666666666, ans=0.125 2024-09-19 20:02:05,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=736418.6666666666, ans=0.1 2024-09-19 20:02:35,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.85 vs. limit=15.0 2024-09-19 20:02:54,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=736558.6666666666, ans=0.2 2024-09-19 20:03:11,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=736605.3333333334, ans=0.0 2024-09-19 20:03:20,951 INFO [train.py:1198] (1/2) Epoch 41, batch 2800, loss[loss=0.2249, simple_loss=0.2761, pruned_loss=0.06485, ctc_loss=0.1388, cr_loss=0.4079, over 23567.00 frames. ], tot_loss[loss=0.2034, simple_loss=0.2612, pruned_loss=0.05351, ctc_loss=0.1155, cr_loss=0.3889, over 6738562.47 frames. ], batch size: 244, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:03:24,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=736652.0, ans=0.2 2024-09-19 20:03:38,041 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.67 vs. limit=10.0 2024-09-19 20:03:43,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.32 vs. 
limit=22.5 2024-09-19 20:03:46,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=736698.6666666666, ans=0.0 2024-09-19 20:04:02,545 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:04:02,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=736745.3333333334, ans=0.125 2024-09-19 20:04:20,131 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 2.505e+02 2.906e+02 3.810e+02 6.695e+02, threshold=5.812e+02, percent-clipped=2.0 2024-09-19 20:04:20,944 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.73 vs. limit=15.0 2024-09-19 20:04:38,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=736838.6666666666, ans=0.0 2024-09-19 20:04:44,507 INFO [train.py:1198] (1/2) Epoch 41, batch 2850, loss[loss=0.2102, simple_loss=0.2614, pruned_loss=0.05904, ctc_loss=0.124, cr_loss=0.4031, over 34471.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2613, pruned_loss=0.05368, ctc_loss=0.1157, cr_loss=0.3893, over 6724060.70 frames. ], batch size: 90, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:04:46,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=736885.3333333334, ans=0.0 2024-09-19 20:04:54,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=736885.3333333334, ans=0.125 2024-09-19 20:04:54,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=736885.3333333334, ans=0.025 2024-09-19 20:04:59,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736932.0, ans=0.1 2024-09-19 20:05:44,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=737025.3333333334, ans=0.035 2024-09-19 20:05:45,401 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.43 vs. limit=15.0 2024-09-19 20:06:02,459 INFO [scaling.py:801] (1/2) Caught exception in Balancer backward: CUDA out of memory. Tried to allocate 3.77 GiB. GPU 1 has a total capacity of 79.17 GiB of which 3.62 GiB is free. Process 39810 has 75.54 GiB memory in use. Of the allocated memory 29.31 GiB is allocated by PyTorch, and 43.84 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables), size=[226, 384, 614, 19], will continue. 2024-09-19 20:06:09,327 INFO [train.py:1198] (1/2) Epoch 41, batch 2900, loss[loss=0.1981, simple_loss=0.2539, pruned_loss=0.05237, ctc_loss=0.1108, cr_loss=0.3847, over 34527.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.2626, pruned_loss=0.0541, ctc_loss=0.1165, cr_loss=0.3918, over 6754547.79 frames. 
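The scaling.py:801 entry above is worth noting: a CUDA OOM raised inside a Balancer's backward pass is caught and skipped ("will continue") instead of killing the run, since the balancer only shapes gradients and the main loss can survive without it for one step; the message also carries PyTorch's standard hint about PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True for fragmentation. Below is a stripped-down version of the catch-and-continue pattern, as a custom autograd function that injects an auxiliary gradient in backward but tolerates OOM there (a sketch, not icefall's actual Balancer).

```python
import logging

import torch

class SafePenalty(torch.autograd.Function):
    """Identity in forward; injects an extra regularizing gradient in
    backward, but swallows CUDA OOM there so the step still completes.
    A sketch of the catch-and-continue pattern, not icefall's Balancer.
    """

    @staticmethod
    def forward(ctx, x, penalty_grad_fn):
        ctx.save_for_backward(x)
        ctx.penalty_grad_fn = penalty_grad_fn
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        try:
            extra = ctx.penalty_grad_fn(x)   # may allocate large buffers
            return grad_out + extra, None
        except torch.cuda.OutOfMemoryError as e:
            logging.info(f"Caught exception in backward: {e}, will continue.")
            return grad_out, None

# y = SafePenalty.apply(x, my_penalty_grad)  # my_penalty_grad is hypothetical
```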
], batch size: 94, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:06:27,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=737165.3333333334, ans=0.0 2024-09-19 20:06:28,132 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.03 vs. limit=15.0 2024-09-19 20:06:31,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=737165.3333333334, ans=0.125 2024-09-19 20:06:55,720 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:07:02,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=737258.6666666666, ans=0.125 2024-09-19 20:07:06,968 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.199e+02 2.705e+02 3.416e+02 4.312e+02 6.852e+02, threshold=6.832e+02, percent-clipped=2.0 2024-09-19 20:07:15,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=737305.3333333334, ans=0.125 2024-09-19 20:07:32,016 INFO [train.py:1198] (1/2) Epoch 41, batch 2950, loss[loss=0.2026, simple_loss=0.2564, pruned_loss=0.05542, ctc_loss=0.114, cr_loss=0.3815, over 34646.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.2608, pruned_loss=0.05348, ctc_loss=0.1154, cr_loss=0.3888, over 6747674.49 frames. ], batch size: 88, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:07:32,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_ff2.min_abs, batch_count=737352.0, ans=0.1 2024-09-19 20:07:45,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=737352.0, ans=0.0 2024-09-19 20:07:53,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=737398.6666666666, ans=0.0 2024-09-19 20:07:55,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=737398.6666666666, ans=0.125 2024-09-19 20:07:55,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=737398.6666666666, ans=0.125 2024-09-19 20:08:15,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=737445.3333333334, ans=0.0 2024-09-19 20:08:56,337 INFO [train.py:1198] (1/2) Epoch 41, batch 3000, loss[loss=0.1964, simple_loss=0.2566, pruned_loss=0.04982, ctc_loss=0.1091, cr_loss=0.3708, over 34510.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2604, pruned_loss=0.05324, ctc_loss=0.1149, cr_loss=0.3874, over 6749750.18 frames. ], batch size: 94, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:08:56,338 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 20:09:13,128 INFO [train.py:1230] (1/2) Epoch 41, validation: loss=0.1478, simple_loss=0.2416, pruned_loss=0.02314, ctc_loss=0.03856, cr_loss=2.274e-14, over 944034.00 frames. 2024-09-19 20:09:13,129 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 20:09:15,964 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.67 vs. 
limit=15.0 2024-09-19 20:10:02,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=737725.3333333334, ans=0.125 2024-09-19 20:10:12,271 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.048e+02 2.626e+02 3.054e+02 3.521e+02 5.780e+02, threshold=6.109e+02, percent-clipped=0.0 2024-09-19 20:10:19,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.56 vs. limit=12.0 2024-09-19 20:10:22,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=737772.0, ans=0.125 2024-09-19 20:10:36,781 INFO [train.py:1198] (1/2) Epoch 41, batch 3050, loss[loss=0.2043, simple_loss=0.2578, pruned_loss=0.05582, ctc_loss=0.1175, cr_loss=0.3933, over 34588.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2613, pruned_loss=0.05368, ctc_loss=0.1156, cr_loss=0.3892, over 6741388.32 frames. ], batch size: 89, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:10:45,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737818.6666666666, ans=0.1 2024-09-19 20:10:50,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=737818.6666666666, ans=0.2 2024-09-19 20:11:11,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737912.0, ans=0.1 2024-09-19 20:11:22,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=737912.0, ans=0.0 2024-09-19 20:11:57,537 INFO [train.py:1198] (1/2) Epoch 41, batch 3100, loss[loss=0.222, simple_loss=0.2837, pruned_loss=0.05909, ctc_loss=0.1261, cr_loss=0.4216, over 34267.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2611, pruned_loss=0.05358, ctc_loss=0.1155, cr_loss=0.3885, over 6741724.40 frames. ], batch size: 117, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:12:28,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=738145.3333333334, ans=0.2 2024-09-19 20:12:42,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=738145.3333333334, ans=0.125 2024-09-19 20:12:47,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=738192.0, ans=0.125 2024-09-19 20:12:55,389 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.630e+02 2.990e+02 3.811e+02 6.593e+02, threshold=5.980e+02, percent-clipped=3.0 2024-09-19 20:13:06,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=738238.6666666666, ans=0.0 2024-09-19 20:13:19,983 INFO [train.py:1198] (1/2) Epoch 41, batch 3150, loss[loss=0.2188, simple_loss=0.2786, pruned_loss=0.05858, ctc_loss=0.1262, cr_loss=0.4158, over 33838.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2612, pruned_loss=0.05362, ctc_loss=0.1155, cr_loss=0.3883, over 6748372.13 frames. ], batch size: 122, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:13:30,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. 
limit=15.0 2024-09-19 20:13:44,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=738332.0, ans=0.0 2024-09-19 20:13:51,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.39 vs. limit=15.0 2024-09-19 20:13:59,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=738378.6666666666, ans=0.125 2024-09-19 20:14:03,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=738378.6666666666, ans=0.125 2024-09-19 20:14:05,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=738378.6666666666, ans=0.125 2024-09-19 20:14:15,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=738425.3333333334, ans=0.125 2024-09-19 20:14:21,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=738425.3333333334, ans=0.2 2024-09-19 20:14:33,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738472.0, ans=0.0 2024-09-19 20:14:40,990 INFO [train.py:1198] (1/2) Epoch 41, batch 3200, loss[loss=0.2059, simple_loss=0.2626, pruned_loss=0.05502, ctc_loss=0.1162, cr_loss=0.3985, over 34524.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2608, pruned_loss=0.0534, ctc_loss=0.1151, cr_loss=0.3877, over 6762475.85 frames. ], batch size: 94, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:14:45,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.98 vs. limit=10.0 2024-09-19 20:14:50,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=738518.6666666666, ans=0.125 2024-09-19 20:15:12,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=738612.0, ans=0.2 2024-09-19 20:15:18,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=738612.0, ans=0.0 2024-09-19 20:15:31,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=738658.6666666666, ans=0.125 2024-09-19 20:15:39,138 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.185e+02 2.477e+02 2.818e+02 3.497e+02 4.995e+02, threshold=5.636e+02, percent-clipped=0.0 2024-09-19 20:15:41,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=738658.6666666666, ans=0.025 2024-09-19 20:16:03,299 INFO [train.py:1198] (1/2) Epoch 41, batch 3250, loss[loss=0.2146, simple_loss=0.2718, pruned_loss=0.05804, ctc_loss=0.124, cr_loss=0.4127, over 34659.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2611, pruned_loss=0.0535, ctc_loss=0.1153, cr_loss=0.3883, over 6771542.95 frames. 
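tot_loss is not a single batch's loss: its frame counts are fractional and in the millions (6771542.95 frames here), which is consistent with a frames-weighted running average with exponential forgetting, and explains why tot_loss drifts slowly around 0.203 while individual loss[...] values jump between roughly 0.17 and 0.23. One way such numbers can arise, as a sketch (the decay constant is made up, though with ~34k frames per batch a decay of 0.995 saturates near the several-million frame counts seen in the log):

```python
class RunningLoss:
    """Frames-weighted running average with exponential forgetting.

    Sketch of how a slowly moving tot_loss with fractional,
    multi-million frame counts can arise; decay is illustrative.
    """

    def __init__(self, decay: float = 0.995):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float):
        # batch_loss_sum is the loss summed (not averaged) over frames
        self.loss_sum = self.decay * self.loss_sum + batch_loss_sum
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```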
], batch size: 98, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:16:26,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=738798.6666666666, ans=0.0 2024-09-19 20:16:30,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=738798.6666666666, ans=0.125 2024-09-19 20:16:30,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=738798.6666666666, ans=0.125 2024-09-19 20:16:40,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=738845.3333333334, ans=0.125 2024-09-19 20:16:58,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=738892.0, ans=0.125 2024-09-19 20:17:09,695 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.33 vs. limit=6.0 2024-09-19 20:17:22,688 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=768, metric=4.29 vs. limit=12.0 2024-09-19 20:17:23,471 INFO [train.py:1198] (1/2) Epoch 41, batch 3300, loss[loss=0.2247, simple_loss=0.2851, pruned_loss=0.06053, ctc_loss=0.1297, cr_loss=0.4297, over 33043.00 frames. ], tot_loss[loss=0.2022, simple_loss=0.26, pruned_loss=0.05305, ctc_loss=0.1145, cr_loss=0.3865, over 6769223.94 frames. ], batch size: 130, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:17:27,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=738985.3333333334, ans=0.2 2024-09-19 20:17:48,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=739032.0, ans=0.0 2024-09-19 20:18:07,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=739078.6666666666, ans=0.0 2024-09-19 20:18:09,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.15 vs. limit=15.0 2024-09-19 20:18:21,116 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.486e+02 2.715e+02 3.504e+02 5.251e+02, threshold=5.431e+02, percent-clipped=0.0 2024-09-19 20:18:29,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739172.0, ans=0.1 2024-09-19 20:18:34,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=739172.0, ans=0.125 2024-09-19 20:18:40,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=739172.0, ans=0.025 2024-09-19 20:18:45,479 INFO [train.py:1198] (1/2) Epoch 41, batch 3350, loss[loss=0.2202, simple_loss=0.2748, pruned_loss=0.06176, ctc_loss=0.1284, cr_loss=0.4107, over 33836.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2609, pruned_loss=0.05349, ctc_loss=0.1153, cr_loss=0.3888, over 6743913.13 frames. 
], batch size: 122, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:18:49,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=739218.6666666666, ans=0.2 2024-09-19 20:18:49,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.31 vs. limit=15.0 2024-09-19 20:19:00,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=739265.3333333334, ans=0.125 2024-09-19 20:19:02,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=739265.3333333334, ans=0.07 2024-09-19 20:19:07,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=739265.3333333334, ans=0.0 2024-09-19 20:19:23,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=739312.0, ans=0.125 2024-09-19 20:19:31,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=739312.0, ans=0.125 2024-09-19 20:19:32,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=739358.6666666666, ans=0.0 2024-09-19 20:20:02,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.83 vs. limit=15.0 2024-09-19 20:20:06,168 INFO [train.py:1198] (1/2) Epoch 41, batch 3400, loss[loss=0.1724, simple_loss=0.2297, pruned_loss=0.04137, ctc_loss=0.09423, cr_loss=0.3364, over 34167.00 frames. ], tot_loss[loss=0.2036, simple_loss=0.2611, pruned_loss=0.05367, ctc_loss=0.1157, cr_loss=0.3898, over 6732804.21 frames. ], batch size: 78, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:20:16,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=739452.0, ans=0.125 2024-09-19 20:20:25,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=739498.6666666666, ans=0.0 2024-09-19 20:20:35,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=739498.6666666666, ans=0.0 2024-09-19 20:20:43,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=739545.3333333334, ans=0.125 2024-09-19 20:20:54,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=739592.0, ans=0.05 2024-09-19 20:21:03,596 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.119e+02 2.501e+02 2.920e+02 3.512e+02 7.414e+02, threshold=5.841e+02, percent-clipped=5.0 2024-09-19 20:21:23,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=739638.6666666666, ans=0.125 2024-09-19 20:21:27,487 INFO [train.py:1198] (1/2) Epoch 41, batch 3450, loss[loss=0.216, simple_loss=0.2744, pruned_loss=0.05804, ctc_loss=0.1241, cr_loss=0.4173, over 33102.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2612, pruned_loss=0.05355, ctc_loss=0.1155, cr_loss=0.3894, over 6744734.45 frames. 
], batch size: 130, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:21:27,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=739685.3333333334, ans=0.025 2024-09-19 20:21:27,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=739685.3333333334, ans=0.125 2024-09-19 20:21:40,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=739685.3333333334, ans=0.125 2024-09-19 20:22:05,500 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.31 vs. limit=12.0 2024-09-19 20:22:07,822 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=739778.6666666666, ans=0.2 2024-09-19 20:22:10,215 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=15.0 2024-09-19 20:22:27,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=739825.3333333334, ans=0.2 2024-09-19 20:22:28,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=739825.3333333334, ans=0.025 2024-09-19 20:22:45,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=739872.0, ans=0.125 2024-09-19 20:22:48,443 INFO [train.py:1198] (1/2) Epoch 41, batch 3500, loss[loss=0.1768, simple_loss=0.2342, pruned_loss=0.04327, ctc_loss=0.09573, cr_loss=0.3435, over 34486.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2612, pruned_loss=0.05356, ctc_loss=0.1154, cr_loss=0.3891, over 6747138.09 frames. ], batch size: 85, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:22:51,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=739918.6666666666, ans=0.125 2024-09-19 20:22:53,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=739918.6666666666, ans=0.0 2024-09-19 20:23:22,578 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:23:44,744 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.535e+02 3.062e+02 3.856e+02 8.650e+02, threshold=6.123e+02, percent-clipped=5.0 2024-09-19 20:23:50,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=740058.6666666666, ans=0.125 2024-09-19 20:23:59,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=740105.3333333334, ans=0.0 2024-09-19 20:24:08,882 INFO [train.py:1198] (1/2) Epoch 41, batch 3550, loss[loss=0.2134, simple_loss=0.2738, pruned_loss=0.0559, ctc_loss=0.123, cr_loss=0.4176, over 34352.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2613, pruned_loss=0.05366, ctc_loss=0.1156, cr_loss=0.39, over 6757901.61 frames. ], batch size: 103, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:24:23,911 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. 
limit=15.0 2024-09-19 20:24:31,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=740198.6666666666, ans=0.07 2024-09-19 20:24:36,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=740198.6666666666, ans=0.125 2024-09-19 20:24:36,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=740198.6666666666, ans=10.0 2024-09-19 20:24:47,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=740245.3333333334, ans=0.025 2024-09-19 20:24:53,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=740245.3333333334, ans=0.0 2024-09-19 20:24:59,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=740292.0, ans=0.0 2024-09-19 20:25:02,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=740292.0, ans=0.125 2024-09-19 20:25:02,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=740292.0, ans=0.125 2024-09-19 20:25:27,363 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.27 vs. limit=12.0 2024-09-19 20:25:29,865 INFO [train.py:1198] (1/2) Epoch 41, batch 3600, loss[loss=0.1956, simple_loss=0.2546, pruned_loss=0.04965, ctc_loss=0.109, cr_loss=0.3852, over 34493.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2619, pruned_loss=0.0539, ctc_loss=0.116, cr_loss=0.391, over 6767404.99 frames. ], batch size: 90, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:25:36,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=740385.3333333334, ans=0.125 2024-09-19 20:25:41,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=740385.3333333334, ans=0.125 2024-09-19 20:25:43,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=740385.3333333334, ans=0.0 2024-09-19 20:25:45,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.40 vs. limit=12.0 2024-09-19 20:26:08,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=740478.6666666666, ans=0.125 2024-09-19 20:26:18,507 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.58 vs. 
limit=22.5 2024-09-19 20:26:24,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=740525.3333333334, ans=0.0 2024-09-19 20:26:25,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.075e+02 2.593e+02 3.168e+02 4.184e+02 6.848e+02, threshold=6.336e+02, percent-clipped=6.0 2024-09-19 20:26:46,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=740572.0, ans=0.025 2024-09-19 20:26:50,558 INFO [train.py:1198] (1/2) Epoch 41, batch 3650, loss[loss=0.2024, simple_loss=0.2616, pruned_loss=0.05233, ctc_loss=0.1171, cr_loss=0.3808, over 34456.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.2609, pruned_loss=0.05339, ctc_loss=0.1149, cr_loss=0.3884, over 6769938.77 frames. ], batch size: 110, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:27:02,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.35 vs. limit=15.0 2024-09-19 20:27:26,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=740712.0, ans=0.0 2024-09-19 20:27:30,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=740712.0, ans=0.0 2024-09-19 20:27:50,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=740758.6666666666, ans=0.025 2024-09-19 20:28:10,494 INFO [train.py:1198] (1/2) Epoch 41, batch 3700, loss[loss=0.2171, simple_loss=0.2778, pruned_loss=0.0579, ctc_loss=0.1217, cr_loss=0.4056, over 34628.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2608, pruned_loss=0.05314, ctc_loss=0.1145, cr_loss=0.3872, over 6784773.84 frames. ], batch size: 102, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:28:23,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=740852.0, ans=0.2 2024-09-19 20:28:38,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.68 vs. limit=15.0 2024-09-19 20:28:48,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=740945.3333333334, ans=0.0 2024-09-19 20:29:05,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=740992.0, ans=0.07 2024-09-19 20:29:06,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=740992.0, ans=0.1 2024-09-19 20:29:07,725 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.241e+02 2.501e+02 2.714e+02 3.224e+02 6.030e+02, threshold=5.428e+02, percent-clipped=0.0 2024-09-19 20:29:28,045 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.35 vs. limit=15.0 2024-09-19 20:29:32,953 INFO [train.py:1198] (1/2) Epoch 41, batch 3750, loss[loss=0.2136, simple_loss=0.271, pruned_loss=0.05788, ctc_loss=0.1229, cr_loss=0.399, over 34400.00 frames. ], tot_loss[loss=0.206, simple_loss=0.264, pruned_loss=0.05439, ctc_loss=0.117, cr_loss=0.3935, over 6785993.95 frames. 
], batch size: 113, lr: 2.89e-03, grad_scale: 32.0 2024-09-19 20:30:00,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=741132.0, ans=0.125 2024-09-19 20:30:02,956 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.59 vs. limit=22.5 2024-09-19 20:30:29,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=741225.3333333334, ans=0.125 2024-09-19 20:30:35,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741272.0, ans=0.1 2024-09-19 20:30:41,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=741272.0, ans=10.0 2024-09-19 20:30:43,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=741272.0, ans=0.1 2024-09-19 20:30:52,984 INFO [train.py:1198] (1/2) Epoch 41, batch 3800, loss[loss=0.2364, simple_loss=0.2845, pruned_loss=0.07017, ctc_loss=0.146, cr_loss=0.4682, over 29720.00 frames. ], tot_loss[loss=0.2091, simple_loss=0.2665, pruned_loss=0.05583, ctc_loss=0.1197, cr_loss=0.3993, over 6675371.52 frames. ], batch size: 175, lr: 2.89e-03, grad_scale: 16.0 2024-09-19 20:31:05,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=741318.6666666666, ans=0.125 2024-09-19 20:31:22,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=13.58 vs. limit=15.0 2024-09-19 20:31:37,577 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:31:40,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=741412.0, ans=0.0 2024-09-19 20:31:49,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=741458.6666666666, ans=0.025 2024-09-19 20:31:51,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2024-09-19 20:31:53,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.52 vs. limit=10.0 2024-09-19 20:31:53,675 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.149e+02 2.413e+02 2.571e+02 2.822e+02 3.735e+02, threshold=5.141e+02, percent-clipped=0.0 2024-09-19 20:32:17,056 INFO [train.py:1198] (1/2) Epoch 41, batch 3850, loss[loss=0.231, simple_loss=0.2765, pruned_loss=0.06932, ctc_loss=0.147, cr_loss=0.4363, over 23024.00 frames. ], tot_loss[loss=0.2121, simple_loss=0.2684, pruned_loss=0.0575, ctc_loss=0.1231, cr_loss=0.4034, over 6250557.17 frames. 
], batch size: 244, lr: 2.88e-03, grad_scale: 16.0 2024-09-19 20:32:22,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=741552.0, ans=0.5 2024-09-19 20:32:27,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=741552.0, ans=0.125 2024-09-19 20:32:27,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2024-09-19 20:32:47,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=741598.6666666666, ans=0.05 2024-09-19 20:32:50,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=741645.3333333334, ans=0.125 2024-09-19 20:33:43,782 INFO [train.py:1198] (1/2) Epoch 42, batch 0, loss[loss=0.1853, simple_loss=0.2447, pruned_loss=0.04647, ctc_loss=0.09754, cr_loss=0.3376, over 34460.00 frames. ], tot_loss[loss=0.1853, simple_loss=0.2447, pruned_loss=0.04647, ctc_loss=0.09754, cr_loss=0.3376, over 34460.00 frames. ], batch size: 85, lr: 2.85e-03, grad_scale: 32.0 2024-09-19 20:33:43,783 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 20:34:00,759 INFO [train.py:1230] (1/2) Epoch 42, validation: loss=0.1479, simple_loss=0.2425, pruned_loss=0.02285, ctc_loss=0.03849, cr_loss=2.217e-14, over 944034.00 frames. 2024-09-19 20:34:00,760 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 20:34:05,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=741673.3333333334, ans=0.0 2024-09-19 20:34:12,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=741673.3333333334, ans=0.5 2024-09-19 20:34:14,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=741673.3333333334, ans=0.07 2024-09-19 20:34:14,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=741673.3333333334, ans=10.0 2024-09-19 20:35:00,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=741813.3333333334, ans=0.1 2024-09-19 20:35:07,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=741860.0, ans=0.125 2024-09-19 20:35:23,086 INFO [train.py:1198] (1/2) Epoch 42, batch 50, loss[loss=0.1764, simple_loss=0.233, pruned_loss=0.04368, ctc_loss=0.09608, cr_loss=0.332, over 34479.00 frames. ], tot_loss[loss=0.2051, simple_loss=0.2627, pruned_loss=0.05419, ctc_loss=0.1169, cr_loss=0.3953, over 1481640.32 frames. 
], batch size: 82, lr: 2.85e-03, grad_scale: 32.0 2024-09-19 20:35:39,470 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.055e+02 2.611e+02 2.841e+02 3.107e+02 7.147e+02, threshold=5.682e+02, percent-clipped=3.0 2024-09-19 20:36:30,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742093.3333333334, ans=0.1 2024-09-19 20:36:45,213 INFO [train.py:1198] (1/2) Epoch 42, batch 100, loss[loss=0.202, simple_loss=0.2592, pruned_loss=0.05325, ctc_loss=0.1145, cr_loss=0.3822, over 34581.00 frames. ], tot_loss[loss=0.2055, simple_loss=0.2634, pruned_loss=0.05426, ctc_loss=0.1169, cr_loss=0.3947, over 2630056.22 frames. ], batch size: 89, lr: 2.85e-03, grad_scale: 32.0 2024-09-19 20:36:47,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=742140.0, ans=0.0 2024-09-19 20:37:12,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.37 vs. limit=15.0 2024-09-19 20:37:16,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=742186.6666666666, ans=0.2 2024-09-19 20:37:32,080 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.15 vs. limit=12.0 2024-09-19 20:37:41,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=742280.0, ans=0.025 2024-09-19 20:37:56,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=742326.6666666666, ans=0.0 2024-09-19 20:38:10,360 INFO [train.py:1198] (1/2) Epoch 42, batch 150, loss[loss=0.1934, simple_loss=0.2472, pruned_loss=0.05083, ctc_loss=0.1122, cr_loss=0.3869, over 34469.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2612, pruned_loss=0.05308, ctc_loss=0.1147, cr_loss=0.3888, over 3558606.21 frames. ], batch size: 82, lr: 2.85e-03, grad_scale: 32.0 2024-09-19 20:38:12,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=742373.3333333334, ans=0.0 2024-09-19 20:38:26,680 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.044e+02 2.465e+02 2.835e+02 3.316e+02 6.520e+02, threshold=5.669e+02, percent-clipped=1.0 2024-09-19 20:38:38,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=742420.0, ans=0.05 2024-09-19 20:38:38,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=742420.0, ans=0.2 2024-09-19 20:38:43,940 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.01 vs. limit=15.0 2024-09-19 20:38:48,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=742466.6666666666, ans=0.125 2024-09-19 20:38:50,456 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.60 vs. 
limit=10.0 2024-09-19 20:38:56,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=742466.6666666666, ans=0.125 2024-09-19 20:39:16,617 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.83 vs. limit=15.0 2024-09-19 20:39:32,049 INFO [train.py:1198] (1/2) Epoch 42, batch 200, loss[loss=0.1995, simple_loss=0.2588, pruned_loss=0.05096, ctc_loss=0.1147, cr_loss=0.3859, over 32019.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.2605, pruned_loss=0.05328, ctc_loss=0.1149, cr_loss=0.389, over 4272383.77 frames. ], batch size: 145, lr: 2.85e-03, grad_scale: 32.0 2024-09-19 20:40:20,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=742746.6666666666, ans=0.125 2024-09-19 20:40:41,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=15.0 2024-09-19 20:40:56,607 INFO [train.py:1198] (1/2) Epoch 42, batch 250, loss[loss=0.2239, simple_loss=0.2842, pruned_loss=0.06063, ctc_loss=0.1289, cr_loss=0.4155, over 34181.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.2605, pruned_loss=0.05317, ctc_loss=0.1146, cr_loss=0.3883, over 4834349.52 frames. ], batch size: 117, lr: 2.85e-03, grad_scale: 32.0 2024-09-19 20:41:10,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=742840.0, ans=0.5 2024-09-19 20:41:16,648 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.262e+02 2.702e+02 3.227e+02 3.895e+02 9.398e+02, threshold=6.454e+02, percent-clipped=6.0 2024-09-19 20:41:46,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=742980.0, ans=0.1 2024-09-19 20:41:56,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=742980.0, ans=0.1 2024-09-19 20:42:02,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=743026.6666666666, ans=0.125 2024-09-19 20:42:04,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743026.6666666666, ans=0.1 2024-09-19 20:42:11,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=743026.6666666666, ans=0.0 2024-09-19 20:42:17,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=743026.6666666666, ans=0.125 2024-09-19 20:42:20,626 INFO [train.py:1198] (1/2) Epoch 42, batch 300, loss[loss=0.222, simple_loss=0.2796, pruned_loss=0.06029, ctc_loss=0.1283, cr_loss=0.454, over 34351.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.2604, pruned_loss=0.05305, ctc_loss=0.1145, cr_loss=0.3886, over 5264251.15 frames. 
], batch size: 107, lr: 2.85e-03, grad_scale: 16.0 2024-09-19 20:42:42,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=743120.0, ans=0.2 2024-09-19 20:42:48,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743120.0, ans=0.1 2024-09-19 20:42:58,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=743166.6666666666, ans=0.0 2024-09-19 20:43:18,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=743213.3333333334, ans=0.125 2024-09-19 20:43:21,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=743213.3333333334, ans=0.0 2024-09-19 20:43:22,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=743213.3333333334, ans=0.025 2024-09-19 20:43:42,354 INFO [train.py:1198] (1/2) Epoch 42, batch 350, loss[loss=0.1797, simple_loss=0.2392, pruned_loss=0.04322, ctc_loss=0.09703, cr_loss=0.3591, over 34291.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2611, pruned_loss=0.05326, ctc_loss=0.1148, cr_loss=0.3892, over 5597753.18 frames. ], batch size: 83, lr: 2.85e-03, grad_scale: 16.0 2024-09-19 20:43:55,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=743306.6666666666, ans=0.0 2024-09-19 20:44:00,194 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.010e+02 2.481e+02 2.757e+02 3.316e+02 5.967e+02, threshold=5.514e+02, percent-clipped=0.0 2024-09-19 20:44:07,973 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0 2024-09-19 20:44:12,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=743353.3333333334, ans=0.125 2024-09-19 20:44:23,836 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.79 vs. limit=22.5 2024-09-19 20:44:37,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=743446.6666666666, ans=0.2 2024-09-19 20:44:48,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2024-09-19 20:45:03,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743493.3333333334, ans=0.1 2024-09-19 20:45:08,403 INFO [train.py:1198] (1/2) Epoch 42, batch 400, loss[loss=0.2051, simple_loss=0.2649, pruned_loss=0.05323, ctc_loss=0.1149, cr_loss=0.3999, over 34429.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2607, pruned_loss=0.05309, ctc_loss=0.1145, cr_loss=0.3885, over 5866384.49 frames. ], batch size: 95, lr: 2.85e-03, grad_scale: 32.0 2024-09-19 20:45:33,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=743586.6666666666, ans=0.2 2024-09-19 20:45:34,417 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.40 vs. 
limit=15.0 2024-09-19 20:46:08,947 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.86 vs. limit=10.0 2024-09-19 20:46:31,334 INFO [train.py:1198] (1/2) Epoch 42, batch 450, loss[loss=0.2198, simple_loss=0.2811, pruned_loss=0.05865, ctc_loss=0.1221, cr_loss=0.4182, over 34707.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.2608, pruned_loss=0.05296, ctc_loss=0.1143, cr_loss=0.388, over 6056711.33 frames. ], batch size: 97, lr: 2.85e-03, grad_scale: 32.0 2024-09-19 20:46:31,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=743773.3333333334, ans=0.125 2024-09-19 20:46:43,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=743773.3333333334, ans=0.04949747468305833 2024-09-19 20:46:49,418 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.444e+02 2.748e+02 3.412e+02 5.594e+02, threshold=5.496e+02, percent-clipped=2.0 2024-09-19 20:47:11,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=743866.6666666666, ans=0.125 2024-09-19 20:47:19,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=743913.3333333334, ans=0.125 2024-09-19 20:47:31,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743913.3333333334, ans=0.1 2024-09-19 20:47:54,213 INFO [train.py:1198] (1/2) Epoch 42, batch 500, loss[loss=0.2205, simple_loss=0.2755, pruned_loss=0.0611, ctc_loss=0.1313, cr_loss=0.4256, over 34484.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2599, pruned_loss=0.05259, ctc_loss=0.1136, cr_loss=0.3865, over 6223637.15 frames. ], batch size: 110, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 20:48:12,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=744053.3333333334, ans=0.07 2024-09-19 20:48:33,049 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=744100.0, ans=0.125 2024-09-19 20:48:55,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.63 vs. limit=15.0 2024-09-19 20:49:01,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=744146.6666666666, ans=0.0 2024-09-19 20:49:11,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=744193.3333333334, ans=0.05 2024-09-19 20:49:18,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=744193.3333333334, ans=0.125 2024-09-19 20:49:21,037 INFO [train.py:1198] (1/2) Epoch 42, batch 550, loss[loss=0.2055, simple_loss=0.2665, pruned_loss=0.053, ctc_loss=0.1131, cr_loss=0.3941, over 33929.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2596, pruned_loss=0.05247, ctc_loss=0.1134, cr_loss=0.3859, over 6331235.27 frames. 
], batch size: 122, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 20:49:38,963 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.517e+02 2.951e+02 3.770e+02 6.862e+02, threshold=5.901e+02, percent-clipped=6.0 2024-09-19 20:49:40,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=744286.6666666666, ans=0.125 2024-09-19 20:49:45,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=744286.6666666666, ans=0.0 2024-09-19 20:49:45,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=744286.6666666666, ans=0.125 2024-09-19 20:49:57,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744333.3333333334, ans=0.1 2024-09-19 20:50:02,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=744333.3333333334, ans=0.125 2024-09-19 20:50:23,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=744380.0, ans=0.125 2024-09-19 20:50:38,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=744426.6666666666, ans=0.125 2024-09-19 20:50:38,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=744426.6666666666, ans=0.125 2024-09-19 20:50:40,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.59 vs. limit=15.0 2024-09-19 20:50:43,360 INFO [train.py:1198] (1/2) Epoch 42, batch 600, loss[loss=0.2235, simple_loss=0.2788, pruned_loss=0.06224, ctc_loss=0.1311, cr_loss=0.4363, over 34206.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2601, pruned_loss=0.05271, ctc_loss=0.1138, cr_loss=0.3872, over 6431937.02 frames. ], batch size: 117, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 20:50:50,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=744473.3333333334, ans=0.0 2024-09-19 20:51:15,404 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.25 vs. limit=15.0 2024-09-19 20:51:37,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=744613.3333333334, ans=0.2 2024-09-19 20:51:37,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=744613.3333333334, ans=0.1 2024-09-19 20:51:41,207 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.26 vs. limit=15.0 2024-09-19 20:52:04,892 INFO [train.py:1198] (1/2) Epoch 42, batch 650, loss[loss=0.2011, simple_loss=0.2629, pruned_loss=0.0504, ctc_loss=0.1142, cr_loss=0.3918, over 34526.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2595, pruned_loss=0.05223, ctc_loss=0.113, cr_loss=0.3849, over 6523106.32 frames. 
], batch size: 94, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 20:52:10,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=744706.6666666666, ans=0.0 2024-09-19 20:52:22,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=744753.3333333334, ans=0.125 2024-09-19 20:52:26,882 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.181e+02 2.559e+02 2.975e+02 3.654e+02 7.475e+02, threshold=5.949e+02, percent-clipped=1.0 2024-09-19 20:52:47,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=744800.0, ans=0.125 2024-09-19 20:52:52,371 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:52:54,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=12.0 2024-09-19 20:53:03,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=744846.6666666666, ans=0.125 2024-09-19 20:53:05,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=744846.6666666666, ans=0.0 2024-09-19 20:53:07,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=744846.6666666666, ans=0.025 2024-09-19 20:53:09,289 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.00 vs. limit=10.0 2024-09-19 20:53:17,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.49 vs. limit=22.5 2024-09-19 20:53:25,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=744893.3333333334, ans=0.125 2024-09-19 20:53:27,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=744893.3333333334, ans=0.5 2024-09-19 20:53:31,726 INFO [train.py:1198] (1/2) Epoch 42, batch 700, loss[loss=0.1946, simple_loss=0.2506, pruned_loss=0.05077, ctc_loss=0.1095, cr_loss=0.3819, over 34575.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2599, pruned_loss=0.05245, ctc_loss=0.1135, cr_loss=0.386, over 6580932.36 frames. ], batch size: 89, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 20:53:40,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744940.0, ans=0.1 2024-09-19 20:53:50,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.55 vs. 
limit=15.0 2024-09-19 20:53:58,022 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 20:54:03,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=745033.3333333334, ans=0.1 2024-09-19 20:54:06,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=745033.3333333334, ans=0.125 2024-09-19 20:54:19,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=745080.0, ans=0.0 2024-09-19 20:54:35,122 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.03 vs. limit=22.5 2024-09-19 20:54:53,761 INFO [train.py:1198] (1/2) Epoch 42, batch 750, loss[loss=0.1917, simple_loss=0.2527, pruned_loss=0.04787, ctc_loss=0.1028, cr_loss=0.3616, over 34404.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2594, pruned_loss=0.05232, ctc_loss=0.1132, cr_loss=0.3851, over 6623687.58 frames. ], batch size: 95, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 20:55:02,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=745173.3333333334, ans=0.2 2024-09-19 20:55:11,606 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.130e+02 2.608e+02 3.046e+02 3.987e+02 5.994e+02, threshold=6.093e+02, percent-clipped=1.0 2024-09-19 20:55:21,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=745220.0, ans=0.125 2024-09-19 20:55:27,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.10 vs. limit=10.0 2024-09-19 20:55:51,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=745313.3333333334, ans=10.0 2024-09-19 20:55:59,612 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2024-09-19 20:56:04,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=745360.0, ans=0.1 2024-09-19 20:56:17,580 INFO [train.py:1198] (1/2) Epoch 42, batch 800, loss[loss=0.1792, simple_loss=0.2384, pruned_loss=0.04303, ctc_loss=0.1001, cr_loss=0.3493, over 34450.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2596, pruned_loss=0.0525, ctc_loss=0.1134, cr_loss=0.3856, over 6660592.10 frames. 
], batch size: 85, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 20:56:55,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=745500.0, ans=0.0 2024-09-19 20:56:57,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=745500.0, ans=0.035 2024-09-19 20:57:25,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=745593.3333333334, ans=0.0 2024-09-19 20:57:36,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=745593.3333333334, ans=0.0 2024-09-19 20:57:37,197 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.40 vs. limit=15.0 2024-09-19 20:57:39,452 INFO [train.py:1198] (1/2) Epoch 42, batch 850, loss[loss=0.2164, simple_loss=0.2778, pruned_loss=0.05698, ctc_loss=0.1246, cr_loss=0.4044, over 34357.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2596, pruned_loss=0.05258, ctc_loss=0.1136, cr_loss=0.3857, over 6693163.35 frames. ], batch size: 103, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 20:57:42,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=745640.0, ans=0.125 2024-09-19 20:57:57,132 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 2.523e+02 2.997e+02 4.147e+02 6.312e+02, threshold=5.993e+02, percent-clipped=1.0 2024-09-19 20:58:24,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.15 vs. limit=12.0 2024-09-19 20:58:30,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=745780.0, ans=0.125 2024-09-19 20:58:35,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745780.0, ans=0.1 2024-09-19 20:58:51,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=745826.6666666666, ans=0.0 2024-09-19 20:58:52,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745826.6666666666, ans=0.1 2024-09-19 20:59:01,899 INFO [train.py:1198] (1/2) Epoch 42, batch 900, loss[loss=0.1806, simple_loss=0.2409, pruned_loss=0.0433, ctc_loss=0.09734, cr_loss=0.3562, over 34492.00 frames. ], tot_loss[loss=0.202, simple_loss=0.26, pruned_loss=0.05284, ctc_loss=0.1141, cr_loss=0.3867, over 6698981.49 frames. 
], batch size: 85, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 20:59:13,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745873.3333333334, ans=0.1 2024-09-19 20:59:15,205 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=745873.3333333334, ans=0.2 2024-09-19 20:59:31,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=745920.0, ans=0.0 2024-09-19 20:59:34,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=745966.6666666666, ans=0.125 2024-09-19 20:59:38,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=745966.6666666666, ans=0.0 2024-09-19 20:59:38,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=745966.6666666666, ans=0.1 2024-09-19 20:59:57,466 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.40 vs. limit=15.0 2024-09-19 21:00:13,381 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:00:16,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=746060.0, ans=0.0 2024-09-19 21:00:23,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=746060.0, ans=0.125 2024-09-19 21:00:27,537 INFO [train.py:1198] (1/2) Epoch 42, batch 950, loss[loss=0.1974, simple_loss=0.2537, pruned_loss=0.05169, ctc_loss=0.1116, cr_loss=0.3837, over 34687.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.2601, pruned_loss=0.05288, ctc_loss=0.1141, cr_loss=0.3866, over 6700076.77 frames. ], batch size: 87, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 21:00:33,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=15.0 2024-09-19 21:00:47,412 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.624e+02 3.067e+02 3.661e+02 5.879e+02, threshold=6.135e+02, percent-clipped=0.0 2024-09-19 21:01:07,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=746200.0, ans=0.025 2024-09-19 21:01:22,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746246.6666666666, ans=0.1 2024-09-19 21:01:34,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=746293.3333333334, ans=0.025 2024-09-19 21:01:47,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=746293.3333333334, ans=0.125 2024-09-19 21:01:50,101 INFO [train.py:1198] (1/2) Epoch 42, batch 1000, loss[loss=0.1982, simple_loss=0.2567, pruned_loss=0.05113, ctc_loss=0.1098, cr_loss=0.3869, over 34487.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2609, pruned_loss=0.0532, ctc_loss=0.1147, cr_loss=0.3876, over 6695013.26 frames. 
], batch size: 90, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 21:02:15,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=746386.6666666666, ans=0.125 2024-09-19 21:02:20,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746386.6666666666, ans=0.1 2024-09-19 21:02:33,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=746433.3333333334, ans=0.125 2024-09-19 21:02:51,278 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=746480.0, ans=0.125 2024-09-19 21:03:12,535 INFO [train.py:1198] (1/2) Epoch 42, batch 1050, loss[loss=0.2099, simple_loss=0.2725, pruned_loss=0.05479, ctc_loss=0.1143, cr_loss=0.3725, over 34560.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.2605, pruned_loss=0.05318, ctc_loss=0.1145, cr_loss=0.387, over 6704623.37 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 21:03:15,055 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.61 vs. limit=22.5 2024-09-19 21:03:33,804 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.111e+02 2.541e+02 2.889e+02 3.308e+02 6.913e+02, threshold=5.778e+02, percent-clipped=1.0 2024-09-19 21:04:04,064 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=5.11 vs. limit=12.0 2024-09-19 21:04:12,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. limit=5.0 2024-09-19 21:04:19,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=746713.3333333334, ans=0.0 2024-09-19 21:04:44,294 INFO [train.py:1198] (1/2) Epoch 42, batch 1100, loss[loss=0.1852, simple_loss=0.2428, pruned_loss=0.04638, ctc_loss=0.1032, cr_loss=0.3544, over 34354.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.2599, pruned_loss=0.05297, ctc_loss=0.1142, cr_loss=0.3859, over 6717710.57 frames. ], batch size: 91, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 21:04:46,895 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.20 vs. limit=10.0 2024-09-19 21:04:54,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=746806.6666666666, ans=0.2 2024-09-19 21:05:13,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=15.0 2024-09-19 21:05:25,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=746900.0, ans=0.0 2024-09-19 21:05:30,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=746900.0, ans=0.1 2024-09-19 21:05:50,405 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:06:06,626 INFO [train.py:1198] (1/2) Epoch 42, batch 1150, loss[loss=0.2032, simple_loss=0.261, pruned_loss=0.05358, ctc_loss=0.1133, cr_loss=0.3892, over 34769.00 frames. 
], tot_loss[loss=0.2021, simple_loss=0.2598, pruned_loss=0.05303, ctc_loss=0.1143, cr_loss=0.3863, over 6715567.61 frames. ], batch size: 92, lr: 2.84e-03, grad_scale: 16.0 2024-09-19 21:06:07,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.47 vs. limit=22.5 2024-09-19 21:06:23,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=747086.6666666666, ans=0.1 2024-09-19 21:06:26,414 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.952e+02 2.472e+02 2.751e+02 3.271e+02 5.011e+02, threshold=5.502e+02, percent-clipped=0.0 2024-09-19 21:06:43,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=747133.3333333334, ans=0.0 2024-09-19 21:06:52,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.75 vs. limit=22.5 2024-09-19 21:06:55,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=747180.0, ans=0.05 2024-09-19 21:07:27,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=747226.6666666666, ans=0.04949747468305833 2024-09-19 21:07:29,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=747273.3333333334, ans=0.125 2024-09-19 21:07:29,976 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=22.5 2024-09-19 21:07:30,857 INFO [train.py:1198] (1/2) Epoch 42, batch 1200, loss[loss=0.2016, simple_loss=0.2641, pruned_loss=0.05085, ctc_loss=0.1126, cr_loss=0.3731, over 34569.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2607, pruned_loss=0.05326, ctc_loss=0.1149, cr_loss=0.3879, over 6707564.73 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 21:07:52,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0 2024-09-19 21:08:03,762 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2024-09-19 21:08:12,035 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.19 vs. 
limit=15.0 2024-09-19 21:08:13,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=747366.6666666666, ans=0.025 2024-09-19 21:08:13,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747366.6666666666, ans=0.1 2024-09-19 21:08:14,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=747366.6666666666, ans=0.125 2024-09-19 21:08:26,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=747413.3333333334, ans=0.125 2024-09-19 21:08:31,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=747413.3333333334, ans=0.0 2024-09-19 21:08:44,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn1.whiten, num_groups=1, num_channels=768, metric=12.43 vs. limit=22.5 2024-09-19 21:08:55,437 INFO [train.py:1198] (1/2) Epoch 42, batch 1250, loss[loss=0.2094, simple_loss=0.2721, pruned_loss=0.05416, ctc_loss=0.1163, cr_loss=0.3801, over 34360.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.2615, pruned_loss=0.05365, ctc_loss=0.1156, cr_loss=0.3898, over 6741146.65 frames. ], batch size: 107, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 21:08:55,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=747506.6666666666, ans=0.0 2024-09-19 21:09:00,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=747506.6666666666, ans=0.125 2024-09-19 21:09:15,298 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.214e+02 2.576e+02 2.938e+02 3.936e+02 6.734e+02, threshold=5.875e+02, percent-clipped=2.0 2024-09-19 21:09:29,665 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=5.47 vs. limit=15.0 2024-09-19 21:09:55,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=747646.6666666666, ans=0.0 2024-09-19 21:10:07,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=747693.3333333334, ans=0.125 2024-09-19 21:10:18,022 INFO [train.py:1198] (1/2) Epoch 42, batch 1300, loss[loss=0.201, simple_loss=0.2643, pruned_loss=0.05034, ctc_loss=0.1099, cr_loss=0.377, over 33129.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.2605, pruned_loss=0.05332, ctc_loss=0.1149, cr_loss=0.3882, over 6744793.32 frames. ], batch size: 130, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 21:11:01,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=747833.3333333334, ans=0.125 2024-09-19 21:11:21,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=747880.0, ans=0.125 2024-09-19 21:11:33,460 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:11:44,309 INFO [train.py:1198] (1/2) Epoch 42, batch 1350, loss[loss=0.2012, simple_loss=0.2572, pruned_loss=0.05307, ctc_loss=0.115, cr_loss=0.4018, over 34539.00 frames. 
], tot_loss[loss=0.2024, simple_loss=0.2601, pruned_loss=0.05316, ctc_loss=0.1145, cr_loss=0.3878, over 6762675.74 frames. ], batch size: 94, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 21:11:47,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=747973.3333333334, ans=0.125 2024-09-19 21:11:59,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=748020.0, ans=0.2 2024-09-19 21:12:03,826 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.064e+02 2.529e+02 3.132e+02 3.801e+02 6.710e+02, threshold=6.263e+02, percent-clipped=2.0 2024-09-19 21:12:05,099 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.57 vs. limit=10.0 2024-09-19 21:12:10,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=748020.0, ans=0.125 2024-09-19 21:12:30,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=748066.6666666666, ans=0.0 2024-09-19 21:12:56,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=748160.0, ans=0.0 2024-09-19 21:13:05,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=748206.6666666666, ans=0.125 2024-09-19 21:13:06,422 INFO [train.py:1198] (1/2) Epoch 42, batch 1400, loss[loss=0.1792, simple_loss=0.2335, pruned_loss=0.04589, ctc_loss=0.09707, cr_loss=0.344, over 34270.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2598, pruned_loss=0.05291, ctc_loss=0.1142, cr_loss=0.3868, over 6776038.09 frames. ], batch size: 80, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 21:13:42,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=748300.0, ans=0.125 2024-09-19 21:13:49,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=748300.0, ans=0.1 2024-09-19 21:13:55,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=748346.6666666666, ans=0.125 2024-09-19 21:14:00,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=748346.6666666666, ans=0.0 2024-09-19 21:14:16,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.48 vs. limit=22.5 2024-09-19 21:14:28,077 INFO [train.py:1198] (1/2) Epoch 42, batch 1450, loss[loss=0.2165, simple_loss=0.2731, pruned_loss=0.05948, ctc_loss=0.1244, cr_loss=0.4029, over 34477.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.2607, pruned_loss=0.05309, ctc_loss=0.1146, cr_loss=0.3881, over 6772369.97 frames. 
], batch size: 110, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 21:14:31,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=748440.0, ans=0.125 2024-09-19 21:14:36,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=748440.0, ans=0.0 2024-09-19 21:14:49,736 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.127e+02 2.611e+02 3.050e+02 3.748e+02 8.274e+02, threshold=6.100e+02, percent-clipped=2.0 2024-09-19 21:15:03,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748533.3333333334, ans=0.1 2024-09-19 21:15:10,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=748533.3333333334, ans=0.125 2024-09-19 21:15:15,920 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=22.5 2024-09-19 21:15:43,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748626.6666666666, ans=0.1 2024-09-19 21:15:51,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=748626.6666666666, ans=0.0 2024-09-19 21:15:54,422 INFO [train.py:1198] (1/2) Epoch 42, batch 1500, loss[loss=0.2047, simple_loss=0.2677, pruned_loss=0.05227, ctc_loss=0.1118, cr_loss=0.3668, over 34447.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.2611, pruned_loss=0.05327, ctc_loss=0.1149, cr_loss=0.3887, over 6773155.89 frames. ], batch size: 100, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 21:15:58,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=748673.3333333334, ans=0.1 2024-09-19 21:16:01,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=748673.3333333334, ans=0.1 2024-09-19 21:16:39,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=748766.6666666666, ans=0.025 2024-09-19 21:16:39,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=748766.6666666666, ans=0.0 2024-09-19 21:16:46,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=748813.3333333334, ans=0.125 2024-09-19 21:17:02,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=748860.0, ans=0.125 2024-09-19 21:17:09,348 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:17:17,224 INFO [train.py:1198] (1/2) Epoch 42, batch 1550, loss[loss=0.2239, simple_loss=0.2813, pruned_loss=0.06217, ctc_loss=0.128, cr_loss=0.416, over 34390.00 frames. ], tot_loss[loss=0.2034, simple_loss=0.2612, pruned_loss=0.05346, ctc_loss=0.1153, cr_loss=0.3895, over 6746600.82 frames. 
], batch size: 105, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 21:17:25,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=748906.6666666666, ans=0.125 2024-09-19 21:17:37,017 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.454e+02 2.849e+02 3.693e+02 7.241e+02, threshold=5.697e+02, percent-clipped=3.0 2024-09-19 21:17:58,650 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=749000.0, ans=0.125 2024-09-19 21:18:12,554 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.48 vs. limit=22.5 2024-09-19 21:18:34,973 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=749093.3333333334, ans=0.2 2024-09-19 21:18:41,127 INFO [train.py:1198] (1/2) Epoch 42, batch 1600, loss[loss=0.2049, simple_loss=0.2625, pruned_loss=0.05409, ctc_loss=0.1165, cr_loss=0.3986, over 34573.00 frames. ], tot_loss[loss=0.2032, simple_loss=0.261, pruned_loss=0.05342, ctc_loss=0.1152, cr_loss=0.3892, over 6726412.50 frames. ], batch size: 99, lr: 2.84e-03, grad_scale: 32.0 2024-09-19 21:19:20,825 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2024-09-19 21:19:21,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=749233.3333333334, ans=0.125 2024-09-19 21:19:26,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=749233.3333333334, ans=0.0 2024-09-19 21:19:26,695 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:19:34,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=749280.0, ans=0.1 2024-09-19 21:19:37,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=749280.0, ans=0.1 2024-09-19 21:19:54,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=749326.6666666666, ans=0.0 2024-09-19 21:20:05,422 INFO [train.py:1198] (1/2) Epoch 42, batch 1650, loss[loss=0.2078, simple_loss=0.2695, pruned_loss=0.05355, ctc_loss=0.1145, cr_loss=0.4021, over 34379.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.2605, pruned_loss=0.05312, ctc_loss=0.1147, cr_loss=0.3881, over 6720051.75 frames. ], batch size: 103, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:20:11,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.16 vs. limit=15.0 2024-09-19 21:20:24,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.478e+02 2.918e+02 3.386e+02 6.357e+02, threshold=5.835e+02, percent-clipped=3.0 2024-09-19 21:20:47,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.32 vs. limit=15.0 2024-09-19 21:20:58,758 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.74 vs. 
limit=12.0 2024-09-19 21:21:27,516 INFO [train.py:1198] (1/2) Epoch 42, batch 1700, loss[loss=0.1669, simple_loss=0.2223, pruned_loss=0.04051, ctc_loss=0.0884, cr_loss=0.3211, over 34289.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2601, pruned_loss=0.05279, ctc_loss=0.114, cr_loss=0.3869, over 6745120.73 frames. ], batch size: 80, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:21:32,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=749606.6666666666, ans=0.125 2024-09-19 21:21:37,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=749606.6666666666, ans=0.07 2024-09-19 21:21:44,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=749653.3333333334, ans=0.125 2024-09-19 21:22:24,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=749746.6666666666, ans=0.1 2024-09-19 21:22:29,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=749746.6666666666, ans=0.125 2024-09-19 21:22:32,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=749746.6666666666, ans=0.5 2024-09-19 21:22:35,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=749793.3333333334, ans=0.025 2024-09-19 21:22:54,091 INFO [train.py:1198] (1/2) Epoch 42, batch 1750, loss[loss=0.1698, simple_loss=0.2258, pruned_loss=0.04123, ctc_loss=0.08963, cr_loss=0.3324, over 34225.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2597, pruned_loss=0.05266, ctc_loss=0.1137, cr_loss=0.3861, over 6754506.92 frames. ], batch size: 78, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:23:13,901 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.101e+02 2.532e+02 2.875e+02 3.315e+02 6.540e+02, threshold=5.751e+02, percent-clipped=2.0 2024-09-19 21:23:21,707 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.42 vs. 
limit=15.0 2024-09-19 21:23:33,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=749933.3333333334, ans=0.025 2024-09-19 21:23:35,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=749933.3333333334, ans=0.125 2024-09-19 21:23:41,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=749980.0, ans=0.02 2024-09-19 21:23:59,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=750026.6666666666, ans=0.125 2024-09-19 21:24:08,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=750026.6666666666, ans=0.125 2024-09-19 21:24:11,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=750026.6666666666, ans=0.125 2024-09-19 21:24:15,883 INFO [train.py:1198] (1/2) Epoch 42, batch 1800, loss[loss=0.1962, simple_loss=0.261, pruned_loss=0.0479, ctc_loss=0.1055, cr_loss=0.361, over 34683.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2597, pruned_loss=0.05255, ctc_loss=0.1136, cr_loss=0.3859, over 6757740.83 frames. ], batch size: 97, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:24:23,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=750073.3333333334, ans=0.2 2024-09-19 21:24:33,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=22.5 2024-09-19 21:24:34,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=750120.0, ans=0.125 2024-09-19 21:24:46,422 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.45 vs. limit=12.0 2024-09-19 21:25:12,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=750213.3333333334, ans=0.125 2024-09-19 21:25:13,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=750213.3333333334, ans=0.125 2024-09-19 21:25:38,310 INFO [train.py:1198] (1/2) Epoch 42, batch 1850, loss[loss=0.1981, simple_loss=0.2608, pruned_loss=0.04972, ctc_loss=0.1063, cr_loss=0.3658, over 34491.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2598, pruned_loss=0.05263, ctc_loss=0.1137, cr_loss=0.3866, over 6765168.85 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:25:45,745 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.21 vs. 
limit=15.0 2024-09-19 21:25:56,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=750353.3333333334, ans=0.0 2024-09-19 21:25:56,664 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:25:59,412 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.176e+02 2.589e+02 3.168e+02 4.138e+02 7.265e+02, threshold=6.335e+02, percent-clipped=2.0 2024-09-19 21:26:28,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=750446.6666666666, ans=0.125 2024-09-19 21:26:34,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=750446.6666666666, ans=0.125 2024-09-19 21:26:55,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=750493.3333333334, ans=0.125 2024-09-19 21:26:56,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=750493.3333333334, ans=0.05 2024-09-19 21:27:04,472 INFO [train.py:1198] (1/2) Epoch 42, batch 1900, loss[loss=0.2242, simple_loss=0.2811, pruned_loss=0.06209, ctc_loss=0.1318, cr_loss=0.4186, over 34374.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2609, pruned_loss=0.05303, ctc_loss=0.1145, cr_loss=0.388, over 6773508.44 frames. ], batch size: 103, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:27:13,427 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.64 vs. limit=12.0 2024-09-19 21:27:14,788 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=750540.0, ans=0.0 2024-09-19 21:27:19,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=750586.6666666666, ans=0.2 2024-09-19 21:27:24,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=750586.6666666666, ans=0.125 2024-09-19 21:27:34,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=750586.6666666666, ans=0.125 2024-09-19 21:27:45,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=750633.3333333334, ans=0.2 2024-09-19 21:28:26,405 INFO [train.py:1198] (1/2) Epoch 42, batch 1950, loss[loss=0.196, simple_loss=0.2503, pruned_loss=0.05157, ctc_loss=0.1148, cr_loss=0.3923, over 34744.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2617, pruned_loss=0.05332, ctc_loss=0.1151, cr_loss=0.3898, over 6790685.86 frames. 
], batch size: 92, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:28:46,237 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 2.514e+02 2.719e+02 3.496e+02 1.114e+03, threshold=5.438e+02, percent-clipped=1.0 2024-09-19 21:28:59,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=750866.6666666666, ans=0.0 2024-09-19 21:29:02,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=750866.6666666666, ans=0.125 2024-09-19 21:29:03,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.30 vs. limit=22.5 2024-09-19 21:29:21,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=750913.3333333334, ans=0.125 2024-09-19 21:29:40,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=750960.0, ans=0.2 2024-09-19 21:29:47,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=750960.0, ans=0.0 2024-09-19 21:29:50,338 INFO [train.py:1198] (1/2) Epoch 42, batch 2000, loss[loss=0.191, simple_loss=0.2427, pruned_loss=0.05114, ctc_loss=0.1093, cr_loss=0.3787, over 34215.00 frames. ], tot_loss[loss=0.2038, simple_loss=0.262, pruned_loss=0.05345, ctc_loss=0.1154, cr_loss=0.3898, over 6765939.20 frames. ], batch size: 78, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:30:14,274 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.39 vs. limit=15.0 2024-09-19 21:30:39,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=751100.0, ans=0.125 2024-09-19 21:30:44,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=751146.6666666666, ans=0.025 2024-09-19 21:31:15,393 INFO [train.py:1198] (1/2) Epoch 42, batch 2050, loss[loss=0.185, simple_loss=0.2391, pruned_loss=0.0478, ctc_loss=0.1034, cr_loss=0.3649, over 34480.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2609, pruned_loss=0.05309, ctc_loss=0.1146, cr_loss=0.3879, over 6757457.08 frames. 
], batch size: 82, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:31:35,304 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.498e+02 2.809e+02 3.282e+02 7.135e+02, threshold=5.618e+02, percent-clipped=3.0 2024-09-19 21:31:45,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=751286.6666666666, ans=0.0 2024-09-19 21:31:52,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=751333.3333333334, ans=0.125 2024-09-19 21:31:57,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=751333.3333333334, ans=0.125 2024-09-19 21:31:58,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=751333.3333333334, ans=0.125 2024-09-19 21:32:10,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=751380.0, ans=0.125 2024-09-19 21:32:12,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=751380.0, ans=0.125 2024-09-19 21:32:16,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.50 vs. limit=12.0 2024-09-19 21:32:27,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0 2024-09-19 21:32:28,702 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=751426.6666666666, ans=0.0 2024-09-19 21:32:38,280 INFO [train.py:1198] (1/2) Epoch 42, batch 2100, loss[loss=0.212, simple_loss=0.2692, pruned_loss=0.05707, ctc_loss=0.1209, cr_loss=0.4119, over 34551.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.2606, pruned_loss=0.05307, ctc_loss=0.1145, cr_loss=0.3873, over 6772010.92 frames. ], batch size: 94, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:32:40,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=751473.3333333334, ans=0.0 2024-09-19 21:32:51,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=751473.3333333334, ans=0.0 2024-09-19 21:32:59,637 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:33:35,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=751613.3333333334, ans=0.0 2024-09-19 21:33:55,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=751660.0, ans=0.2 2024-09-19 21:34:01,779 INFO [train.py:1198] (1/2) Epoch 42, batch 2150, loss[loss=0.2004, simple_loss=0.2568, pruned_loss=0.05284, ctc_loss=0.1143, cr_loss=0.385, over 34356.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2598, pruned_loss=0.05261, ctc_loss=0.1136, cr_loss=0.3857, over 6790830.23 frames. ], batch size: 91, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:34:14,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.94 vs. 
limit=22.5 2024-09-19 21:34:23,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.111e+02 2.558e+02 2.906e+02 3.767e+02 6.895e+02, threshold=5.812e+02, percent-clipped=5.0 2024-09-19 21:34:29,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=751753.3333333334, ans=0.0 2024-09-19 21:34:34,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=751753.3333333334, ans=0.0 2024-09-19 21:34:45,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=751800.0, ans=0.125 2024-09-19 21:34:46,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=751800.0, ans=15.0 2024-09-19 21:34:55,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=751846.6666666666, ans=0.025 2024-09-19 21:34:58,935 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:34:59,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=751846.6666666666, ans=0.125 2024-09-19 21:35:15,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=751893.3333333334, ans=0.125 2024-09-19 21:35:26,446 INFO [train.py:1198] (1/2) Epoch 42, batch 2200, loss[loss=0.197, simple_loss=0.2613, pruned_loss=0.04865, ctc_loss=0.1049, cr_loss=0.3619, over 34450.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2601, pruned_loss=0.05274, ctc_loss=0.1139, cr_loss=0.3863, over 6785137.49 frames. ], batch size: 100, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:35:28,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=751940.0, ans=0.2 2024-09-19 21:35:33,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.89 vs. limit=15.0 2024-09-19 21:35:45,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2024-09-19 21:35:46,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=751986.6666666666, ans=0.125 2024-09-19 21:36:35,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=752126.6666666666, ans=0.0 2024-09-19 21:36:37,826 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.46 vs. limit=6.0 2024-09-19 21:36:40,768 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2024-09-19 21:36:43,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=752126.6666666666, ans=0.125 2024-09-19 21:36:48,281 INFO [train.py:1198] (1/2) Epoch 42, batch 2250, loss[loss=0.2087, simple_loss=0.2665, pruned_loss=0.0559, ctc_loss=0.1159, cr_loss=0.3994, over 34410.00 frames. 
], tot_loss[loss=0.2022, simple_loss=0.2603, pruned_loss=0.05287, ctc_loss=0.1141, cr_loss=0.3866, over 6781517.54 frames. ], batch size: 95, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:37:09,680 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.606e+02 2.972e+02 3.923e+02 5.798e+02, threshold=5.944e+02, percent-clipped=0.0 2024-09-19 21:37:09,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=752220.0, ans=0.015 2024-09-19 21:37:33,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=752266.6666666666, ans=0.0 2024-09-19 21:37:40,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=752313.3333333334, ans=0.125 2024-09-19 21:37:49,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=752313.3333333334, ans=0.125 2024-09-19 21:37:50,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=22.5 2024-09-19 21:38:14,467 INFO [train.py:1198] (1/2) Epoch 42, batch 2300, loss[loss=0.1859, simple_loss=0.2412, pruned_loss=0.04775, ctc_loss=0.1049, cr_loss=0.3516, over 34290.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2593, pruned_loss=0.05254, ctc_loss=0.1135, cr_loss=0.3855, over 6769003.42 frames. ], batch size: 83, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:39:34,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=752640.0, ans=0.125 2024-09-19 21:39:36,061 INFO [train.py:1198] (1/2) Epoch 42, batch 2350, loss[loss=0.2057, simple_loss=0.2623, pruned_loss=0.0546, ctc_loss=0.1192, cr_loss=0.4032, over 34696.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2593, pruned_loss=0.05258, ctc_loss=0.1136, cr_loss=0.3861, over 6773617.95 frames. ], batch size: 97, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:39:38,577 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.73 vs. limit=6.0 2024-09-19 21:39:47,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=752640.0, ans=0.125 2024-09-19 21:39:55,735 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.458e+02 2.857e+02 3.476e+02 5.156e+02, threshold=5.713e+02, percent-clipped=0.0 2024-09-19 21:40:09,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=752733.3333333334, ans=0.0 2024-09-19 21:40:55,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=752826.6666666666, ans=0.1 2024-09-19 21:41:00,293 INFO [train.py:1198] (1/2) Epoch 42, batch 2400, loss[loss=0.1999, simple_loss=0.2549, pruned_loss=0.05323, ctc_loss=0.1145, cr_loss=0.391, over 34572.00 frames. ], tot_loss[loss=0.202, simple_loss=0.26, pruned_loss=0.05289, ctc_loss=0.1142, cr_loss=0.3876, over 6777873.19 frames. 
], batch size: 89, lr: 2.83e-03, grad_scale: 32.0 2024-09-19 21:41:11,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=752873.3333333334, ans=0.2 2024-09-19 21:41:22,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=752920.0, ans=0.125 2024-09-19 21:41:25,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=752920.0, ans=0.125 2024-09-19 21:41:25,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752920.0, ans=0.1 2024-09-19 21:41:32,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=752966.6666666666, ans=0.025 2024-09-19 21:42:10,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=753060.0, ans=0.1 2024-09-19 21:42:20,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.75 vs. limit=15.0 2024-09-19 21:42:24,742 INFO [train.py:1198] (1/2) Epoch 42, batch 2450, loss[loss=0.2099, simple_loss=0.2676, pruned_loss=0.05586, ctc_loss=0.1208, cr_loss=0.4108, over 34432.00 frames. ], tot_loss[loss=0.203, simple_loss=0.261, pruned_loss=0.05325, ctc_loss=0.115, cr_loss=0.3893, over 6750249.94 frames. ], batch size: 95, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 21:42:26,851 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.327e-02 2024-09-19 21:42:37,361 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.21 vs. limit=22.5 2024-09-19 21:42:45,920 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.178e+02 2.727e+02 3.238e+02 4.231e+02 8.266e+02, threshold=6.476e+02, percent-clipped=3.0 2024-09-19 21:43:04,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=753200.0, ans=0.025 2024-09-19 21:43:13,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=753246.6666666666, ans=0.125 2024-09-19 21:43:16,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=753246.6666666666, ans=0.0 2024-09-19 21:43:33,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2024-09-19 21:43:40,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=753293.3333333334, ans=0.125 2024-09-19 21:43:47,189 INFO [train.py:1198] (1/2) Epoch 42, batch 2500, loss[loss=0.2142, simple_loss=0.2717, pruned_loss=0.05829, ctc_loss=0.1202, cr_loss=0.4012, over 34466.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.261, pruned_loss=0.05331, ctc_loss=0.115, cr_loss=0.3887, over 6761828.98 frames. 
], batch size: 100, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 21:43:50,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=753340.0, ans=0.125 2024-09-19 21:44:13,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=753386.6666666666, ans=0.125 2024-09-19 21:44:38,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=753480.0, ans=0.125 2024-09-19 21:44:55,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=753526.6666666666, ans=0.2 2024-09-19 21:44:59,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=753526.6666666666, ans=0.125 2024-09-19 21:45:00,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=753526.6666666666, ans=0.125 2024-09-19 21:45:08,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=753526.6666666666, ans=0.0 2024-09-19 21:45:08,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=753526.6666666666, ans=0.09899494936611666 2024-09-19 21:45:11,361 INFO [train.py:1198] (1/2) Epoch 42, batch 2550, loss[loss=0.1713, simple_loss=0.2269, pruned_loss=0.04178, ctc_loss=0.09391, cr_loss=0.3354, over 34164.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.261, pruned_loss=0.05333, ctc_loss=0.115, cr_loss=0.3893, over 6765821.93 frames. ], batch size: 78, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 21:45:21,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=753573.3333333334, ans=0.125 2024-09-19 21:45:28,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=753620.0, ans=0.0 2024-09-19 21:45:32,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.157e+02 2.593e+02 2.929e+02 3.725e+02 7.329e+02, threshold=5.859e+02, percent-clipped=4.0 2024-09-19 21:45:34,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=753620.0, ans=0.125 2024-09-19 21:45:43,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=753620.0, ans=0.125 2024-09-19 21:45:45,681 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.09 vs. 
limit=15.0 2024-09-19 21:45:54,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=753666.6666666666, ans=0.125 2024-09-19 21:45:58,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=753666.6666666666, ans=0.125 2024-09-19 21:46:03,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=753713.3333333334, ans=15.0 2024-09-19 21:46:04,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=753713.3333333334, ans=0.125 2024-09-19 21:46:24,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=753760.0, ans=0.125 2024-09-19 21:46:29,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=753760.0, ans=0.0 2024-09-19 21:46:35,894 INFO [train.py:1198] (1/2) Epoch 42, batch 2600, loss[loss=0.1948, simple_loss=0.252, pruned_loss=0.05, ctc_loss=0.1108, cr_loss=0.3879, over 34350.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2612, pruned_loss=0.05339, ctc_loss=0.1151, cr_loss=0.3892, over 6759103.92 frames. ], batch size: 91, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 21:46:49,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=753806.6666666666, ans=0.125 2024-09-19 21:46:56,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.51 vs. limit=6.0 2024-09-19 21:47:41,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=753993.3333333334, ans=0.0 2024-09-19 21:47:41,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=753993.3333333334, ans=0.05 2024-09-19 21:47:47,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=753993.3333333334, ans=0.0 2024-09-19 21:47:54,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=753993.3333333334, ans=0.0 2024-09-19 21:47:57,377 INFO [train.py:1198] (1/2) Epoch 42, batch 2650, loss[loss=0.2144, simple_loss=0.2761, pruned_loss=0.05572, ctc_loss=0.1213, cr_loss=0.4275, over 34298.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2615, pruned_loss=0.0534, ctc_loss=0.1153, cr_loss=0.3901, over 6766605.10 frames. ], batch size: 117, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 21:48:20,320 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.198e+02 2.542e+02 2.877e+02 3.424e+02 5.257e+02, threshold=5.753e+02, percent-clipped=0.0 2024-09-19 21:48:25,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=754086.6666666666, ans=0.0 2024-09-19 21:48:40,487 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 21:49:21,109 INFO [train.py:1198] (1/2) Epoch 42, batch 2700, loss[loss=0.208, simple_loss=0.272, pruned_loss=0.05254, ctc_loss=0.1157, cr_loss=0.3935, over 34626.00 frames. 
], tot_loss[loss=0.2038, simple_loss=0.2619, pruned_loss=0.05352, ctc_loss=0.1155, cr_loss=0.3908, over 6762344.06 frames. ], batch size: 102, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 21:49:38,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=754320.0, ans=0.0 2024-09-19 21:49:44,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=754320.0, ans=0.0 2024-09-19 21:49:52,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=754320.0, ans=0.125 2024-09-19 21:50:04,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2024-09-19 21:50:09,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=754366.6666666666, ans=0.125 2024-09-19 21:50:13,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=22.5 2024-09-19 21:50:46,126 INFO [train.py:1198] (1/2) Epoch 42, batch 2750, loss[loss=0.1966, simple_loss=0.2499, pruned_loss=0.05241, ctc_loss=0.1161, cr_loss=0.3796, over 34637.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.2609, pruned_loss=0.05331, ctc_loss=0.115, cr_loss=0.3898, over 6759692.22 frames. ], batch size: 88, lr: 2.83e-03, grad_scale: 16.0 2024-09-19 21:50:58,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=754506.6666666666, ans=0.0 2024-09-19 21:50:58,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=754506.6666666666, ans=0.2 2024-09-19 21:50:59,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=754506.6666666666, ans=0.2 2024-09-19 21:51:07,611 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.157e+02 2.643e+02 3.085e+02 3.680e+02 6.377e+02, threshold=6.171e+02, percent-clipped=3.0 2024-09-19 21:51:11,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=754553.3333333334, ans=0.125 2024-09-19 21:51:21,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=754600.0, ans=0.125 2024-09-19 21:51:37,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=754646.6666666666, ans=0.125 2024-09-19 21:51:42,896 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2024-09-19 21:51:53,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2024-09-19 21:52:10,219 INFO [train.py:1198] (1/2) Epoch 42, batch 2800, loss[loss=0.2321, simple_loss=0.2855, pruned_loss=0.06664, ctc_loss=0.142, cr_loss=0.4284, over 23791.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2612, pruned_loss=0.05353, ctc_loss=0.1154, cr_loss=0.3903, over 6739102.84 frames. 
], batch size: 245, lr: 2.82e-03, grad_scale: 32.0 2024-09-19 21:52:25,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=754786.6666666666, ans=0.125 2024-09-19 21:52:30,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=754786.6666666666, ans=0.0 2024-09-19 21:52:48,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=754833.3333333334, ans=0.125 2024-09-19 21:53:10,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=754880.0, ans=0.2 2024-09-19 21:53:24,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=754926.6666666666, ans=0.125 2024-09-19 21:53:34,296 INFO [train.py:1198] (1/2) Epoch 42, batch 2850, loss[loss=0.1935, simple_loss=0.2509, pruned_loss=0.04934, ctc_loss=0.1102, cr_loss=0.3819, over 34461.00 frames. ], tot_loss[loss=0.2034, simple_loss=0.2613, pruned_loss=0.05341, ctc_loss=0.1154, cr_loss=0.39, over 6723589.76 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 32.0 2024-09-19 21:53:39,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=754973.3333333334, ans=0.0 2024-09-19 21:53:44,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=754973.3333333334, ans=0.2 2024-09-19 21:53:55,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.125e+02 2.570e+02 3.066e+02 3.846e+02 6.035e+02, threshold=6.133e+02, percent-clipped=0.0 2024-09-19 21:54:00,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755020.0, ans=0.1 2024-09-19 21:54:17,026 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.50 vs. limit=5.0 2024-09-19 21:54:22,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=755113.3333333334, ans=0.0 2024-09-19 21:54:32,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=755113.3333333334, ans=0.125 2024-09-19 21:54:38,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=755160.0, ans=0.0 2024-09-19 21:54:43,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=755160.0, ans=0.125 2024-09-19 21:54:46,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=755160.0, ans=0.0 2024-09-19 21:54:56,643 INFO [train.py:1198] (1/2) Epoch 42, batch 2900, loss[loss=0.1966, simple_loss=0.2572, pruned_loss=0.04984, ctc_loss=0.1064, cr_loss=0.3755, over 34551.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.2622, pruned_loss=0.05373, ctc_loss=0.1159, cr_loss=0.3917, over 6754429.98 frames. 
], batch size: 94, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 21:54:58,529 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=755206.6666666666, ans=0.125 2024-09-19 21:55:40,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=755300.0, ans=0.125 2024-09-19 21:55:55,473 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.79 vs. limit=22.5 2024-09-19 21:55:56,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=755346.6666666666, ans=0.125 2024-09-19 21:56:07,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755393.3333333334, ans=0.1 2024-09-19 21:56:15,048 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.71 vs. limit=22.5 2024-09-19 21:56:19,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755440.0, ans=0.1 2024-09-19 21:56:20,836 INFO [train.py:1198] (1/2) Epoch 42, batch 2950, loss[loss=0.1975, simple_loss=0.2525, pruned_loss=0.05261, ctc_loss=0.112, cr_loss=0.3732, over 34649.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.2606, pruned_loss=0.05303, ctc_loss=0.1146, cr_loss=0.388, over 6748160.97 frames. ], batch size: 88, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 21:56:43,881 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 2.594e+02 3.115e+02 3.892e+02 6.735e+02, threshold=6.230e+02, percent-clipped=2.0 2024-09-19 21:57:03,552 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.50 vs. limit=10.0 2024-09-19 21:57:09,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=755533.3333333334, ans=0.2 2024-09-19 21:57:45,194 INFO [train.py:1198] (1/2) Epoch 42, batch 3000, loss[loss=0.2025, simple_loss=0.2593, pruned_loss=0.05367, ctc_loss=0.1151, cr_loss=0.3817, over 34531.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2604, pruned_loss=0.05296, ctc_loss=0.1144, cr_loss=0.3878, over 6746943.17 frames. ], batch size: 94, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 21:57:45,194 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 21:58:02,003 INFO [train.py:1230] (1/2) Epoch 42, validation: loss=0.1488, simple_loss=0.2423, pruned_loss=0.02376, ctc_loss=0.03935, cr_loss=2.224e-14, over 944034.00 frames. 
2024-09-19 21:58:02,003 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 21:58:14,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=755673.3333333334, ans=0.125 2024-09-19 21:58:28,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=755720.0, ans=0.1 2024-09-19 21:59:15,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755860.0, ans=0.1 2024-09-19 21:59:24,834 INFO [train.py:1198] (1/2) Epoch 42, batch 3050, loss[loss=0.1899, simple_loss=0.2472, pruned_loss=0.0483, ctc_loss=0.1056, cr_loss=0.3709, over 34585.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2609, pruned_loss=0.05318, ctc_loss=0.1148, cr_loss=0.3889, over 6739476.67 frames. ], batch size: 89, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 21:59:28,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.93 vs. limit=15.0 2024-09-19 21:59:47,557 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.196e+02 2.554e+02 2.918e+02 3.607e+02 1.341e+03, threshold=5.836e+02, percent-clipped=1.0 2024-09-19 22:00:15,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=756046.6666666666, ans=0.125 2024-09-19 22:00:36,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=756093.3333333334, ans=0.1 2024-09-19 22:00:41,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=756093.3333333334, ans=0.025 2024-09-19 22:00:46,060 INFO [train.py:1198] (1/2) Epoch 42, batch 3100, loss[loss=0.2051, simple_loss=0.2687, pruned_loss=0.05202, ctc_loss=0.1128, cr_loss=0.3717, over 34196.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2607, pruned_loss=0.05312, ctc_loss=0.1148, cr_loss=0.3888, over 6740823.22 frames. ], batch size: 117, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 22:00:49,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=756140.0, ans=0.125 2024-09-19 22:00:57,568 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=756140.0, ans=0.125 2024-09-19 22:00:59,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=756140.0, ans=0.125 2024-09-19 22:01:02,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=756186.6666666666, ans=0.125 2024-09-19 22:01:06,972 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=756186.6666666666, ans=0.125 2024-09-19 22:01:28,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.27 vs. 
limit=12.0 2024-09-19 22:01:29,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=756233.3333333334, ans=0.125 2024-09-19 22:02:08,786 INFO [train.py:1198] (1/2) Epoch 42, batch 3150, loss[loss=0.2114, simple_loss=0.2724, pruned_loss=0.05519, ctc_loss=0.1208, cr_loss=0.3965, over 33841.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2607, pruned_loss=0.0531, ctc_loss=0.1147, cr_loss=0.3886, over 6747025.14 frames. ], batch size: 122, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 22:02:25,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=756420.0, ans=0.05 2024-09-19 22:02:26,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=14.29 vs. limit=15.0 2024-09-19 22:02:31,463 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.017e+02 2.540e+02 3.095e+02 3.963e+02 7.756e+02, threshold=6.189e+02, percent-clipped=5.0 2024-09-19 22:02:51,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=756466.6666666666, ans=0.025 2024-09-19 22:03:20,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=756560.0, ans=0.0 2024-09-19 22:03:29,571 INFO [train.py:1198] (1/2) Epoch 42, batch 3200, loss[loss=0.2121, simple_loss=0.2667, pruned_loss=0.05855, ctc_loss=0.1201, cr_loss=0.4081, over 34557.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2602, pruned_loss=0.05303, ctc_loss=0.1144, cr_loss=0.3877, over 6761041.37 frames. ], batch size: 94, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 22:03:46,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=756653.3333333334, ans=0.125 2024-09-19 22:03:48,382 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.50 vs. limit=15.0 2024-09-19 22:04:12,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=756700.0, ans=0.0 2024-09-19 22:04:50,493 INFO [train.py:1198] (1/2) Epoch 42, batch 3250, loss[loss=0.2047, simple_loss=0.2659, pruned_loss=0.05292, ctc_loss=0.1138, cr_loss=0.3741, over 34642.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2607, pruned_loss=0.05315, ctc_loss=0.1147, cr_loss=0.3886, over 6769886.96 frames. ], batch size: 98, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 22:05:00,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=756840.0, ans=0.0 2024-09-19 22:05:06,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=756886.6666666666, ans=0.0 2024-09-19 22:05:14,005 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.92 vs. 
limit=6.0 2024-09-19 22:05:14,437 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.115e+02 2.529e+02 2.890e+02 3.460e+02 5.688e+02, threshold=5.780e+02, percent-clipped=0.0 2024-09-19 22:05:16,252 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:05:20,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=756886.6666666666, ans=0.125 2024-09-19 22:05:55,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=757026.6666666666, ans=0.125 2024-09-19 22:06:00,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=757026.6666666666, ans=0.2 2024-09-19 22:06:12,665 INFO [train.py:1198] (1/2) Epoch 42, batch 3300, loss[loss=0.1904, simple_loss=0.2594, pruned_loss=0.04376, ctc_loss=0.1002, cr_loss=0.3434, over 33206.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2594, pruned_loss=0.05274, ctc_loss=0.1139, cr_loss=0.3865, over 6768355.79 frames. ], batch size: 130, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 22:06:46,069 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.84 vs. limit=22.5 2024-09-19 22:06:53,951 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.60 vs. limit=15.0 2024-09-19 22:07:15,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=757260.0, ans=0.125 2024-09-19 22:07:32,974 INFO [train.py:1198] (1/2) Epoch 42, batch 3350, loss[loss=0.2084, simple_loss=0.2693, pruned_loss=0.05407, ctc_loss=0.1187, cr_loss=0.3896, over 33942.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2607, pruned_loss=0.05342, ctc_loss=0.1151, cr_loss=0.389, over 6743124.83 frames. ], batch size: 122, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 22:07:37,111 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.85 vs. limit=15.0 2024-09-19 22:07:38,598 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.89 vs. limit=15.0 2024-09-19 22:07:58,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 2.525e+02 2.856e+02 3.391e+02 5.390e+02, threshold=5.713e+02, percent-clipped=0.0 2024-09-19 22:08:04,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=5.83 vs. 
limit=12.0 2024-09-19 22:08:24,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=757446.6666666666, ans=0.125 2024-09-19 22:08:36,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=757446.6666666666, ans=0.1 2024-09-19 22:08:39,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=757493.3333333334, ans=0.025 2024-09-19 22:08:54,972 INFO [train.py:1198] (1/2) Epoch 42, batch 3400, loss[loss=0.1833, simple_loss=0.2356, pruned_loss=0.04818, ctc_loss=0.1024, cr_loss=0.3518, over 34156.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.2605, pruned_loss=0.05334, ctc_loss=0.115, cr_loss=0.3886, over 6731884.62 frames. ], batch size: 78, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 22:08:55,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=757540.0, ans=0.0 2024-09-19 22:08:58,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=757540.0, ans=0.2 2024-09-19 22:09:01,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=757540.0, ans=0.2 2024-09-19 22:09:03,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=757540.0, ans=0.07 2024-09-19 22:09:22,699 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2024-09-19 22:09:28,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=757633.3333333334, ans=0.2 2024-09-19 22:09:36,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=757633.3333333334, ans=0.125 2024-09-19 22:09:39,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=757633.3333333334, ans=0.5 2024-09-19 22:09:47,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757680.0, ans=0.1 2024-09-19 22:09:55,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.74 vs. limit=22.5 2024-09-19 22:09:56,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.35 vs. limit=22.5 2024-09-19 22:10:11,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=757726.6666666666, ans=0.125 2024-09-19 22:10:15,980 INFO [train.py:1198] (1/2) Epoch 42, batch 3450, loss[loss=0.2117, simple_loss=0.275, pruned_loss=0.05414, ctc_loss=0.1193, cr_loss=0.4091, over 33008.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2606, pruned_loss=0.05331, ctc_loss=0.115, cr_loss=0.3884, over 6744402.98 frames. 
], batch size: 130, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 22:10:32,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=757820.0, ans=0.2 2024-09-19 22:10:37,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=757820.0, ans=0.125 2024-09-19 22:10:40,098 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.062e+02 2.426e+02 2.739e+02 3.207e+02 5.932e+02, threshold=5.479e+02, percent-clipped=1.0 2024-09-19 22:10:45,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=757820.0, ans=0.0 2024-09-19 22:10:45,673 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2024-09-19 22:11:29,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=757960.0, ans=0.125 2024-09-19 22:11:35,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=758006.6666666666, ans=0.1 2024-09-19 22:11:36,652 INFO [train.py:1198] (1/2) Epoch 42, batch 3500, loss[loss=0.1778, simple_loss=0.2401, pruned_loss=0.04216, ctc_loss=0.09029, cr_loss=0.3273, over 34446.00 frames. ], tot_loss[loss=0.2022, simple_loss=0.26, pruned_loss=0.05303, ctc_loss=0.1144, cr_loss=0.3866, over 6746827.43 frames. ], batch size: 85, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 22:11:38,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=758006.6666666666, ans=0.125 2024-09-19 22:12:06,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=758053.3333333334, ans=0.125 2024-09-19 22:12:10,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=758100.0, ans=0.04949747468305833 2024-09-19 22:12:29,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=758146.6666666666, ans=0.0 2024-09-19 22:12:43,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=758193.3333333334, ans=0.0 2024-09-19 22:12:58,015 INFO [train.py:1198] (1/2) Epoch 42, batch 3550, loss[loss=0.2018, simple_loss=0.2694, pruned_loss=0.04901, ctc_loss=0.1072, cr_loss=0.3698, over 34343.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.26, pruned_loss=0.05291, ctc_loss=0.1141, cr_loss=0.3864, over 6756749.06 frames. 
], batch size: 103, lr: 2.82e-03, grad_scale: 16.0 2024-09-19 22:13:07,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=758240.0, ans=0.05 2024-09-19 22:13:22,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.587e+02 3.117e+02 4.147e+02 6.550e+02, threshold=6.235e+02, percent-clipped=6.0 2024-09-19 22:13:25,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=758286.6666666666, ans=0.1 2024-09-19 22:13:30,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=758333.3333333334, ans=0.0 2024-09-19 22:13:38,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=758333.3333333334, ans=0.125 2024-09-19 22:13:56,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=758380.0, ans=0.125 2024-09-19 22:13:59,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=758380.0, ans=0.0 2024-09-19 22:14:03,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=758426.6666666666, ans=0.0 2024-09-19 22:14:09,549 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=758426.6666666666, ans=0.025 2024-09-19 22:14:18,858 INFO [train.py:1198] (1/2) Epoch 42, batch 3600, loss[loss=0.1976, simple_loss=0.2518, pruned_loss=0.05266, ctc_loss=0.1122, cr_loss=0.3904, over 34464.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2607, pruned_loss=0.05329, ctc_loss=0.1148, cr_loss=0.3885, over 6766392.23 frames. ], batch size: 90, lr: 2.82e-03, grad_scale: 32.0 2024-09-19 22:14:22,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=758473.3333333334, ans=0.2 2024-09-19 22:14:25,478 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=758473.3333333334, ans=0.1 2024-09-19 22:14:57,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=758566.6666666666, ans=0.125 2024-09-19 22:15:13,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=758613.3333333334, ans=0.0 2024-09-19 22:15:27,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=758660.0, ans=10.0 2024-09-19 22:15:33,009 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2024-09-19 22:15:38,432 INFO [train.py:1198] (1/2) Epoch 42, batch 3650, loss[loss=0.2209, simple_loss=0.2805, pruned_loss=0.05973, ctc_loss=0.1258, cr_loss=0.4172, over 34423.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2602, pruned_loss=0.05309, ctc_loss=0.1145, cr_loss=0.3881, over 6768833.42 frames. 
], batch size: 110, lr: 2.82e-03, grad_scale: 32.0 2024-09-19 22:16:02,440 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 2.598e+02 3.195e+02 3.996e+02 6.563e+02, threshold=6.389e+02, percent-clipped=1.0 2024-09-19 22:16:13,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=758800.0, ans=0.0 2024-09-19 22:16:59,561 INFO [train.py:1198] (1/2) Epoch 42, batch 3700, loss[loss=0.2127, simple_loss=0.271, pruned_loss=0.05676, ctc_loss=0.124, cr_loss=0.4056, over 34629.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2604, pruned_loss=0.05291, ctc_loss=0.1143, cr_loss=0.3874, over 6783199.25 frames. ], batch size: 102, lr: 2.82e-03, grad_scale: 32.0 2024-09-19 22:17:03,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=758940.0, ans=0.125 2024-09-19 22:17:06,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=758940.0, ans=0.125 2024-09-19 22:17:25,226 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.61 vs. limit=15.0 2024-09-19 22:17:25,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=22.5 2024-09-19 22:17:31,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.12 vs. limit=15.0 2024-09-19 22:18:20,661 INFO [train.py:1198] (1/2) Epoch 42, batch 3750, loss[loss=0.2272, simple_loss=0.2804, pruned_loss=0.06523, ctc_loss=0.1332, cr_loss=0.4232, over 34369.00 frames. ], tot_loss[loss=0.2054, simple_loss=0.2636, pruned_loss=0.0541, ctc_loss=0.1166, cr_loss=0.393, over 6784652.47 frames. 
], batch size: 113, lr: 2.82e-03, grad_scale: 32.0 2024-09-19 22:18:22,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=759173.3333333334, ans=0.025 2024-09-19 22:18:28,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=759173.3333333334, ans=0.1 2024-09-19 22:18:35,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=759220.0, ans=0.0 2024-09-19 22:18:35,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=759220.0, ans=0.125 2024-09-19 22:18:36,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=759220.0, ans=0.125 2024-09-19 22:18:44,434 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 2.439e+02 2.631e+02 2.954e+02 6.464e+02, threshold=5.263e+02, percent-clipped=1.0 2024-09-19 22:18:46,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=759220.0, ans=0.125 2024-09-19 22:19:06,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=759266.6666666666, ans=0.5 2024-09-19 22:19:23,908 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=759360.0, ans=0.125 2024-09-19 22:19:23,930 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=759360.0, ans=0.2 2024-09-19 22:19:25,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=759360.0, ans=0.125 2024-09-19 22:19:37,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.20 vs. limit=15.0 2024-09-19 22:19:38,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=759360.0, ans=0.0 2024-09-19 22:19:41,498 INFO [train.py:1198] (1/2) Epoch 42, batch 3800, loss[loss=0.2279, simple_loss=0.2806, pruned_loss=0.06561, ctc_loss=0.1357, cr_loss=0.4198, over 30315.00 frames. ], tot_loss[loss=0.2085, simple_loss=0.2662, pruned_loss=0.05548, ctc_loss=0.1192, cr_loss=0.3991, over 6675839.06 frames. ], batch size: 176, lr: 2.82e-03, grad_scale: 32.0 2024-09-19 22:19:42,787 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.52 vs. limit=15.0 2024-09-19 22:20:00,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=759453.3333333334, ans=0.125 2024-09-19 22:20:24,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.95 vs. 
limit=15.0 2024-09-19 22:20:41,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=759546.6666666666, ans=0.125 2024-09-19 22:20:47,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=759593.3333333334, ans=0.0 2024-09-19 22:20:56,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.58 vs. limit=15.0 2024-09-19 22:21:05,360 INFO [train.py:1198] (1/2) Epoch 42, batch 3850, loss[loss=0.2201, simple_loss=0.2728, pruned_loss=0.06185, ctc_loss=0.133, cr_loss=0.4265, over 24169.00 frames. ], tot_loss[loss=0.2111, simple_loss=0.2679, pruned_loss=0.05688, ctc_loss=0.1221, cr_loss=0.4029, over 6249769.33 frames. ], batch size: 245, lr: 2.82e-03, grad_scale: 32.0 2024-09-19 22:21:17,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=759640.0, ans=0.125 2024-09-19 22:21:20,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=759686.6666666666, ans=0.025 2024-09-19 22:21:29,332 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.36 vs. limit=15.0 2024-09-19 22:21:29,931 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.205e+02 2.524e+02 2.759e+02 3.211e+02 5.278e+02, threshold=5.518e+02, percent-clipped=1.0 2024-09-19 22:21:40,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=759733.3333333334, ans=0.0 2024-09-19 22:21:40,666 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:21:43,797 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=759733.3333333334, ans=0.125 2024-09-19 22:22:30,840 INFO [train.py:1198] (1/2) Epoch 43, batch 0, loss[loss=0.1863, simple_loss=0.2486, pruned_loss=0.04527, ctc_loss=0.09763, cr_loss=0.3504, over 34459.00 frames. ], tot_loss[loss=0.1863, simple_loss=0.2486, pruned_loss=0.04527, ctc_loss=0.09763, cr_loss=0.3504, over 34459.00 frames. ], batch size: 85, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:22:30,840 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 22:22:33,686 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([2.9541, 3.9770, 4.5957, 4.4767], device='cuda:1') 2024-09-19 22:22:47,554 INFO [train.py:1230] (1/2) Epoch 43, validation: loss=0.1486, simple_loss=0.2428, pruned_loss=0.02339, ctc_loss=0.03837, cr_loss=2.176e-14, over 944034.00 frames. 2024-09-19 22:22:47,554 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 22:22:47,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=759761.3333333334, ans=0.0 2024-09-19 22:22:55,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759761.3333333334, ans=0.1 2024-09-19 22:22:56,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.68 vs. 
limit=12.0 2024-09-19 22:23:14,828 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.96 vs. limit=15.0 2024-09-19 22:23:24,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=759854.6666666666, ans=0.125 2024-09-19 22:24:09,865 INFO [train.py:1198] (1/2) Epoch 43, batch 50, loss[loss=0.1775, simple_loss=0.2328, pruned_loss=0.04451, ctc_loss=0.09763, cr_loss=0.3421, over 34490.00 frames. ], tot_loss[loss=0.2043, simple_loss=0.262, pruned_loss=0.05389, ctc_loss=0.1159, cr_loss=0.3906, over 1481500.23 frames. ], batch size: 82, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:24:25,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=759994.6666666666, ans=0.125 2024-09-19 22:24:49,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=760088.0, ans=0.2 2024-09-19 22:25:17,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.158e+02 2.565e+02 2.719e+02 3.284e+02 5.570e+02, threshold=5.438e+02, percent-clipped=1.0 2024-09-19 22:25:25,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=760181.3333333334, ans=0.125 2024-09-19 22:25:31,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=760181.3333333334, ans=0.2 2024-09-19 22:25:35,503 INFO [train.py:1198] (1/2) Epoch 43, batch 100, loss[loss=0.1894, simple_loss=0.2471, pruned_loss=0.04832, ctc_loss=0.1045, cr_loss=0.3561, over 34569.00 frames. ], tot_loss[loss=0.2056, simple_loss=0.2636, pruned_loss=0.05424, ctc_loss=0.1167, cr_loss=0.3935, over 2630097.94 frames. ], batch size: 89, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:25:50,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760274.6666666666, ans=0.1 2024-09-19 22:25:51,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=760274.6666666666, ans=22.5 2024-09-19 22:26:18,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=760321.3333333334, ans=0.1 2024-09-19 22:26:19,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=760321.3333333334, ans=0.2 2024-09-19 22:26:23,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=760368.0, ans=22.5 2024-09-19 22:26:25,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.93 vs. 
limit=15.0 2024-09-19 22:26:47,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=760414.6666666666, ans=0.0 2024-09-19 22:26:53,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=760414.6666666666, ans=0.1 2024-09-19 22:26:56,636 INFO [train.py:1198] (1/2) Epoch 43, batch 150, loss[loss=0.1739, simple_loss=0.2318, pruned_loss=0.04216, ctc_loss=0.09293, cr_loss=0.3301, over 34476.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.2613, pruned_loss=0.05291, ctc_loss=0.1144, cr_loss=0.3887, over 3557249.31 frames. ], batch size: 82, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:27:03,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=760461.3333333334, ans=0.2 2024-09-19 22:27:06,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=760461.3333333334, ans=0.0 2024-09-19 22:27:15,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=760508.0, ans=0.2 2024-09-19 22:27:21,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=760508.0, ans=0.0 2024-09-19 22:27:29,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=760554.6666666666, ans=0.2 2024-09-19 22:27:33,769 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.99 vs. limit=15.0 2024-09-19 22:28:02,507 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.045e+02 2.457e+02 2.729e+02 3.363e+02 6.510e+02, threshold=5.457e+02, percent-clipped=3.0 2024-09-19 22:28:06,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0 2024-09-19 22:28:22,909 INFO [train.py:1198] (1/2) Epoch 43, batch 200, loss[loss=0.2214, simple_loss=0.2772, pruned_loss=0.0612, ctc_loss=0.1305, cr_loss=0.4301, over 31755.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2604, pruned_loss=0.05301, ctc_loss=0.1145, cr_loss=0.3879, over 4271571.48 frames. ], batch size: 145, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:28:30,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=7.99 vs. limit=15.0 2024-09-19 22:28:41,935 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.26 vs. 
limit=15.0 2024-09-19 22:28:44,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=760741.3333333334, ans=0.125 2024-09-19 22:28:51,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=760741.3333333334, ans=0.125 2024-09-19 22:29:19,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=760834.6666666666, ans=0.125 2024-09-19 22:29:35,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=760881.3333333334, ans=0.125 2024-09-19 22:29:45,066 INFO [train.py:1198] (1/2) Epoch 43, batch 250, loss[loss=0.2209, simple_loss=0.2789, pruned_loss=0.06011, ctc_loss=0.1295, cr_loss=0.416, over 34227.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.2603, pruned_loss=0.05284, ctc_loss=0.1142, cr_loss=0.3869, over 4834717.03 frames. ], batch size: 117, lr: 2.78e-03, grad_scale: 16.0 2024-09-19 22:29:49,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0 2024-09-19 22:30:18,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=761021.3333333334, ans=0.05 2024-09-19 22:30:18,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=761021.3333333334, ans=0.125 2024-09-19 22:30:23,544 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.85 vs. limit=6.0 2024-09-19 22:30:46,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=761068.0, ans=0.1 2024-09-19 22:30:50,870 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.142e+02 2.597e+02 3.186e+02 3.841e+02 8.423e+02, threshold=6.371e+02, percent-clipped=4.0 2024-09-19 22:30:54,463 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:31:04,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761114.6666666666, ans=0.1 2024-09-19 22:31:07,322 INFO [train.py:1198] (1/2) Epoch 43, batch 300, loss[loss=0.2247, simple_loss=0.2824, pruned_loss=0.0618, ctc_loss=0.1324, cr_loss=0.4242, over 34335.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.26, pruned_loss=0.05272, ctc_loss=0.114, cr_loss=0.387, over 5263705.23 frames. ], batch size: 107, lr: 2.78e-03, grad_scale: 16.0 2024-09-19 22:31:18,294 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.89 vs. limit=15.0 2024-09-19 22:31:22,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-09-19 22:31:42,070 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.83 vs. 
limit=12.0 2024-09-19 22:31:52,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=761254.6666666666, ans=0.1 2024-09-19 22:31:53,833 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:32:17,439 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=22.5 2024-09-19 22:32:31,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.42 vs. limit=15.0 2024-09-19 22:32:33,132 INFO [train.py:1198] (1/2) Epoch 43, batch 350, loss[loss=0.181, simple_loss=0.2393, pruned_loss=0.04444, ctc_loss=0.09907, cr_loss=0.3524, over 34289.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2606, pruned_loss=0.05288, ctc_loss=0.1143, cr_loss=0.3876, over 5598246.13 frames. ], batch size: 83, lr: 2.78e-03, grad_scale: 16.0 2024-09-19 22:32:34,065 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=5.26 vs. limit=12.0 2024-09-19 22:32:36,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=761394.6666666666, ans=0.1 2024-09-19 22:33:08,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2024-09-19 22:33:21,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=761534.6666666666, ans=0.09899494936611666 2024-09-19 22:33:23,019 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=761534.6666666666, ans=0.0 2024-09-19 22:33:39,101 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.174e+02 2.526e+02 2.918e+02 3.557e+02 6.086e+02, threshold=5.837e+02, percent-clipped=0.0 2024-09-19 22:33:46,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=761581.3333333334, ans=0.125 2024-09-19 22:33:55,488 INFO [train.py:1198] (1/2) Epoch 43, batch 400, loss[loss=0.2125, simple_loss=0.2725, pruned_loss=0.05586, ctc_loss=0.1231, cr_loss=0.4014, over 34399.00 frames. ], tot_loss[loss=0.2018, simple_loss=0.2601, pruned_loss=0.05265, ctc_loss=0.1139, cr_loss=0.3869, over 5864073.44 frames. ], batch size: 95, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:34:10,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=761674.6666666666, ans=0.0 2024-09-19 22:34:24,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.54 vs. 
limit=15.0 2024-09-19 22:34:33,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=761721.3333333334, ans=0.125 2024-09-19 22:34:35,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=761721.3333333334, ans=0.125 2024-09-19 22:34:36,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. limit=15.0 2024-09-19 22:35:17,680 INFO [train.py:1198] (1/2) Epoch 43, batch 450, loss[loss=0.2088, simple_loss=0.2728, pruned_loss=0.05324, ctc_loss=0.1124, cr_loss=0.3935, over 34690.00 frames. ], tot_loss[loss=0.2018, simple_loss=0.2601, pruned_loss=0.05263, ctc_loss=0.1139, cr_loss=0.387, over 6054538.69 frames. ], batch size: 97, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:35:29,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=761861.3333333334, ans=0.2 2024-09-19 22:35:32,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=761908.0, ans=0.125 2024-09-19 22:35:39,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=761908.0, ans=0.0 2024-09-19 22:35:39,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=761908.0, ans=0.125 2024-09-19 22:36:00,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.99 vs. limit=15.0 2024-09-19 22:36:11,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=762001.3333333334, ans=0.07 2024-09-19 22:36:26,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=762048.0, ans=0.125 2024-09-19 22:36:27,562 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.473e+02 2.679e+02 3.374e+02 5.020e+02, threshold=5.358e+02, percent-clipped=0.0 2024-09-19 22:36:29,759 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:36:43,892 INFO [train.py:1198] (1/2) Epoch 43, batch 500, loss[loss=0.2169, simple_loss=0.2751, pruned_loss=0.05868, ctc_loss=0.125, cr_loss=0.4086, over 34465.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.2592, pruned_loss=0.05226, ctc_loss=0.1131, cr_loss=0.385, over 6221212.53 frames. ], batch size: 110, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:36:44,816 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.72 vs. 
limit=15.0 2024-09-19 22:36:50,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=762094.6666666666, ans=0.125 2024-09-19 22:37:20,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=762188.0, ans=0.125 2024-09-19 22:37:20,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762188.0, ans=0.1 2024-09-19 22:37:25,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=762188.0, ans=0.1 2024-09-19 22:37:46,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=762234.6666666666, ans=0.0 2024-09-19 22:38:06,191 INFO [train.py:1198] (1/2) Epoch 43, batch 550, loss[loss=0.2213, simple_loss=0.2753, pruned_loss=0.06193, ctc_loss=0.1293, cr_loss=0.4413, over 33918.00 frames. ], tot_loss[loss=0.2012, simple_loss=0.2594, pruned_loss=0.05247, ctc_loss=0.1135, cr_loss=0.3856, over 6331618.19 frames. ], batch size: 122, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:38:18,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=762328.0, ans=0.1 2024-09-19 22:38:21,238 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=762374.6666666666, ans=0.09899494936611666 2024-09-19 22:38:26,225 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:38:35,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=762374.6666666666, ans=0.2 2024-09-19 22:39:11,642 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.180e+02 2.498e+02 2.764e+02 3.401e+02 5.404e+02, threshold=5.528e+02, percent-clipped=2.0 2024-09-19 22:39:32,683 INFO [train.py:1198] (1/2) Epoch 43, batch 600, loss[loss=0.2139, simple_loss=0.2736, pruned_loss=0.05713, ctc_loss=0.1207, cr_loss=0.3935, over 34248.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.2599, pruned_loss=0.05267, ctc_loss=0.1139, cr_loss=0.3869, over 6434362.58 frames. ], batch size: 117, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:39:39,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=762561.3333333334, ans=0.125 2024-09-19 22:40:25,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=762701.3333333334, ans=0.125 2024-09-19 22:40:29,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.64 vs. limit=15.0 2024-09-19 22:40:41,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=762748.0, ans=0.125 2024-09-19 22:40:54,488 INFO [train.py:1198] (1/2) Epoch 43, batch 650, loss[loss=0.1986, simple_loss=0.2522, pruned_loss=0.05311, ctc_loss=0.1145, cr_loss=0.3975, over 34551.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2593, pruned_loss=0.05229, ctc_loss=0.1132, cr_loss=0.3854, over 6525105.91 frames. 
], batch size: 94, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:41:21,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=762841.3333333334, ans=0.125 2024-09-19 22:41:55,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=762934.6666666666, ans=0.015 2024-09-19 22:42:00,278 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.973e+02 2.594e+02 3.107e+02 4.159e+02 7.221e+02, threshold=6.213e+02, percent-clipped=3.0 2024-09-19 22:42:16,701 INFO [train.py:1198] (1/2) Epoch 43, batch 700, loss[loss=0.209, simple_loss=0.2637, pruned_loss=0.05736, ctc_loss=0.1187, cr_loss=0.3989, over 34597.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2597, pruned_loss=0.05255, ctc_loss=0.1136, cr_loss=0.3861, over 6582008.89 frames. ], batch size: 89, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:42:18,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=763028.0, ans=0.125 2024-09-19 22:42:31,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=763074.6666666666, ans=0.125 2024-09-19 22:42:36,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=763074.6666666666, ans=0.125 2024-09-19 22:42:41,406 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff3.min_abs, batch_count=763074.6666666666, ans=0.2 2024-09-19 22:42:48,406 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.34 vs. limit=15.0 2024-09-19 22:42:49,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=763121.3333333334, ans=0.2 2024-09-19 22:42:51,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=763121.3333333334, ans=0.0 2024-09-19 22:43:06,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.22 vs. limit=10.0 2024-09-19 22:43:42,456 INFO [train.py:1198] (1/2) Epoch 43, batch 750, loss[loss=0.2066, simple_loss=0.2649, pruned_loss=0.05412, ctc_loss=0.1208, cr_loss=0.4001, over 34421.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2592, pruned_loss=0.05233, ctc_loss=0.1132, cr_loss=0.3855, over 6623946.85 frames. ], batch size: 95, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:43:44,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=763261.3333333334, ans=0.125 2024-09-19 22:44:29,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.22 vs. 
limit=15.0 2024-09-19 22:44:43,705 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=763401.3333333334, ans=0.0 2024-09-19 22:44:46,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=763448.0, ans=0.125 2024-09-19 22:44:48,269 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.096e+02 2.591e+02 3.070e+02 3.810e+02 9.705e+02, threshold=6.140e+02, percent-clipped=3.0 2024-09-19 22:45:04,977 INFO [train.py:1198] (1/2) Epoch 43, batch 800, loss[loss=0.1793, simple_loss=0.2387, pruned_loss=0.04328, ctc_loss=0.09735, cr_loss=0.3467, over 34471.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2594, pruned_loss=0.0525, ctc_loss=0.1135, cr_loss=0.3862, over 6658705.65 frames. ], batch size: 85, lr: 2.78e-03, grad_scale: 32.0 2024-09-19 22:45:11,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=763494.6666666666, ans=0.0 2024-09-19 22:45:18,368 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:45:23,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.71 vs. limit=10.0 2024-09-19 22:45:28,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=763541.3333333334, ans=0.125 2024-09-19 22:45:36,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=763588.0, ans=0.1 2024-09-19 22:45:39,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=763588.0, ans=0.125 2024-09-19 22:45:44,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=763588.0, ans=0.05 2024-09-19 22:46:09,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=763681.3333333334, ans=0.125 2024-09-19 22:46:26,721 INFO [train.py:1198] (1/2) Epoch 43, batch 850, loss[loss=0.205, simple_loss=0.2675, pruned_loss=0.05231, ctc_loss=0.1145, cr_loss=0.378, over 34335.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.2588, pruned_loss=0.05231, ctc_loss=0.1132, cr_loss=0.3852, over 6693060.05 frames. 
], batch size: 103, lr: 2.77e-03, grad_scale: 32.0 2024-09-19 22:46:56,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=763774.6666666666, ans=0.125 2024-09-19 22:46:56,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=763774.6666666666, ans=0.0 2024-09-19 22:47:18,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=763868.0, ans=0.0 2024-09-19 22:47:36,361 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.215e+02 2.570e+02 3.023e+02 3.913e+02 6.163e+02, threshold=6.046e+02, percent-clipped=1.0 2024-09-19 22:47:48,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=763914.6666666666, ans=0.125 2024-09-19 22:47:52,868 INFO [train.py:1198] (1/2) Epoch 43, batch 900, loss[loss=0.1697, simple_loss=0.2294, pruned_loss=0.03917, ctc_loss=0.09109, cr_loss=0.3337, over 34478.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2591, pruned_loss=0.05228, ctc_loss=0.1133, cr_loss=0.3856, over 6697958.95 frames. ], batch size: 85, lr: 2.77e-03, grad_scale: 32.0 2024-09-19 22:47:54,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=763961.3333333334, ans=0.09899494936611666 2024-09-19 22:48:05,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.40 vs. limit=15.0 2024-09-19 22:48:05,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.68 vs. limit=10.0 2024-09-19 22:48:20,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=764008.0, ans=0.125 2024-09-19 22:48:27,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=764054.6666666666, ans=0.1 2024-09-19 22:48:32,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=764054.6666666666, ans=0.125 2024-09-19 22:48:45,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=764101.3333333334, ans=0.0 2024-09-19 22:48:55,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764101.3333333334, ans=0.1 2024-09-19 22:49:02,897 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0 2024-09-19 22:49:10,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=764148.0, ans=0.0 2024-09-19 22:49:14,828 INFO [train.py:1198] (1/2) Epoch 43, batch 950, loss[loss=0.1866, simple_loss=0.248, pruned_loss=0.04526, ctc_loss=0.1, cr_loss=0.3673, over 34665.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2597, pruned_loss=0.05242, ctc_loss=0.1136, cr_loss=0.3866, over 6702052.53 frames. 
], batch size: 87, lr: 2.77e-03, grad_scale: 32.0 2024-09-19 22:49:37,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.71 vs. limit=15.0 2024-09-19 22:49:41,395 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=764241.3333333334, ans=0.02 2024-09-19 22:49:58,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=764288.0, ans=0.2 2024-09-19 22:50:03,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=764334.6666666666, ans=0.0 2024-09-19 22:50:19,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=764381.3333333334, ans=0.2 2024-09-19 22:50:20,997 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.226e+02 2.722e+02 3.135e+02 4.083e+02 5.940e+02, threshold=6.270e+02, percent-clipped=0.0 2024-09-19 22:50:26,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=764381.3333333334, ans=0.2 2024-09-19 22:50:39,553 INFO [train.py:1198] (1/2) Epoch 43, batch 1000, loss[loss=0.1875, simple_loss=0.2447, pruned_loss=0.04757, ctc_loss=0.1047, cr_loss=0.3563, over 34498.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2605, pruned_loss=0.05289, ctc_loss=0.1144, cr_loss=0.388, over 6694515.09 frames. ], batch size: 90, lr: 2.77e-03, grad_scale: 32.0 2024-09-19 22:50:50,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0 2024-09-19 22:51:14,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=764521.3333333334, ans=0.125 2024-09-19 22:51:16,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=764521.3333333334, ans=0.125 2024-09-19 22:51:52,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=764614.6666666666, ans=0.2 2024-09-19 22:51:52,806 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.02 vs. limit=22.5 2024-09-19 22:52:03,620 INFO [train.py:1198] (1/2) Epoch 43, batch 1050, loss[loss=0.2025, simple_loss=0.2655, pruned_loss=0.05089, ctc_loss=0.1117, cr_loss=0.386, over 34562.00 frames. ], tot_loss[loss=0.202, simple_loss=0.26, pruned_loss=0.05282, ctc_loss=0.1142, cr_loss=0.3877, over 6703513.25 frames. 
], batch size: 99, lr: 2.77e-03, grad_scale: 32.0 2024-09-19 22:52:08,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=764661.3333333334, ans=0.2 2024-09-19 22:52:56,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=764801.3333333334, ans=0.125 2024-09-19 22:53:06,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=764801.3333333334, ans=0.0 2024-09-19 22:53:09,614 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.455e+02 2.723e+02 3.272e+02 5.148e+02, threshold=5.445e+02, percent-clipped=0.0 2024-09-19 22:53:14,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=764848.0, ans=0.2 2024-09-19 22:53:21,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=764848.0, ans=0.2 2024-09-19 22:53:23,954 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.88 vs. limit=6.0 2024-09-19 22:53:25,998 INFO [train.py:1198] (1/2) Epoch 43, batch 1100, loss[loss=0.1895, simple_loss=0.2499, pruned_loss=0.0467, ctc_loss=0.1046, cr_loss=0.37, over 34355.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2595, pruned_loss=0.05252, ctc_loss=0.1136, cr_loss=0.3863, over 6717214.50 frames. ], batch size: 91, lr: 2.77e-03, grad_scale: 32.0 2024-09-19 22:53:29,013 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.48 vs. limit=5.0 2024-09-19 22:53:30,187 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.90 vs. limit=15.0 2024-09-19 22:53:41,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=764941.3333333334, ans=0.125 2024-09-19 22:53:47,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764941.3333333334, ans=0.1 2024-09-19 22:54:29,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=765034.6666666666, ans=0.125 2024-09-19 22:54:52,689 INFO [train.py:1198] (1/2) Epoch 43, batch 1150, loss[loss=0.1868, simple_loss=0.2443, pruned_loss=0.04733, ctc_loss=0.1021, cr_loss=0.3527, over 34739.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2594, pruned_loss=0.05255, ctc_loss=0.1136, cr_loss=0.386, over 6714675.43 frames. 
], batch size: 92, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 22:54:53,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765128.0, ans=0.1 2024-09-19 22:54:59,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=765128.0, ans=0.0 2024-09-19 22:55:32,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=765221.3333333334, ans=0.125 2024-09-19 22:55:37,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=765221.3333333334, ans=0.0 2024-09-19 22:56:00,141 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.512e+02 2.857e+02 3.326e+02 5.298e+02, threshold=5.713e+02, percent-clipped=0.0 2024-09-19 22:56:21,305 INFO [train.py:1198] (1/2) Epoch 43, batch 1200, loss[loss=0.1993, simple_loss=0.263, pruned_loss=0.04925, ctc_loss=0.1093, cr_loss=0.382, over 34563.00 frames. ], tot_loss[loss=0.2022, simple_loss=0.2604, pruned_loss=0.05284, ctc_loss=0.1142, cr_loss=0.3877, over 6708097.25 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0 2024-09-19 22:56:58,684 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.98 vs. limit=6.0 2024-09-19 22:57:09,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=765501.3333333334, ans=0.2 2024-09-19 22:57:35,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=765548.0, ans=0.125 2024-09-19 22:57:42,256 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=765594.6666666666, ans=0.2 2024-09-19 22:57:43,495 INFO [train.py:1198] (1/2) Epoch 43, batch 1250, loss[loss=0.2109, simple_loss=0.2686, pruned_loss=0.05618, ctc_loss=0.121, cr_loss=0.4131, over 34355.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2611, pruned_loss=0.05306, ctc_loss=0.1145, cr_loss=0.3883, over 6741722.25 frames. ], batch size: 107, lr: 2.77e-03, grad_scale: 32.0 2024-09-19 22:57:52,864 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5 2024-09-19 22:57:53,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=765594.6666666666, ans=0.125 2024-09-19 22:58:08,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=765641.3333333334, ans=0.125 2024-09-19 22:58:42,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.85 vs. 
limit=22.5 2024-09-19 22:58:44,007 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=765734.6666666666, ans=0.125 2024-09-19 22:58:50,651 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 22:58:56,836 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.140e+02 2.560e+02 2.944e+02 3.645e+02 6.233e+02, threshold=5.887e+02, percent-clipped=1.0 2024-09-19 22:59:09,719 INFO [train.py:1198] (1/2) Epoch 43, batch 1300, loss[loss=0.2101, simple_loss=0.2741, pruned_loss=0.05335, ctc_loss=0.1163, cr_loss=0.4004, over 33026.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2605, pruned_loss=0.05289, ctc_loss=0.1143, cr_loss=0.3877, over 6745045.69 frames. ], batch size: 130, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 22:59:10,022 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=765828.0, ans=0.025 2024-09-19 22:59:11,011 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.50 vs. limit=22.5 2024-09-19 22:59:23,795 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=15.0 2024-09-19 22:59:54,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=765921.3333333334, ans=0.125 2024-09-19 23:00:32,313 INFO [train.py:1198] (1/2) Epoch 43, batch 1350, loss[loss=0.2074, simple_loss=0.2618, pruned_loss=0.05663, ctc_loss=0.1195, cr_loss=0.3963, over 34554.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2603, pruned_loss=0.0529, ctc_loss=0.1143, cr_loss=0.3882, over 6762610.44 frames. ], batch size: 94, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:01:03,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=766154.6666666666, ans=0.125 2024-09-19 23:01:18,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=766154.6666666666, ans=0.1 2024-09-19 23:01:26,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=766201.3333333334, ans=0.1 2024-09-19 23:01:38,188 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=766248.0, ans=0.1 2024-09-19 23:01:41,093 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.197e+02 2.623e+02 3.227e+02 3.920e+02 6.002e+02, threshold=6.454e+02, percent-clipped=2.0 2024-09-19 23:01:54,719 INFO [train.py:1198] (1/2) Epoch 43, batch 1400, loss[loss=0.1857, simple_loss=0.2407, pruned_loss=0.04776, ctc_loss=0.1039, cr_loss=0.3633, over 34287.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2605, pruned_loss=0.05293, ctc_loss=0.1144, cr_loss=0.3881, over 6775670.89 frames. 
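[Editor's note] The scaling.py:1024 Whitening entries report a metric on activation covariance against a limit (e.g. "metric=4.48 vs. limit=5.0"); a whitening penalty only becomes relevant once the metric exceeds the limit, so most of these lines are informational. Below is a hedged sketch of one way such a metric can be defined, normalized so that a perfectly whitened (isotropic) per-group covariance gives 1.0 and anisotropic covariances give larger values; this is an assumed form for illustration, not the exact formula in scaling.py.

# Sketch of a whitening metric: how far the per-group feature covariance is
# from a multiple of the identity. Equals 1.0 for an isotropic covariance.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels); channels split into num_groups groups.
    num_frames, num_channels = x.shape
    assert num_channels % num_groups == 0
    g = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, g).transpose(0, 1)  # (groups, frames, g)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / num_frames     # (groups, g, g)
    # Mean squared covariance entry over squared mean diagonal entry, scaled
    # by group size: 1.0 iff cov is a multiple of the identity.
    num = (cov ** 2).mean(dim=(1, 2)) * g
    den = cov.diagonal(dim1=1, dim2=2).mean(dim=1) ** 2
    return (num / den).mean().item()

torch.manual_seed(0)
# Strongly correlated features -> large metric, like the whiten lines above.
print(whitening_metric(torch.randn(10000, 512) @ torch.randn(512, 512), num_groups=1))
# Already-white features -> metric near 1.0 (up to sampling noise).
print(whitening_metric(torch.randn(10000, 512), num_groups=1))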
], batch size: 80, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:02:11,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=766341.3333333334, ans=0.125 2024-09-19 23:02:31,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=766388.0, ans=0.0 2024-09-19 23:02:35,218 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=766388.0, ans=0.2 2024-09-19 23:02:35,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=766388.0, ans=0.0 2024-09-19 23:02:46,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=766434.6666666666, ans=0.025 2024-09-19 23:02:51,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=766434.6666666666, ans=0.125 2024-09-19 23:02:59,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=766434.6666666666, ans=0.0 2024-09-19 23:03:04,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=766481.3333333334, ans=0.0 2024-09-19 23:03:20,879 INFO [train.py:1198] (1/2) Epoch 43, batch 1450, loss[loss=0.2201, simple_loss=0.2794, pruned_loss=0.05952, ctc_loss=0.1246, cr_loss=0.4193, over 34437.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.261, pruned_loss=0.05302, ctc_loss=0.1145, cr_loss=0.3887, over 6773150.53 frames. ], batch size: 110, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:03:21,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=766528.0, ans=0.125 2024-09-19 23:03:21,906 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2024-09-19 23:03:35,355 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=8.0 2024-09-19 23:03:44,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=766574.6666666666, ans=0.125 2024-09-19 23:03:52,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=766621.3333333334, ans=0.1 2024-09-19 23:03:55,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=766621.3333333334, ans=0.1 2024-09-19 23:04:17,251 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.60 vs. limit=10.0 2024-09-19 23:04:29,715 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.261e+02 2.480e+02 2.747e+02 3.108e+02 1.029e+03, threshold=5.495e+02, percent-clipped=1.0 2024-09-19 23:04:43,030 INFO [train.py:1198] (1/2) Epoch 43, batch 1500, loss[loss=0.2129, simple_loss=0.2717, pruned_loss=0.05651, ctc_loss=0.1237, cr_loss=0.4091, over 34454.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2611, pruned_loss=0.05294, ctc_loss=0.1145, cr_loss=0.3883, over 6773074.37 frames. 
], batch size: 100, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:04:48,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=766761.3333333334, ans=0.1 2024-09-19 23:05:16,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=766854.6666666666, ans=0.125 2024-09-19 23:05:28,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=766854.6666666666, ans=0.125 2024-09-19 23:06:07,377 INFO [train.py:1198] (1/2) Epoch 43, batch 1550, loss[loss=0.2068, simple_loss=0.2684, pruned_loss=0.05308, ctc_loss=0.1167, cr_loss=0.3952, over 34383.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.2609, pruned_loss=0.05296, ctc_loss=0.1145, cr_loss=0.3882, over 6745895.18 frames. ], batch size: 105, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:06:13,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-09-19 23:06:29,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=767041.3333333334, ans=0.0 2024-09-19 23:06:40,741 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:07:18,419 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.151e+02 2.516e+02 2.860e+02 3.338e+02 6.388e+02, threshold=5.720e+02, percent-clipped=4.0 2024-09-19 23:07:25,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=767181.3333333334, ans=0.0 2024-09-19 23:07:31,920 INFO [train.py:1198] (1/2) Epoch 43, batch 1600, loss[loss=0.2087, simple_loss=0.2701, pruned_loss=0.05365, ctc_loss=0.1185, cr_loss=0.4071, over 34581.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2606, pruned_loss=0.0529, ctc_loss=0.1145, cr_loss=0.3882, over 6725983.57 frames. ], batch size: 99, lr: 2.77e-03, grad_scale: 32.0 2024-09-19 23:07:38,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=767228.0, ans=0.025 2024-09-19 23:07:54,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.78 vs. limit=10.0 2024-09-19 23:08:09,388 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.20 vs. limit=15.0 2024-09-19 23:08:37,728 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.86 vs. limit=6.0 2024-09-19 23:08:51,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=767414.6666666666, ans=0.125 2024-09-19 23:08:54,502 INFO [train.py:1198] (1/2) Epoch 43, batch 1650, loss[loss=0.2141, simple_loss=0.2741, pruned_loss=0.05638, ctc_loss=0.1214, cr_loss=0.4245, over 34401.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.2606, pruned_loss=0.05292, ctc_loss=0.1145, cr_loss=0.3887, over 6719060.53 frames. 
], batch size: 103, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:09:04,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=767461.3333333334, ans=0.125 2024-09-19 23:09:07,950 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=767461.3333333334, ans=0.0 2024-09-19 23:09:09,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=767508.0, ans=0.0 2024-09-19 23:09:17,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=767508.0, ans=0.125 2024-09-19 23:09:20,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=15.0 2024-09-19 23:09:27,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=767554.6666666666, ans=0.025 2024-09-19 23:09:30,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=767554.6666666666, ans=0.125 2024-09-19 23:09:34,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=767554.6666666666, ans=0.2 2024-09-19 23:09:41,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=767554.6666666666, ans=12.0 2024-09-19 23:09:42,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=767554.6666666666, ans=0.125 2024-09-19 23:10:08,777 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.217e+02 2.505e+02 2.844e+02 3.275e+02 6.556e+02, threshold=5.687e+02, percent-clipped=2.0 2024-09-19 23:10:20,085 INFO [train.py:1198] (1/2) Epoch 43, batch 1700, loss[loss=0.1779, simple_loss=0.2354, pruned_loss=0.04349, ctc_loss=0.09826, cr_loss=0.345, over 34319.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2605, pruned_loss=0.05285, ctc_loss=0.1144, cr_loss=0.3888, over 6744825.59 frames. ], batch size: 80, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:10:20,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=767694.6666666666, ans=0.0 2024-09-19 23:10:22,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=767694.6666666666, ans=0.125 2024-09-19 23:10:48,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767741.3333333334, ans=0.1 2024-09-19 23:10:53,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=767788.0, ans=0.0 2024-09-19 23:11:02,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=767788.0, ans=0.2 2024-09-19 23:11:04,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767788.0, ans=0.1 2024-09-19 23:11:42,061 INFO [train.py:1198] (1/2) Epoch 43, batch 1750, loss[loss=0.1624, simple_loss=0.219, pruned_loss=0.03814, ctc_loss=0.086, cr_loss=0.311, over 34155.00 frames. 
], tot_loss[loss=0.202, simple_loss=0.2601, pruned_loss=0.05276, ctc_loss=0.1142, cr_loss=0.3881, over 6753614.87 frames. ], batch size: 78, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:11:45,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=767928.0, ans=0.025 2024-09-19 23:12:06,019 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2024-09-19 23:12:08,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=767974.6666666666, ans=0.2 2024-09-19 23:12:12,244 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:12:28,684 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=768021.3333333334, ans=0.07 2024-09-19 23:12:48,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=768114.6666666666, ans=0.025 2024-09-19 23:12:52,802 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.226e+02 2.551e+02 2.920e+02 3.581e+02 6.519e+02, threshold=5.840e+02, percent-clipped=3.0 2024-09-19 23:12:54,090 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2024-09-19 23:12:59,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=768114.6666666666, ans=0.0 2024-09-19 23:13:03,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.51 vs. limit=10.0 2024-09-19 23:13:06,168 INFO [train.py:1198] (1/2) Epoch 43, batch 1800, loss[loss=0.2104, simple_loss=0.2714, pruned_loss=0.05511, ctc_loss=0.1192, cr_loss=0.3842, over 34691.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2605, pruned_loss=0.05287, ctc_loss=0.1143, cr_loss=0.3884, over 6756616.92 frames. ], batch size: 97, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:13:13,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=768161.3333333334, ans=0.0 2024-09-19 23:13:21,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=768208.0, ans=0.1 2024-09-19 23:13:23,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=768208.0, ans=0.0 2024-09-19 23:13:39,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=768254.6666666666, ans=0.0 2024-09-19 23:14:01,410 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.704e-02 2024-09-19 23:14:17,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=768348.0, ans=0.1 2024-09-19 23:14:30,496 INFO [train.py:1198] (1/2) Epoch 43, batch 1850, loss[loss=0.2087, simple_loss=0.2702, pruned_loss=0.05418, ctc_loss=0.1149, cr_loss=0.393, over 34425.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2604, pruned_loss=0.05297, ctc_loss=0.1145, cr_loss=0.3886, over 6763329.86 frames. 
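[Editor's note] The optim.py:487 WARNING lines summarize recent gradient norms as five quantiles (which read as min/25%/50%/75%/max) plus a clipping threshold. In every such entry in this log the threshold equals Clipping_scale times the middle value (e.g. 2.0 x 3.135e+02 = 6.270e+02), and percent-clipped is the fraction of recent batches whose norm exceeded it. The sketch below reproduces that bookkeeping; the class name and window size are assumptions for illustration, not ScaledAdam's internals.

# Sketch of the grad-norm bookkeeping behind the optim.py warnings:
# track recent gradient norms, set threshold = clipping_scale * median,
# and report quantiles plus how often clipping fired.
from collections import deque
import random
import statistics

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # window size is an assumption
        self.num_clipped = 0

    def update(self, grad_norm: float) -> float:
        """Record one batch's grad norm; return the scale to apply to grads."""
        self.norms.append(grad_norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        if grad_norm > threshold:
            self.num_clipped += 1
            return threshold / grad_norm  # shrink the gradient to the threshold
        return 1.0

    def summary(self) -> str:
        qs = statistics.quantiles(self.norms, n=4)  # 25%, 50%, 75% cut points
        lo, hi = min(self.norms), max(self.norms)
        threshold = self.clipping_scale * qs[1]
        pct = 100.0 * self.num_clipped / len(self.norms)
        return (f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
                f"{lo:.3e} {qs[0]:.3e} {qs[1]:.3e} {qs[2]:.3e} {hi:.3e}, "
                f"threshold={threshold:.3e}, percent-clipped={pct:.1f}")

# Usage with synthetic grad norms on the ~2e+02 scale seen above.
clipper = GradNormClipper()
for _ in range(500):
    clipper.update(random.lognormvariate(5.6, 0.3))
print(clipper.summary())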
], batch size: 100, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:14:38,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=768394.6666666666, ans=0.1 2024-09-19 23:14:46,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=768441.3333333334, ans=0.0 2024-09-19 23:14:50,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=768441.3333333334, ans=0.125 2024-09-19 23:14:59,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=13.01 vs. limit=15.0 2024-09-19 23:15:16,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=768488.0, ans=0.0 2024-09-19 23:15:25,582 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.23 vs. limit=10.0 2024-09-19 23:15:37,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=768581.3333333334, ans=0.125 2024-09-19 23:15:39,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=768581.3333333334, ans=0.125 2024-09-19 23:15:40,758 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.038e+02 2.629e+02 3.255e+02 4.291e+02 9.630e+02, threshold=6.511e+02, percent-clipped=6.0 2024-09-19 23:15:49,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=768581.3333333334, ans=0.2 2024-09-19 23:15:52,095 INFO [train.py:1198] (1/2) Epoch 43, batch 1900, loss[loss=0.206, simple_loss=0.267, pruned_loss=0.05343, ctc_loss=0.1142, cr_loss=0.3848, over 34366.00 frames. ], tot_loss[loss=0.203, simple_loss=0.2611, pruned_loss=0.05316, ctc_loss=0.1149, cr_loss=0.3892, over 6772150.86 frames. ], batch size: 103, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:16:14,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0 2024-09-19 23:16:23,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=768721.3333333334, ans=0.2 2024-09-19 23:16:36,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=768721.3333333334, ans=0.125 2024-09-19 23:16:38,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=768721.3333333334, ans=0.2 2024-09-19 23:16:58,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=768814.6666666666, ans=0.1 2024-09-19 23:17:04,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=768814.6666666666, ans=0.0 2024-09-19 23:17:14,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=768861.3333333334, ans=0.0 2024-09-19 23:17:15,768 INFO [train.py:1198] (1/2) Epoch 43, batch 1950, loss[loss=0.2021, simple_loss=0.2595, pruned_loss=0.05292, ctc_loss=0.1157, cr_loss=0.3909, over 34345.00 frames. 
], tot_loss[loss=0.2038, simple_loss=0.2621, pruned_loss=0.05342, ctc_loss=0.1154, cr_loss=0.3907, over 6788773.78 frames. ], batch size: 91, lr: 2.77e-03, grad_scale: 16.0 2024-09-19 23:17:17,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=768861.3333333334, ans=0.125 2024-09-19 23:17:36,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=768908.0, ans=0.0 2024-09-19 23:17:38,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=768908.0, ans=0.0 2024-09-19 23:18:00,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0 2024-09-19 23:18:28,617 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.203e+02 2.520e+02 2.968e+02 3.656e+02 6.828e+02, threshold=5.935e+02, percent-clipped=1.0 2024-09-19 23:18:40,476 INFO [train.py:1198] (1/2) Epoch 43, batch 2000, loss[loss=0.1754, simple_loss=0.2309, pruned_loss=0.04362, ctc_loss=0.09589, cr_loss=0.3363, over 34147.00 frames. ], tot_loss[loss=0.2039, simple_loss=0.2624, pruned_loss=0.05341, ctc_loss=0.1153, cr_loss=0.3902, over 6763925.57 frames. ], batch size: 78, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:18:50,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=769094.6666666666, ans=0.2 2024-09-19 23:19:40,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=769234.6666666666, ans=0.125 2024-09-19 23:19:41,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=769234.6666666666, ans=0.0 2024-09-19 23:20:01,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=769328.0, ans=0.09899494936611666 2024-09-19 23:20:02,950 INFO [train.py:1198] (1/2) Epoch 43, batch 2050, loss[loss=0.182, simple_loss=0.2405, pruned_loss=0.04447, ctc_loss=0.09977, cr_loss=0.3643, over 34471.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.2609, pruned_loss=0.05283, ctc_loss=0.1143, cr_loss=0.3877, over 6754974.57 frames. ], batch size: 82, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:20:22,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=769374.6666666666, ans=0.125 2024-09-19 23:20:22,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=769374.6666666666, ans=0.125 2024-09-19 23:20:26,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=769374.6666666666, ans=0.0 2024-09-19 23:20:32,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=769374.6666666666, ans=0.1 2024-09-19 23:20:41,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.63 vs. limit=15.0 2024-09-19 23:20:41,763 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.96 vs. 
limit=15.0 2024-09-19 23:20:54,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=20.40 vs. limit=22.5 2024-09-19 23:20:57,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=769468.0, ans=0.1 2024-09-19 23:21:01,310 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=6.93 vs. limit=15.0 2024-09-19 23:21:14,767 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.195e+02 2.628e+02 3.116e+02 4.039e+02 7.687e+02, threshold=6.231e+02, percent-clipped=3.0 2024-09-19 23:21:21,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.29 vs. limit=15.0 2024-09-19 23:21:28,268 INFO [train.py:1198] (1/2) Epoch 43, batch 2100, loss[loss=0.1902, simple_loss=0.2529, pruned_loss=0.04614, ctc_loss=0.1024, cr_loss=0.3677, over 34517.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.2605, pruned_loss=0.05271, ctc_loss=0.114, cr_loss=0.3867, over 6769268.87 frames. ], batch size: 94, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:21:46,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=769608.0, ans=0.0 2024-09-19 23:21:58,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=769608.0, ans=0.2 2024-09-19 23:22:20,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=769701.3333333334, ans=0.125 2024-09-19 23:22:21,558 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.80 vs. limit=22.5 2024-09-19 23:22:21,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=769701.3333333334, ans=6.0 2024-09-19 23:22:31,235 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=12.0 2024-09-19 23:22:33,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2024-09-19 23:22:40,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=769748.0, ans=0.125 2024-09-19 23:22:48,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769794.6666666666, ans=0.1 2024-09-19 23:22:50,097 INFO [train.py:1198] (1/2) Epoch 43, batch 2150, loss[loss=0.2086, simple_loss=0.2601, pruned_loss=0.05845, ctc_loss=0.1196, cr_loss=0.4068, over 34348.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2595, pruned_loss=0.05235, ctc_loss=0.1133, cr_loss=0.385, over 6788449.36 frames. 
], batch size: 91, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:22:53,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=769794.6666666666, ans=0.2 2024-09-19 23:23:20,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=769841.3333333334, ans=0.0 2024-09-19 23:23:33,989 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.74 vs. limit=15.0 2024-09-19 23:23:38,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=769934.6666666666, ans=0.125 2024-09-19 23:23:46,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=769934.6666666666, ans=0.0 2024-09-19 23:24:01,012 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.714e+02 3.217e+02 4.275e+02 6.954e+02, threshold=6.435e+02, percent-clipped=4.0 2024-09-19 23:24:10,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.20 vs. limit=10.0 2024-09-19 23:24:13,225 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:24:14,503 INFO [train.py:1198] (1/2) Epoch 43, batch 2200, loss[loss=0.2195, simple_loss=0.2787, pruned_loss=0.05914, ctc_loss=0.126, cr_loss=0.4216, over 34437.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2598, pruned_loss=0.05249, ctc_loss=0.1135, cr_loss=0.3858, over 6782958.15 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:24:26,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=770028.0, ans=0.1 2024-09-19 23:24:29,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770074.6666666666, ans=0.1 2024-09-19 23:24:32,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=770074.6666666666, ans=0.035 2024-09-19 23:24:34,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=770074.6666666666, ans=0.125 2024-09-19 23:24:43,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2024-09-19 23:25:08,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=770168.0, ans=0.125 2024-09-19 23:25:38,614 INFO [train.py:1198] (1/2) Epoch 43, batch 2250, loss[loss=0.2045, simple_loss=0.2641, pruned_loss=0.05322, ctc_loss=0.1154, cr_loss=0.3808, over 34423.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2597, pruned_loss=0.05251, ctc_loss=0.1136, cr_loss=0.386, over 6779895.40 frames. 
], batch size: 95, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:25:43,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=770261.3333333334, ans=0.0 2024-09-19 23:25:58,518 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:26:01,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=770308.0, ans=0.0 2024-09-19 23:26:06,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=770308.0, ans=0.025 2024-09-19 23:26:18,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=770354.6666666666, ans=0.1 2024-09-19 23:26:37,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=770401.3333333334, ans=0.025 2024-09-19 23:26:37,807 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=770401.3333333334, ans=0.0 2024-09-19 23:26:48,892 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.099e+02 2.567e+02 3.084e+02 3.690e+02 5.598e+02, threshold=6.168e+02, percent-clipped=0.0 2024-09-19 23:26:55,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=770448.0, ans=0.0 2024-09-19 23:26:57,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=770448.0, ans=0.125 2024-09-19 23:27:00,540 INFO [train.py:1198] (1/2) Epoch 43, batch 2300, loss[loss=0.1815, simple_loss=0.2353, pruned_loss=0.04623, ctc_loss=0.1033, cr_loss=0.3635, over 34685.00 frames. ], tot_loss[loss=0.2003, simple_loss=0.2586, pruned_loss=0.05205, ctc_loss=0.1127, cr_loss=0.3837, over 6764783.58 frames. ], batch size: 84, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:27:24,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=770541.3333333334, ans=0.125 2024-09-19 23:27:26,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=770541.3333333334, ans=0.125 2024-09-19 23:27:28,986 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.08 vs. limit=22.5 2024-09-19 23:27:56,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=770634.6666666666, ans=0.1 2024-09-19 23:28:24,075 INFO [train.py:1198] (1/2) Epoch 43, batch 2350, loss[loss=0.2055, simple_loss=0.2646, pruned_loss=0.05354, ctc_loss=0.1153, cr_loss=0.4055, over 34696.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2589, pruned_loss=0.05227, ctc_loss=0.1131, cr_loss=0.3846, over 6772189.14 frames. ], batch size: 97, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:28:30,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=770728.0, ans=0.025 2024-09-19 23:28:46,444 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.76 vs. 
limit=15.0 2024-09-19 23:29:05,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=770821.3333333334, ans=0.0 2024-09-19 23:29:07,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=770821.3333333334, ans=0.025 2024-09-19 23:29:12,180 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:29:13,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=770868.0, ans=0.05 2024-09-19 23:29:16,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=770868.0, ans=0.125 2024-09-19 23:29:24,959 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=770868.0, ans=0.125 2024-09-19 23:29:36,316 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.208e+02 2.514e+02 2.816e+02 3.402e+02 5.389e+02, threshold=5.633e+02, percent-clipped=0.0 2024-09-19 23:29:43,664 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=770914.6666666666, ans=0.2 2024-09-19 23:29:48,296 INFO [train.py:1198] (1/2) Epoch 43, batch 2400, loss[loss=0.1909, simple_loss=0.2486, pruned_loss=0.04862, ctc_loss=0.107, cr_loss=0.364, over 34575.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2593, pruned_loss=0.05244, ctc_loss=0.1134, cr_loss=0.3851, over 6776320.94 frames. ], batch size: 89, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:29:58,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=770961.3333333334, ans=0.09899494936611666 2024-09-19 23:30:03,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=771008.0, ans=0.125 2024-09-19 23:30:46,614 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:30:54,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=771148.0, ans=0.0 2024-09-19 23:30:59,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=771148.0, ans=0.0 2024-09-19 23:31:07,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=771148.0, ans=0.125 2024-09-19 23:31:07,980 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:31:10,722 INFO [train.py:1198] (1/2) Epoch 43, batch 2450, loss[loss=0.2095, simple_loss=0.2689, pruned_loss=0.0553, ctc_loss=0.1182, cr_loss=0.3949, over 34416.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2604, pruned_loss=0.0529, ctc_loss=0.1143, cr_loss=0.3873, over 6750076.62 frames. ], batch size: 95, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:31:13,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.81 vs. 
limit=12.0 2024-09-19 23:31:23,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=771194.6666666666, ans=0.2 2024-09-19 23:31:25,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=771241.3333333334, ans=0.125 2024-09-19 23:31:27,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=771241.3333333334, ans=0.125 2024-09-19 23:31:37,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.78 vs. limit=15.0 2024-09-19 23:31:38,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=771241.3333333334, ans=0.125 2024-09-19 23:32:22,683 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.226e+02 2.600e+02 2.989e+02 3.954e+02 6.420e+02, threshold=5.977e+02, percent-clipped=2.0 2024-09-19 23:32:32,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=771428.0, ans=0.0 2024-09-19 23:32:34,177 INFO [train.py:1198] (1/2) Epoch 43, batch 2500, loss[loss=0.2085, simple_loss=0.271, pruned_loss=0.05349, ctc_loss=0.1149, cr_loss=0.4034, over 34427.00 frames. ], tot_loss[loss=0.2022, simple_loss=0.2603, pruned_loss=0.05286, ctc_loss=0.1142, cr_loss=0.3876, over 6762387.97 frames. ], batch size: 100, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:32:34,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771428.0, ans=0.1 2024-09-19 23:32:49,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=771428.0, ans=0.1 2024-09-19 23:33:04,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=771474.6666666666, ans=0.125 2024-09-19 23:33:36,837 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.53 vs. limit=6.0 2024-09-19 23:33:50,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=771614.6666666666, ans=0.04949747468305833 2024-09-19 23:33:53,874 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=771614.6666666666, ans=0.0 2024-09-19 23:33:54,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=771614.6666666666, ans=0.125 2024-09-19 23:33:58,581 INFO [train.py:1198] (1/2) Epoch 43, batch 2550, loss[loss=0.1803, simple_loss=0.2322, pruned_loss=0.04693, ctc_loss=0.1026, cr_loss=0.3518, over 34191.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2601, pruned_loss=0.05283, ctc_loss=0.114, cr_loss=0.3872, over 6766529.37 frames. 
], batch size: 78, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:33:58,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=771661.3333333334, ans=0.04949747468305833 2024-09-19 23:34:02,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=771661.3333333334, ans=0.2 2024-09-19 23:34:05,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=771661.3333333334, ans=0.2 2024-09-19 23:34:09,015 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=22.5 2024-09-19 23:34:32,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.27 vs. limit=15.0 2024-09-19 23:34:37,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=771754.6666666666, ans=0.125 2024-09-19 23:34:57,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=771801.3333333334, ans=0.125 2024-09-19 23:34:57,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=771801.3333333334, ans=0.125 2024-09-19 23:35:04,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.72 vs. limit=12.0 2024-09-19 23:35:10,650 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.148e+02 2.588e+02 2.987e+02 3.719e+02 5.722e+02, threshold=5.975e+02, percent-clipped=0.0 2024-09-19 23:35:14,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer_ff2.min_abs, batch_count=771848.0, ans=0.1 2024-09-19 23:35:17,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=771848.0, ans=0.0 2024-09-19 23:35:22,450 INFO [train.py:1198] (1/2) Epoch 43, batch 2600, loss[loss=0.2068, simple_loss=0.2619, pruned_loss=0.05579, ctc_loss=0.1213, cr_loss=0.3967, over 34348.00 frames. ], tot_loss[loss=0.2022, simple_loss=0.2604, pruned_loss=0.05283, ctc_loss=0.114, cr_loss=0.3872, over 6762598.10 frames. ], batch size: 91, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:35:29,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=771894.6666666666, ans=0.125 2024-09-19 23:35:39,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2024-09-19 23:35:44,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=8.19 vs. 
limit=15.0 2024-09-19 23:35:55,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=771988.0, ans=0.2 2024-09-19 23:36:19,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=772034.6666666666, ans=0.0 2024-09-19 23:36:23,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=772034.6666666666, ans=0.2 2024-09-19 23:36:23,623 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.82 vs. limit=12.0 2024-09-19 23:36:45,885 INFO [train.py:1198] (1/2) Epoch 43, batch 2650, loss[loss=0.2011, simple_loss=0.2676, pruned_loss=0.04886, ctc_loss=0.109, cr_loss=0.3738, over 34227.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.2607, pruned_loss=0.05277, ctc_loss=0.114, cr_loss=0.3871, over 6769790.87 frames. ], batch size: 117, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:36:57,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=772128.0, ans=0.125 2024-09-19 23:36:59,711 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-09-19 23:37:49,914 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=772314.6666666666, ans=0.125 2024-09-19 23:37:56,222 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.003e+02 2.668e+02 3.136e+02 3.846e+02 5.440e+02, threshold=6.272e+02, percent-clipped=0.0 2024-09-19 23:37:58,400 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.622e-02 2024-09-19 23:38:03,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=772314.6666666666, ans=0.125 2024-09-19 23:38:07,843 INFO [train.py:1198] (1/2) Epoch 43, batch 2700, loss[loss=0.2183, simple_loss=0.2762, pruned_loss=0.05959, ctc_loss=0.125, cr_loss=0.4073, over 34609.00 frames. ], tot_loss[loss=0.2029, simple_loss=0.2612, pruned_loss=0.05305, ctc_loss=0.1145, cr_loss=0.3885, over 6764359.87 frames. 
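[Editor's note] Each train.py:1198 entry pairs the current batch's loss (weighted by that batch's frame count) with a tot_loss over a slowly drifting total of roughly 6.7M frames. The fractional frame counts (e.g. 6764359.87) suggest old batches are down-weighted rather than simply summed, i.e. a frame-weighted running average with exponential decay. A hedged sketch follows; the decay constant is made up, chosen only so the steady-state total lands near the logged scale.

# Sketch of a frame-weighted, exponentially decayed running loss. With
# ~34k frames/batch and decay 0.995, the decayed frame total converges to
# about 34000 / (1 - 0.995) = 6.8e6, matching the scale logged above.
class RunningLoss:
    def __init__(self, decay: float = 0.995):  # decay value is an assumption
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of (loss * frames)
        self.frames = 0.0     # decayed sum of frames

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

run = RunningLoss()
for step in range(5000):
    run.update(batch_loss=0.20, batch_frames=34000.0)
print(f"tot_loss={run.tot_loss:.4f}, over {run.frames:.2f} frames.")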
], batch size: 102, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:38:09,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=772361.3333333334, ans=0.025 2024-09-19 23:38:16,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=772361.3333333334, ans=0.125 2024-09-19 23:38:34,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=772408.0, ans=0.2 2024-09-19 23:38:39,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=772454.6666666666, ans=0.125 2024-09-19 23:38:40,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=772454.6666666666, ans=0.125 2024-09-19 23:38:56,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=772454.6666666666, ans=0.125 2024-09-19 23:38:57,580 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=772501.3333333334, ans=0.125 2024-09-19 23:38:57,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=772501.3333333334, ans=0.0 2024-09-19 23:39:04,141 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:39:23,103 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0 2024-09-19 23:39:30,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=772594.6666666666, ans=0.125 2024-09-19 23:39:31,750 INFO [train.py:1198] (1/2) Epoch 43, batch 2750, loss[loss=0.1892, simple_loss=0.2455, pruned_loss=0.04868, ctc_loss=0.1043, cr_loss=0.3692, over 34612.00 frames. ], tot_loss[loss=0.2018, simple_loss=0.2601, pruned_loss=0.05263, ctc_loss=0.1137, cr_loss=0.3864, over 6762469.24 frames. 
], batch size: 88, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:39:37,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=772594.6666666666, ans=0.125 2024-09-19 23:39:41,878 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=772594.6666666666, ans=0.125 2024-09-19 23:40:06,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=772688.0, ans=0.05 2024-09-19 23:40:11,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=772688.0, ans=0.125 2024-09-19 23:40:23,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=772734.6666666666, ans=0.125 2024-09-19 23:40:29,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772734.6666666666, ans=0.1 2024-09-19 23:40:44,631 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.116e+02 2.619e+02 3.170e+02 4.031e+02 6.078e+02, threshold=6.341e+02, percent-clipped=0.0 2024-09-19 23:40:56,708 INFO [train.py:1198] (1/2) Epoch 43, batch 2800, loss[loss=0.2295, simple_loss=0.2856, pruned_loss=0.06471, ctc_loss=0.137, cr_loss=0.4156, over 23787.00 frames. ], tot_loss[loss=0.2022, simple_loss=0.2603, pruned_loss=0.05287, ctc_loss=0.1142, cr_loss=0.3876, over 6740693.57 frames. ], batch size: 244, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:40:58,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772828.0, ans=0.1 2024-09-19 23:41:15,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=772874.6666666666, ans=0.1 2024-09-19 23:41:16,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=772874.6666666666, ans=0.0 2024-09-19 23:41:40,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=772921.3333333334, ans=0.125 2024-09-19 23:41:41,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=772921.3333333334, ans=0.5 2024-09-19 23:41:50,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=15.0 2024-09-19 23:41:59,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=772968.0, ans=0.125 2024-09-19 23:42:14,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=773014.6666666666, ans=0.125 2024-09-19 23:42:18,888 INFO [train.py:1198] (1/2) Epoch 43, batch 2850, loss[loss=0.2022, simple_loss=0.2558, pruned_loss=0.05468, ctc_loss=0.1192, cr_loss=0.387, over 34470.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2605, pruned_loss=0.05293, ctc_loss=0.1143, cr_loss=0.3875, over 6724538.34 frames. 
], batch size: 90, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:42:39,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=773108.0, ans=0.07 2024-09-19 23:42:42,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=773108.0, ans=0.0 2024-09-19 23:42:49,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=773108.0, ans=0.0 2024-09-19 23:43:07,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=773154.6666666666, ans=0.125 2024-09-19 23:43:12,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=773201.3333333334, ans=0.125 2024-09-19 23:43:16,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.01 vs. limit=15.0 2024-09-19 23:43:31,443 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.265e+02 2.589e+02 3.030e+02 3.782e+02 6.065e+02, threshold=6.061e+02, percent-clipped=0.0 2024-09-19 23:43:36,503 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=773248.0, ans=0.125 2024-09-19 23:43:42,861 INFO [train.py:1198] (1/2) Epoch 43, batch 2900, loss[loss=0.1832, simple_loss=0.2442, pruned_loss=0.04464, ctc_loss=0.09702, cr_loss=0.3367, over 34513.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.2614, pruned_loss=0.05315, ctc_loss=0.1147, cr_loss=0.3887, over 6755457.93 frames. ], batch size: 94, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:43:59,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=773341.3333333334, ans=0.0 2024-09-19 23:44:27,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.01 vs. limit=15.0 2024-09-19 23:44:42,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=773434.6666666666, ans=0.5 2024-09-19 23:44:42,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=773434.6666666666, ans=0.0 2024-09-19 23:44:43,489 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=11.56 vs. limit=15.0 2024-09-19 23:44:51,444 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=773481.3333333334, ans=0.0 2024-09-19 23:44:55,021 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.86 vs. limit=6.0 2024-09-19 23:45:07,616 INFO [train.py:1198] (1/2) Epoch 43, batch 2950, loss[loss=0.1969, simple_loss=0.2522, pruned_loss=0.05148, ctc_loss=0.1143, cr_loss=0.3962, over 34615.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2601, pruned_loss=0.05272, ctc_loss=0.1139, cr_loss=0.3861, over 6751303.03 frames. 
], batch size: 88, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:45:10,323 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.18 vs. limit=15.0 2024-09-19 23:45:22,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=773574.6666666666, ans=0.025 2024-09-19 23:45:31,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=773574.6666666666, ans=0.125 2024-09-19 23:45:43,640 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.58 vs. limit=15.0 2024-09-19 23:45:44,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=773621.3333333334, ans=0.0 2024-09-19 23:45:47,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=773621.3333333334, ans=0.09899494936611666 2024-09-19 23:46:00,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=773668.0, ans=0.0 2024-09-19 23:46:20,251 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.139e+02 2.512e+02 2.955e+02 3.556e+02 5.159e+02, threshold=5.910e+02, percent-clipped=0.0 2024-09-19 23:46:29,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=773714.6666666666, ans=0.2 2024-09-19 23:46:32,218 INFO [train.py:1198] (1/2) Epoch 43, batch 3000, loss[loss=0.1958, simple_loss=0.2565, pruned_loss=0.04944, ctc_loss=0.1054, cr_loss=0.3775, over 34540.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2597, pruned_loss=0.05248, ctc_loss=0.1135, cr_loss=0.3854, over 6751420.44 frames. ], batch size: 94, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:46:32,219 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-19 23:46:49,007 INFO [train.py:1230] (1/2) Epoch 43, validation: loss=0.1495, simple_loss=0.2426, pruned_loss=0.02427, ctc_loss=0.03953, cr_loss=2.319e-14, over 944034.00 frames. 2024-09-19 23:46:49,007 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-19 23:47:28,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=773854.6666666666, ans=0.125 2024-09-19 23:47:36,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=773901.3333333334, ans=0.1 2024-09-19 23:47:37,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=22.5 2024-09-19 23:48:12,118 INFO [train.py:1198] (1/2) Epoch 43, batch 3050, loss[loss=0.1879, simple_loss=0.247, pruned_loss=0.04714, ctc_loss=0.1032, cr_loss=0.3472, over 34607.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.2606, pruned_loss=0.053, ctc_loss=0.1144, cr_loss=0.3873, over 6742671.03 frames. 
], batch size: 89, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:48:33,533 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=774041.3333333334, ans=0.2 2024-09-19 23:48:40,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.45 vs. limit=10.0 2024-09-19 23:48:55,249 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.69 vs. limit=22.5 2024-09-19 23:49:00,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=774134.6666666666, ans=0.125 2024-09-19 23:49:21,581 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.168e+02 2.480e+02 2.859e+02 3.318e+02 6.210e+02, threshold=5.718e+02, percent-clipped=1.0 2024-09-19 23:49:26,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=774181.3333333334, ans=0.0 2024-09-19 23:49:32,796 INFO [train.py:1198] (1/2) Epoch 43, batch 3100, loss[loss=0.2154, simple_loss=0.2738, pruned_loss=0.05797, ctc_loss=0.1255, cr_loss=0.3996, over 34197.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2605, pruned_loss=0.05297, ctc_loss=0.1144, cr_loss=0.3872, over 6741879.58 frames. ], batch size: 117, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:49:36,263 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=774228.0, ans=0.0 2024-09-19 23:49:36,749 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0 2024-09-19 23:49:50,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=774274.6666666666, ans=0.07 2024-09-19 23:49:59,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=22.5 2024-09-19 23:50:08,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=774321.3333333334, ans=0.0 2024-09-19 23:50:19,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=774368.0, ans=0.05 2024-09-19 23:50:24,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=774368.0, ans=0.125 2024-09-19 23:50:31,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774368.0, ans=0.1 2024-09-19 23:50:31,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.88 vs. limit=10.0 2024-09-19 23:50:46,181 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.02 vs. limit=22.5 2024-09-19 23:50:53,282 INFO [train.py:1198] (1/2) Epoch 43, batch 3150, loss[loss=0.1993, simple_loss=0.2635, pruned_loss=0.04941, ctc_loss=0.1086, cr_loss=0.3669, over 33933.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2603, pruned_loss=0.05265, ctc_loss=0.1139, cr_loss=0.3859, over 6748246.88 frames. 
], batch size: 122, lr: 2.76e-03, grad_scale: 32.0 2024-09-19 23:51:08,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=774508.0, ans=0.1 2024-09-19 23:51:28,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=774554.6666666666, ans=0.125 2024-09-19 23:51:43,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=774601.3333333334, ans=0.0 2024-09-19 23:51:44,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=774601.3333333334, ans=0.025 2024-09-19 23:51:46,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=774601.3333333334, ans=0.125 2024-09-19 23:51:51,346 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774601.3333333334, ans=0.1 2024-09-19 23:52:03,825 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.514e+02 2.856e+02 3.535e+02 6.755e+02, threshold=5.712e+02, percent-clipped=5.0 2024-09-19 23:52:09,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=774648.0, ans=0.0 2024-09-19 23:52:15,353 INFO [train.py:1198] (1/2) Epoch 43, batch 3200, loss[loss=0.1919, simple_loss=0.248, pruned_loss=0.04971, ctc_loss=0.1077, cr_loss=0.368, over 34524.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2597, pruned_loss=0.0524, ctc_loss=0.1135, cr_loss=0.3848, over 6762425.34 frames. ], batch size: 94, lr: 2.75e-03, grad_scale: 32.0 2024-09-19 23:52:17,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.78 vs. limit=15.0 2024-09-19 23:52:25,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=774694.6666666666, ans=0.5 2024-09-19 23:52:26,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=774694.6666666666, ans=0.2 2024-09-19 23:52:34,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=774741.3333333334, ans=0.0 2024-09-19 23:52:49,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=774788.0, ans=0.0 2024-09-19 23:53:02,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=774834.6666666666, ans=0.0 2024-09-19 23:53:05,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774834.6666666666, ans=0.1 2024-09-19 23:53:10,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=774834.6666666666, ans=0.125 2024-09-19 23:53:18,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=774881.3333333334, ans=0.2 2024-09-19 23:53:35,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.87 vs. 
limit=22.5 2024-09-19 23:53:35,873 INFO [train.py:1198] (1/2) Epoch 43, batch 3250, loss[loss=0.2093, simple_loss=0.2663, pruned_loss=0.05625, ctc_loss=0.1201, cr_loss=0.3943, over 34677.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2604, pruned_loss=0.0527, ctc_loss=0.1139, cr_loss=0.3863, over 6771771.06 frames. ], batch size: 98, lr: 2.75e-03, grad_scale: 32.0 2024-09-19 23:53:56,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.52 vs. limit=22.5 2024-09-19 23:54:03,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=774974.6666666666, ans=0.125 2024-09-19 23:54:25,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=775068.0, ans=0.1 2024-09-19 23:54:32,861 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.97 vs. limit=15.0 2024-09-19 23:54:40,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=775114.6666666666, ans=0.125 2024-09-19 23:54:46,414 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.534e+02 2.813e+02 3.707e+02 6.656e+02, threshold=5.625e+02, percent-clipped=1.0 2024-09-19 23:54:47,319 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.46 vs. limit=15.0 2024-09-19 23:54:55,609 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=12.0 2024-09-19 23:54:57,954 INFO [train.py:1198] (1/2) Epoch 43, batch 3300, loss[loss=0.2113, simple_loss=0.2715, pruned_loss=0.05559, ctc_loss=0.1198, cr_loss=0.3994, over 33255.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2591, pruned_loss=0.0523, ctc_loss=0.1131, cr_loss=0.3845, over 6770746.93 frames. ], batch size: 130, lr: 2.75e-03, grad_scale: 32.0 2024-09-19 23:55:30,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.54 vs. limit=10.0 2024-09-19 23:55:43,023 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775254.6666666666, ans=0.1 2024-09-19 23:55:59,792 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.62 vs. limit=15.0 2024-09-19 23:56:00,942 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=775348.0, ans=0.0 2024-09-19 23:56:18,043 INFO [train.py:1198] (1/2) Epoch 43, batch 3350, loss[loss=0.2092, simple_loss=0.2688, pruned_loss=0.05477, ctc_loss=0.1185, cr_loss=0.4082, over 33863.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2602, pruned_loss=0.0527, ctc_loss=0.114, cr_loss=0.3866, over 6744548.55 frames. 
], batch size: 122, lr: 2.75e-03, grad_scale: 32.0 2024-09-19 23:56:29,579 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=775394.6666666666, ans=0.025 2024-09-19 23:56:37,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=775441.3333333334, ans=0.125 2024-09-19 23:56:50,117 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:57:06,307 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-19 23:57:27,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2024-09-19 23:57:28,347 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.520e+02 2.731e+02 3.159e+02 5.043e+02, threshold=5.462e+02, percent-clipped=0.0 2024-09-19 23:57:37,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=775581.3333333334, ans=0.125 2024-09-19 23:57:38,737 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=775628.0, ans=0.125 2024-09-19 23:57:40,037 INFO [train.py:1198] (1/2) Epoch 43, batch 3400, loss[loss=0.187, simple_loss=0.2424, pruned_loss=0.04833, ctc_loss=0.1033, cr_loss=0.3586, over 34198.00 frames. ], tot_loss[loss=0.2022, simple_loss=0.2605, pruned_loss=0.05279, ctc_loss=0.1141, cr_loss=0.3872, over 6734604.31 frames. ], batch size: 78, lr: 2.75e-03, grad_scale: 32.0 2024-09-19 23:58:10,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=775721.3333333334, ans=0.1 2024-09-19 23:58:17,370 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=775721.3333333334, ans=0.125 2024-09-19 23:58:35,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=775768.0, ans=0.125 2024-09-19 23:58:52,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=775814.6666666666, ans=0.0 2024-09-19 23:59:00,482 INFO [train.py:1198] (1/2) Epoch 43, batch 3450, loss[loss=0.2072, simple_loss=0.2702, pruned_loss=0.05298, ctc_loss=0.1137, cr_loss=0.3867, over 32982.00 frames. ], tot_loss[loss=0.2021, simple_loss=0.2607, pruned_loss=0.05262, ctc_loss=0.1138, cr_loss=0.3867, over 6747344.96 frames. ], batch size: 130, lr: 2.75e-03, grad_scale: 32.0 2024-09-19 23:59:21,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=775908.0, ans=0.0 2024-09-20 00:00:10,863 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.214e+02 2.543e+02 2.916e+02 3.404e+02 8.643e+02, threshold=5.833e+02, percent-clipped=4.0 2024-09-20 00:00:22,063 INFO [train.py:1198] (1/2) Epoch 43, batch 3500, loss[loss=0.1804, simple_loss=0.2397, pruned_loss=0.0439, ctc_loss=0.09678, cr_loss=0.3455, over 34460.00 frames. ], tot_loss[loss=0.2018, simple_loss=0.2601, pruned_loss=0.05258, ctc_loss=0.1137, cr_loss=0.3866, over 6747760.66 frames. 
], batch size: 85, lr: 2.75e-03, grad_scale: 32.0 2024-09-20 00:00:24,516 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.60 vs. limit=15.0 2024-09-20 00:01:23,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=776234.6666666666, ans=0.125 2024-09-20 00:01:42,529 INFO [train.py:1198] (1/2) Epoch 43, batch 3550, loss[loss=0.199, simple_loss=0.2677, pruned_loss=0.04727, ctc_loss=0.1055, cr_loss=0.3643, over 34405.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2605, pruned_loss=0.05263, ctc_loss=0.1138, cr_loss=0.3872, over 6756616.78 frames. ], batch size: 103, lr: 2.75e-03, grad_scale: 32.0 2024-09-20 00:01:44,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=776328.0, ans=0.125 2024-09-20 00:02:12,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=776421.3333333334, ans=0.0 2024-09-20 00:02:13,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=776421.3333333334, ans=0.2 2024-09-20 00:02:21,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=776421.3333333334, ans=0.0 2024-09-20 00:02:25,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776421.3333333334, ans=0.1 2024-09-20 00:02:30,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=776468.0, ans=0.1 2024-09-20 00:02:37,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=776468.0, ans=15.0 2024-09-20 00:02:51,179 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.143e+02 2.614e+02 2.947e+02 3.784e+02 6.519e+02, threshold=5.894e+02, percent-clipped=3.0 2024-09-20 00:02:52,483 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=22.5 2024-09-20 00:02:55,573 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.66 vs. limit=15.0 2024-09-20 00:02:59,868 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=776514.6666666666, ans=0.125 2024-09-20 00:02:59,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=776514.6666666666, ans=0.125 2024-09-20 00:03:01,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=776561.3333333334, ans=0.0 2024-09-20 00:03:01,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=776561.3333333334, ans=0.125 2024-09-20 00:03:02,756 INFO [train.py:1198] (1/2) Epoch 43, batch 3600, loss[loss=0.1932, simple_loss=0.2492, pruned_loss=0.05001, ctc_loss=0.109, cr_loss=0.3865, over 34474.00 frames. 
], tot_loss[loss=0.2019, simple_loss=0.2604, pruned_loss=0.05257, ctc_loss=0.1137, cr_loss=0.3868, over 6767008.68 frames. ], batch size: 90, lr: 2.75e-03, grad_scale: 32.0 2024-09-20 00:03:04,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=776561.3333333334, ans=0.125 2024-09-20 00:03:15,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=776561.3333333334, ans=0.2 2024-09-20 00:03:21,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=776608.0, ans=0.5 2024-09-20 00:04:23,738 INFO [train.py:1198] (1/2) Epoch 43, batch 3650, loss[loss=0.22, simple_loss=0.2761, pruned_loss=0.06104, ctc_loss=0.125, cr_loss=0.4222, over 34467.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2598, pruned_loss=0.05246, ctc_loss=0.1135, cr_loss=0.3862, over 6770901.75 frames. ], batch size: 110, lr: 2.75e-03, grad_scale: 64.0 2024-09-20 00:04:40,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=776841.3333333334, ans=0.1 2024-09-20 00:04:42,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=776841.3333333334, ans=0.025 2024-09-20 00:04:43,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.72 vs. limit=15.0 2024-09-20 00:04:48,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=776841.3333333334, ans=0.025 2024-09-20 00:05:09,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=776888.0, ans=0.125 2024-09-20 00:05:15,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=776934.6666666666, ans=0.1 2024-09-20 00:05:20,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=776934.6666666666, ans=0.125 2024-09-20 00:05:28,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776981.3333333334, ans=0.1 2024-09-20 00:05:28,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=776981.3333333334, ans=15.0 2024-09-20 00:05:34,547 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.082e+02 2.534e+02 2.832e+02 3.672e+02 7.371e+02, threshold=5.665e+02, percent-clipped=5.0 2024-09-20 00:05:36,578 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=776981.3333333334, ans=0.0 2024-09-20 00:05:44,022 INFO [train.py:1198] (1/2) Epoch 43, batch 3700, loss[loss=0.1988, simple_loss=0.2659, pruned_loss=0.04798, ctc_loss=0.1062, cr_loss=0.3637, over 34582.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2599, pruned_loss=0.05231, ctc_loss=0.1132, cr_loss=0.3851, over 6785635.48 frames. 
], batch size: 102, lr: 2.75e-03, grad_scale: 32.0 2024-09-20 00:06:02,010 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=777074.6666666666, ans=0.125 2024-09-20 00:06:03,616 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=777074.6666666666, ans=0.0 2024-09-20 00:07:04,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=777261.3333333334, ans=0.025 2024-09-20 00:07:05,412 INFO [train.py:1198] (1/2) Epoch 43, batch 3750, loss[loss=0.2071, simple_loss=0.2681, pruned_loss=0.05371, ctc_loss=0.1154, cr_loss=0.3889, over 34342.00 frames. ], tot_loss[loss=0.2044, simple_loss=0.2629, pruned_loss=0.05356, ctc_loss=0.1157, cr_loss=0.3912, over 6787273.91 frames. ], batch size: 113, lr: 2.75e-03, grad_scale: 32.0 2024-09-20 00:07:21,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=777308.0, ans=0.125 2024-09-20 00:07:29,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.69 vs. limit=15.0 2024-09-20 00:07:58,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=777401.3333333334, ans=0.025 2024-09-20 00:08:05,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.63 vs. limit=22.5 2024-09-20 00:08:06,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0 2024-09-20 00:08:15,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=777448.0, ans=0.2 2024-09-20 00:08:17,225 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.225e+02 2.417e+02 2.571e+02 2.846e+02 4.870e+02, threshold=5.143e+02, percent-clipped=0.0 2024-09-20 00:08:17,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=777448.0, ans=0.0 2024-09-20 00:08:26,917 INFO [train.py:1198] (1/2) Epoch 43, batch 3800, loss[loss=0.235, simple_loss=0.2858, pruned_loss=0.06895, ctc_loss=0.1404, cr_loss=0.4538, over 29670.00 frames. ], tot_loss[loss=0.207, simple_loss=0.2652, pruned_loss=0.0547, ctc_loss=0.1177, cr_loss=0.3958, over 6674892.64 frames. ], batch size: 175, lr: 2.75e-03, grad_scale: 32.0 2024-09-20 00:08:40,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=777494.6666666666, ans=0.125 2024-09-20 00:08:45,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=777541.3333333334, ans=0.125 2024-09-20 00:09:25,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=777634.6666666666, ans=0.125 2024-09-20 00:09:27,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.23 vs. 
limit=15.0 2024-09-20 00:09:29,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=777634.6666666666, ans=0.09899494936611666 2024-09-20 00:09:32,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten.whitening_limit, batch_count=777681.3333333334, ans=15.0 2024-09-20 00:09:38,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=777681.3333333334, ans=0.2 2024-09-20 00:09:45,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=777681.3333333334, ans=0.2 2024-09-20 00:09:49,885 INFO [train.py:1198] (1/2) Epoch 43, batch 3850, loss[loss=0.2175, simple_loss=0.2724, pruned_loss=0.06067, ctc_loss=0.1262, cr_loss=0.399, over 23375.00 frames. ], tot_loss[loss=0.2095, simple_loss=0.2668, pruned_loss=0.05603, ctc_loss=0.1205, cr_loss=0.3991, over 6248064.09 frames. ], batch size: 245, lr: 2.75e-03, grad_scale: 32.0 2024-09-20 00:09:53,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=777728.0, ans=0.04949747468305833 2024-09-20 00:10:01,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=777728.0, ans=0.2 2024-09-20 00:10:05,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=777774.6666666666, ans=0.0 2024-09-20 00:10:10,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=777774.6666666666, ans=0.0 2024-09-20 00:10:25,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=777821.3333333334, ans=0.0 2024-09-20 00:11:14,684 INFO [train.py:1198] (1/2) Epoch 44, batch 0, loss[loss=0.1968, simple_loss=0.2525, pruned_loss=0.05198, ctc_loss=0.1127, cr_loss=0.3631, over 34454.00 frames. ], tot_loss[loss=0.1968, simple_loss=0.2525, pruned_loss=0.05198, ctc_loss=0.1127, cr_loss=0.3631, over 34454.00 frames. ], batch size: 85, lr: 2.72e-03, grad_scale: 32.0 2024-09-20 00:11:14,684 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 00:11:31,556 INFO [train.py:1230] (1/2) Epoch 44, validation: loss=0.1487, simple_loss=0.2427, pruned_loss=0.02346, ctc_loss=0.03898, cr_loss=2.274e-14, over 944034.00 frames. 2024-09-20 00:11:31,556 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 00:11:36,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=777849.3333333334, ans=0.125 2024-09-20 00:11:41,010 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.38 vs. 
limit=10.0 2024-09-20 00:11:52,257 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=777896.0, ans=0.2 2024-09-20 00:12:05,124 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.304e+02 2.606e+02 2.767e+02 2.960e+02 5.542e+02, threshold=5.534e+02, percent-clipped=1.0 2024-09-20 00:12:49,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=778036.0, ans=0.2 2024-09-20 00:12:57,771 INFO [train.py:1198] (1/2) Epoch 44, batch 50, loss[loss=0.178, simple_loss=0.2337, pruned_loss=0.04401, ctc_loss=0.1007, cr_loss=0.3532, over 34470.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2605, pruned_loss=0.05291, ctc_loss=0.1145, cr_loss=0.3892, over 1482791.30 frames. ], batch size: 82, lr: 2.72e-03, grad_scale: 32.0 2024-09-20 00:12:58,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=778082.6666666666, ans=0.125 2024-09-20 00:13:03,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=778082.6666666666, ans=0.5 2024-09-20 00:13:05,390 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.46 vs. limit=12.0 2024-09-20 00:13:06,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=778082.6666666666, ans=0.125 2024-09-20 00:13:21,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=778129.3333333334, ans=0.025 2024-09-20 00:13:27,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0 2024-09-20 00:13:32,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=778176.0, ans=0.0 2024-09-20 00:13:34,368 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=778176.0, ans=0.0 2024-09-20 00:13:49,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=778222.6666666666, ans=0.125 2024-09-20 00:13:57,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=778222.6666666666, ans=0.125 2024-09-20 00:14:01,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=778222.6666666666, ans=0.125 2024-09-20 00:14:05,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=778269.3333333334, ans=0.125 2024-09-20 00:14:20,265 INFO [train.py:1198] (1/2) Epoch 44, batch 100, loss[loss=0.1862, simple_loss=0.2446, pruned_loss=0.04655, ctc_loss=0.1022, cr_loss=0.357, over 34562.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.263, pruned_loss=0.05327, ctc_loss=0.1154, cr_loss=0.3911, over 2630327.70 frames. 
], batch size: 89, lr: 2.72e-03, grad_scale: 32.0 2024-09-20 00:14:33,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=778316.0, ans=0.125 2024-09-20 00:14:50,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.237e+02 2.524e+02 2.884e+02 3.329e+02 5.481e+02, threshold=5.768e+02, percent-clipped=0.0 2024-09-20 00:15:00,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=778409.3333333334, ans=0.04949747468305833 2024-09-20 00:15:09,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. limit=6.0 2024-09-20 00:15:10,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=778456.0, ans=0.1 2024-09-20 00:15:38,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=778502.6666666666, ans=0.2 2024-09-20 00:15:46,508 INFO [train.py:1198] (1/2) Epoch 44, batch 150, loss[loss=0.1812, simple_loss=0.2365, pruned_loss=0.04631, ctc_loss=0.09804, cr_loss=0.3426, over 34477.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2603, pruned_loss=0.0523, ctc_loss=0.1136, cr_loss=0.3872, over 3557204.67 frames. ], batch size: 82, lr: 2.72e-03, grad_scale: 16.0 2024-09-20 00:15:50,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=778549.3333333334, ans=0.125 2024-09-20 00:15:55,736 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2024-09-20 00:16:01,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=778596.0, ans=0.0 2024-09-20 00:16:05,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.69 vs. limit=15.0 2024-09-20 00:17:08,258 INFO [train.py:1198] (1/2) Epoch 44, batch 200, loss[loss=0.2232, simple_loss=0.282, pruned_loss=0.06028, ctc_loss=0.1326, cr_loss=0.4328, over 31869.00 frames. ], tot_loss[loss=0.2012, simple_loss=0.2598, pruned_loss=0.05223, ctc_loss=0.1133, cr_loss=0.3861, over 4270886.76 frames. ], batch size: 145, lr: 2.72e-03, grad_scale: 16.0 2024-09-20 00:17:19,066 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.98 vs. limit=15.0 2024-09-20 00:17:20,682 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. 
limit=6.0 2024-09-20 00:17:23,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=778829.3333333334, ans=0.125 2024-09-20 00:17:34,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=778829.3333333334, ans=0.2 2024-09-20 00:17:39,367 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.180e+02 2.670e+02 3.324e+02 4.135e+02 8.776e+02, threshold=6.648e+02, percent-clipped=7.0 2024-09-20 00:18:20,507 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=778969.3333333334, ans=0.125 2024-09-20 00:18:29,982 INFO [train.py:1198] (1/2) Epoch 44, batch 250, loss[loss=0.2237, simple_loss=0.2842, pruned_loss=0.06083, ctc_loss=0.127, cr_loss=0.4041, over 34238.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.26, pruned_loss=0.05232, ctc_loss=0.1135, cr_loss=0.387, over 4833946.62 frames. ], batch size: 117, lr: 2.72e-03, grad_scale: 16.0 2024-09-20 00:18:33,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=779016.0, ans=0.0 2024-09-20 00:18:38,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=779016.0, ans=0.125 2024-09-20 00:18:43,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=779016.0, ans=0.0 2024-09-20 00:19:41,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779202.6666666666, ans=0.1 2024-09-20 00:19:49,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=779202.6666666666, ans=0.125 2024-09-20 00:19:55,793 INFO [train.py:1198] (1/2) Epoch 44, batch 300, loss[loss=0.2211, simple_loss=0.2742, pruned_loss=0.06245, ctc_loss=0.1309, cr_loss=0.4225, over 34347.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2595, pruned_loss=0.05223, ctc_loss=0.1132, cr_loss=0.3858, over 5262162.31 frames. ], batch size: 107, lr: 2.72e-03, grad_scale: 16.0 2024-09-20 00:20:10,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=779296.0, ans=0.125 2024-09-20 00:20:27,121 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.100e+02 2.508e+02 2.930e+02 3.684e+02 5.396e+02, threshold=5.859e+02, percent-clipped=0.0 2024-09-20 00:20:42,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=779342.6666666666, ans=0.0 2024-09-20 00:20:42,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.64 vs. limit=15.0 2024-09-20 00:20:57,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.53 vs. 
limit=15.0 2024-09-20 00:20:58,559 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=779389.3333333334, ans=0.125 2024-09-20 00:20:58,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=779389.3333333334, ans=0.125 2024-09-20 00:21:12,446 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.39 vs. limit=15.0 2024-09-20 00:21:17,919 INFO [train.py:1198] (1/2) Epoch 44, batch 350, loss[loss=0.1744, simple_loss=0.2343, pruned_loss=0.04119, ctc_loss=0.09297, cr_loss=0.3401, over 34301.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.26, pruned_loss=0.05251, ctc_loss=0.1137, cr_loss=0.3873, over 5596504.87 frames. ], batch size: 83, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:21:44,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=779529.3333333334, ans=0.125 2024-09-20 00:22:17,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=15.0 2024-09-20 00:22:31,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=779669.3333333334, ans=0.0 2024-09-20 00:22:39,223 INFO [train.py:1198] (1/2) Epoch 44, batch 400, loss[loss=0.2006, simple_loss=0.2596, pruned_loss=0.05185, ctc_loss=0.1126, cr_loss=0.3847, over 34400.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2597, pruned_loss=0.05236, ctc_loss=0.1133, cr_loss=0.3858, over 5862455.73 frames. ], batch size: 95, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:22:57,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=779762.6666666666, ans=0.1 2024-09-20 00:22:59,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=779762.6666666666, ans=0.125 2024-09-20 00:23:14,688 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 2.482e+02 2.887e+02 3.625e+02 5.468e+02, threshold=5.774e+02, percent-clipped=0.0 2024-09-20 00:23:15,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=779809.3333333334, ans=0.125 2024-09-20 00:23:16,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779809.3333333334, ans=0.1 2024-09-20 00:23:26,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=779809.3333333334, ans=0.125 2024-09-20 00:23:31,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=779856.0, ans=0.1 2024-09-20 00:23:31,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=779856.0, ans=0.125 2024-09-20 00:23:36,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779856.0, ans=0.1 2024-09-20 00:23:40,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=779856.0, ans=0.1 2024-09-20 
00:23:40,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=779856.0, ans=0.1 2024-09-20 00:23:58,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=779902.6666666666, ans=0.0 2024-09-20 00:24:06,174 INFO [train.py:1198] (1/2) Epoch 44, batch 450, loss[loss=0.2054, simple_loss=0.2702, pruned_loss=0.05152, ctc_loss=0.1128, cr_loss=0.3733, over 34690.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2598, pruned_loss=0.05239, ctc_loss=0.1133, cr_loss=0.3858, over 6053176.76 frames. ], batch size: 97, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:24:47,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=780042.6666666666, ans=0.0 2024-09-20 00:25:28,641 INFO [train.py:1198] (1/2) Epoch 44, batch 500, loss[loss=0.2158, simple_loss=0.2722, pruned_loss=0.05886, ctc_loss=0.1257, cr_loss=0.415, over 34434.00 frames. ], tot_loss[loss=0.2006, simple_loss=0.259, pruned_loss=0.05214, ctc_loss=0.1129, cr_loss=0.3847, over 6219585.75 frames. ], batch size: 110, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:25:59,934 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.141e+02 2.458e+02 2.878e+02 3.639e+02 6.603e+02, threshold=5.757e+02, percent-clipped=3.0 2024-09-20 00:26:08,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=780276.0, ans=0.125 2024-09-20 00:26:11,879 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:26:11,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780276.0, ans=0.1 2024-09-20 00:26:22,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=780322.6666666666, ans=22.5 2024-09-20 00:26:22,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.02 vs. limit=15.0 2024-09-20 00:26:36,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=780369.3333333334, ans=0.0 2024-09-20 00:26:52,776 INFO [train.py:1198] (1/2) Epoch 44, batch 550, loss[loss=0.2087, simple_loss=0.2682, pruned_loss=0.05486, ctc_loss=0.1174, cr_loss=0.3991, over 33946.00 frames. ], tot_loss[loss=0.2003, simple_loss=0.2586, pruned_loss=0.052, ctc_loss=0.1127, cr_loss=0.3843, over 6329625.95 frames. ], batch size: 122, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:27:20,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=780462.6666666666, ans=0.1 2024-09-20 00:27:33,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=780509.3333333334, ans=0.125 2024-09-20 00:27:33,642 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.53 vs. 
limit=15.0 2024-09-20 00:28:09,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=780602.6666666666, ans=0.05 2024-09-20 00:28:17,427 INFO [train.py:1198] (1/2) Epoch 44, batch 600, loss[loss=0.2148, simple_loss=0.2708, pruned_loss=0.05766, ctc_loss=0.1292, cr_loss=0.4421, over 34178.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2592, pruned_loss=0.05226, ctc_loss=0.1132, cr_loss=0.3852, over 6432363.93 frames. ], batch size: 117, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:28:32,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=780696.0, ans=0.125 2024-09-20 00:28:34,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=780696.0, ans=0.0 2024-09-20 00:28:50,172 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.142e+02 2.536e+02 2.921e+02 3.730e+02 6.482e+02, threshold=5.843e+02, percent-clipped=2.0 2024-09-20 00:28:52,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=780742.6666666666, ans=0.0 2024-09-20 00:29:00,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780742.6666666666, ans=0.1 2024-09-20 00:29:21,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780836.0, ans=0.1 2024-09-20 00:29:39,359 INFO [train.py:1198] (1/2) Epoch 44, batch 650, loss[loss=0.203, simple_loss=0.2613, pruned_loss=0.05295, ctc_loss=0.1145, cr_loss=0.3969, over 34541.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2586, pruned_loss=0.05195, ctc_loss=0.1126, cr_loss=0.384, over 6524029.30 frames. ], batch size: 94, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:29:41,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=780882.6666666666, ans=0.125 2024-09-20 00:30:00,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=780929.3333333334, ans=0.2 2024-09-20 00:30:07,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=780929.3333333334, ans=0.0 2024-09-20 00:30:24,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=780976.0, ans=0.0 2024-09-20 00:30:31,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=781022.6666666666, ans=0.1 2024-09-20 00:31:05,634 INFO [train.py:1198] (1/2) Epoch 44, batch 700, loss[loss=0.1896, simple_loss=0.2452, pruned_loss=0.04917, ctc_loss=0.1052, cr_loss=0.3643, over 34585.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.2592, pruned_loss=0.05218, ctc_loss=0.1129, cr_loss=0.385, over 6580819.73 frames. ], batch size: 89, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:31:14,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.14 vs. 
limit=15.0 2024-09-20 00:31:19,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=781116.0, ans=0.125 2024-09-20 00:31:22,785 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.09 vs. limit=10.0 2024-09-20 00:31:29,531 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.64 vs. limit=22.5 2024-09-20 00:31:37,962 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.24 vs. limit=15.0 2024-09-20 00:31:38,605 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.170e+02 2.547e+02 3.006e+02 3.754e+02 1.086e+03, threshold=6.011e+02, percent-clipped=4.0 2024-09-20 00:31:40,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=781209.3333333334, ans=0.0 2024-09-20 00:31:45,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=781209.3333333334, ans=0.125 2024-09-20 00:31:48,750 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=781209.3333333334, ans=0.0 2024-09-20 00:31:50,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=781209.3333333334, ans=0.125 2024-09-20 00:32:00,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=781256.0, ans=0.025 2024-09-20 00:32:07,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.47 vs. limit=22.5 2024-09-20 00:32:16,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=781302.6666666666, ans=0.0 2024-09-20 00:32:28,007 INFO [train.py:1198] (1/2) Epoch 44, batch 750, loss[loss=0.1958, simple_loss=0.2531, pruned_loss=0.05065, ctc_loss=0.1094, cr_loss=0.3824, over 34439.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2587, pruned_loss=0.05191, ctc_loss=0.1125, cr_loss=0.3842, over 6625553.85 frames. ], batch size: 95, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:32:58,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=781442.6666666666, ans=0.0 2024-09-20 00:32:59,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=781442.6666666666, ans=0.0 2024-09-20 00:33:02,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=781442.6666666666, ans=0.2 2024-09-20 00:33:40,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=781536.0, ans=0.125 2024-09-20 00:33:49,638 INFO [train.py:1198] (1/2) Epoch 44, batch 800, loss[loss=0.1859, simple_loss=0.2472, pruned_loss=0.04499, ctc_loss=0.102, cr_loss=0.3569, over 34490.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2587, pruned_loss=0.05192, ctc_loss=0.1125, cr_loss=0.384, over 6660134.27 frames. 
], batch size: 85, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:33:52,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.22 vs. limit=15.0 2024-09-20 00:34:08,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.47 vs. limit=15.0 2024-09-20 00:34:12,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=781629.3333333334, ans=0.025 2024-09-20 00:34:24,069 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.161e+02 2.448e+02 2.841e+02 3.410e+02 4.948e+02, threshold=5.683e+02, percent-clipped=0.0 2024-09-20 00:34:24,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.96 vs. limit=15.0 2024-09-20 00:34:36,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.34 vs. limit=22.5 2024-09-20 00:34:56,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=781722.6666666666, ans=0.125 2024-09-20 00:35:15,332 INFO [train.py:1198] (1/2) Epoch 44, batch 850, loss[loss=0.2089, simple_loss=0.2765, pruned_loss=0.05113, ctc_loss=0.1146, cr_loss=0.4013, over 34400.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2583, pruned_loss=0.05166, ctc_loss=0.1118, cr_loss=0.3819, over 6692992.51 frames. ], batch size: 103, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:35:17,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=781816.0, ans=0.125 2024-09-20 00:35:19,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=781816.0, ans=0.0 2024-09-20 00:35:20,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=781816.0, ans=0.1 2024-09-20 00:35:23,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=781816.0, ans=0.125 2024-09-20 00:35:24,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=781816.0, ans=0.125 2024-09-20 00:35:32,438 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.62 vs. limit=15.0 2024-09-20 00:35:48,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=781909.3333333334, ans=0.125 2024-09-20 00:36:06,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=781956.0, ans=0.125 2024-09-20 00:36:06,904 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=15.0 2024-09-20 00:36:09,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. 
limit=15.0 2024-09-20 00:36:19,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=782002.6666666666, ans=0.5 2024-09-20 00:36:31,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=782002.6666666666, ans=0.0 2024-09-20 00:36:34,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=782002.6666666666, ans=0.125 2024-09-20 00:36:37,323 INFO [train.py:1198] (1/2) Epoch 44, batch 900, loss[loss=0.1762, simple_loss=0.2359, pruned_loss=0.04184, ctc_loss=0.09478, cr_loss=0.3465, over 34471.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2588, pruned_loss=0.05193, ctc_loss=0.1124, cr_loss=0.3836, over 6698992.12 frames. ], batch size: 85, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:36:37,831 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=782049.3333333334, ans=0.2 2024-09-20 00:36:39,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=782049.3333333334, ans=0.125 2024-09-20 00:36:50,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=782049.3333333334, ans=0.1 2024-09-20 00:36:57,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=782096.0, ans=0.125 2024-09-20 00:37:00,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=782096.0, ans=0.0 2024-09-20 00:37:12,011 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.238e+02 2.485e+02 2.967e+02 3.664e+02 5.978e+02, threshold=5.933e+02, percent-clipped=1.0 2024-09-20 00:37:23,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=782142.6666666666, ans=0.125 2024-09-20 00:37:28,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=782189.3333333334, ans=0.125 2024-09-20 00:37:35,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=782189.3333333334, ans=0.0 2024-09-20 00:37:43,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=782236.0, ans=0.125 2024-09-20 00:38:01,246 INFO [train.py:1198] (1/2) Epoch 44, batch 950, loss[loss=0.1812, simple_loss=0.24, pruned_loss=0.04425, ctc_loss=0.09789, cr_loss=0.3553, over 34688.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2593, pruned_loss=0.05203, ctc_loss=0.1127, cr_loss=0.3849, over 6703710.13 frames. ], batch size: 87, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:38:03,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=782282.6666666666, ans=0.07 2024-09-20 00:38:03,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=782282.6666666666, ans=0.125 2024-09-20 00:38:10,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.71 vs. 
limit=15.0 2024-09-20 00:38:36,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=782376.0, ans=0.0 2024-09-20 00:38:39,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=782376.0, ans=0.09899494936611666 2024-09-20 00:38:41,676 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:39:20,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=782469.3333333334, ans=0.0 2024-09-20 00:39:25,275 INFO [train.py:1198] (1/2) Epoch 44, batch 1000, loss[loss=0.1897, simple_loss=0.2449, pruned_loss=0.049, ctc_loss=0.1084, cr_loss=0.3705, over 34449.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.2602, pruned_loss=0.05247, ctc_loss=0.1136, cr_loss=0.3865, over 6696385.83 frames. ], batch size: 90, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:39:27,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=782516.0, ans=0.2 2024-09-20 00:39:32,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=782516.0, ans=0.0 2024-09-20 00:39:35,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=782516.0, ans=0.125 2024-09-20 00:39:56,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=782609.3333333334, ans=0.0 2024-09-20 00:39:59,904 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.194e+02 2.568e+02 3.067e+02 3.990e+02 1.264e+03, threshold=6.135e+02, percent-clipped=2.0 2024-09-20 00:40:23,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=782656.0, ans=0.125 2024-09-20 00:40:26,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=782656.0, ans=0.04949747468305833 2024-09-20 00:40:28,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=782656.0, ans=0.0 2024-09-20 00:40:34,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=782702.6666666666, ans=0.025 2024-09-20 00:40:47,350 INFO [train.py:1198] (1/2) Epoch 44, batch 1050, loss[loss=0.2056, simple_loss=0.2668, pruned_loss=0.05285, ctc_loss=0.1143, cr_loss=0.3965, over 34564.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2598, pruned_loss=0.05246, ctc_loss=0.1134, cr_loss=0.3864, over 6705323.02 frames. 
], batch size: 99, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:41:05,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=782796.0, ans=0.0 2024-09-20 00:41:10,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=782796.0, ans=0.02 2024-09-20 00:41:12,163 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:41:15,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=782796.0, ans=0.2 2024-09-20 00:41:20,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=782842.6666666666, ans=0.5 2024-09-20 00:41:48,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=782889.3333333334, ans=0.125 2024-09-20 00:42:11,289 INFO [train.py:1198] (1/2) Epoch 44, batch 1100, loss[loss=0.2141, simple_loss=0.2655, pruned_loss=0.06011, ctc_loss=0.1272, cr_loss=0.4255, over 34378.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2597, pruned_loss=0.0524, ctc_loss=0.1133, cr_loss=0.3863, over 6717877.93 frames. ], batch size: 91, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:42:12,402 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2024-09-20 00:42:23,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=782982.6666666666, ans=0.025 2024-09-20 00:42:48,029 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.032e+02 2.545e+02 2.873e+02 3.446e+02 5.100e+02, threshold=5.746e+02, percent-clipped=0.0 2024-09-20 00:43:06,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=783122.6666666666, ans=0.1 2024-09-20 00:43:16,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=783122.6666666666, ans=0.1 2024-09-20 00:43:21,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=783169.3333333334, ans=0.0 2024-09-20 00:43:35,670 INFO [train.py:1198] (1/2) Epoch 44, batch 1150, loss[loss=0.2025, simple_loss=0.2559, pruned_loss=0.05456, ctc_loss=0.1184, cr_loss=0.4059, over 34366.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2596, pruned_loss=0.05243, ctc_loss=0.1135, cr_loss=0.3865, over 6716353.36 frames. 
], batch size: 91, lr: 2.71e-03, grad_scale: 16.0 2024-09-20 00:43:36,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=783216.0, ans=0.025 2024-09-20 00:43:47,466 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=783216.0, ans=0.0 2024-09-20 00:44:00,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=783262.6666666666, ans=0.125 2024-09-20 00:44:05,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=783262.6666666666, ans=0.0 2024-09-20 00:44:09,127 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=783309.3333333334, ans=0.125 2024-09-20 00:44:25,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=783356.0, ans=0.125 2024-09-20 00:44:28,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=783356.0, ans=0.025 2024-09-20 00:44:47,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.75 vs. limit=15.0 2024-09-20 00:44:55,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=783402.6666666666, ans=10.0 2024-09-20 00:44:58,223 INFO [train.py:1198] (1/2) Epoch 44, batch 1200, loss[loss=0.2056, simple_loss=0.2673, pruned_loss=0.0529, ctc_loss=0.1136, cr_loss=0.3832, over 34548.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2601, pruned_loss=0.0525, ctc_loss=0.1136, cr_loss=0.3874, over 6708471.12 frames. ], batch size: 99, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:45:18,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=783496.0, ans=0.0 2024-09-20 00:45:33,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.20 vs. limit=15.0 2024-09-20 00:45:34,700 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.142e+02 2.495e+02 2.981e+02 3.427e+02 4.898e+02, threshold=5.962e+02, percent-clipped=0.0 2024-09-20 00:45:36,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.min_positive, batch_count=783542.6666666666, ans=0.05 2024-09-20 00:45:56,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=783589.3333333334, ans=0.0 2024-09-20 00:46:24,639 INFO [train.py:1198] (1/2) Epoch 44, batch 1250, loss[loss=0.2203, simple_loss=0.2758, pruned_loss=0.06064, ctc_loss=0.1291, cr_loss=0.4429, over 34322.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2604, pruned_loss=0.05267, ctc_loss=0.114, cr_loss=0.3883, over 6742270.19 frames. 
], batch size: 107, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:46:34,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=783682.6666666666, ans=0.1 2024-09-20 00:46:59,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=783776.0, ans=0.125 2024-09-20 00:47:26,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=783822.6666666666, ans=0.125 2024-09-20 00:47:37,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=783869.3333333334, ans=0.1 2024-09-20 00:47:47,070 INFO [train.py:1198] (1/2) Epoch 44, batch 1300, loss[loss=0.2114, simple_loss=0.2741, pruned_loss=0.05407, ctc_loss=0.1223, cr_loss=0.4028, over 33058.00 frames. ], tot_loss[loss=0.2012, simple_loss=0.2596, pruned_loss=0.05231, ctc_loss=0.1134, cr_loss=0.3868, over 6747585.85 frames. ], batch size: 130, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:47:54,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=783916.0, ans=0.0 2024-09-20 00:47:55,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.81 vs. limit=15.0 2024-09-20 00:48:12,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=783962.6666666666, ans=0.0 2024-09-20 00:48:28,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.211e+02 2.554e+02 2.828e+02 3.358e+02 5.584e+02, threshold=5.656e+02, percent-clipped=0.0 2024-09-20 00:48:30,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=784009.3333333334, ans=0.05 2024-09-20 00:48:33,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=784009.3333333334, ans=0.125 2024-09-20 00:48:35,798 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.18 vs. limit=10.0 2024-09-20 00:48:51,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=784056.0, ans=0.025 2024-09-20 00:49:03,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=784102.6666666666, ans=0.125 2024-09-20 00:49:09,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=784102.6666666666, ans=0.0 2024-09-20 00:49:13,605 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.51 vs. limit=15.0 2024-09-20 00:49:17,726 INFO [train.py:1198] (1/2) Epoch 44, batch 1350, loss[loss=0.1929, simple_loss=0.2515, pruned_loss=0.04887, ctc_loss=0.1071, cr_loss=0.3767, over 34534.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2592, pruned_loss=0.05214, ctc_loss=0.1129, cr_loss=0.3863, over 6765935.80 frames. 
], batch size: 94, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:49:19,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=784149.3333333334, ans=0.125 2024-09-20 00:49:21,855 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=784149.3333333334, ans=15.0 2024-09-20 00:49:42,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=784196.0, ans=0.0 2024-09-20 00:49:43,276 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.60 vs. limit=15.0 2024-09-20 00:49:44,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=784196.0, ans=0.125 2024-09-20 00:49:55,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.06 vs. limit=10.0 2024-09-20 00:50:19,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784289.3333333334, ans=0.1 2024-09-20 00:50:22,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=784289.3333333334, ans=0.125 2024-09-20 00:50:41,965 INFO [train.py:1198] (1/2) Epoch 44, batch 1400, loss[loss=0.1779, simple_loss=0.2341, pruned_loss=0.04438, ctc_loss=0.09691, cr_loss=0.3374, over 34270.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2593, pruned_loss=0.05229, ctc_loss=0.1131, cr_loss=0.3865, over 6777359.55 frames. ], batch size: 80, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:50:43,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=784382.6666666666, ans=0.125 2024-09-20 00:51:06,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=784429.3333333334, ans=0.0 2024-09-20 00:51:16,276 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.202e+02 2.598e+02 3.286e+02 4.535e+02 7.396e+02, threshold=6.573e+02, percent-clipped=9.0 2024-09-20 00:51:19,714 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=784476.0, ans=0.125 2024-09-20 00:51:46,596 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.90 vs. limit=15.0 2024-09-20 00:51:50,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=784569.3333333334, ans=0.0 2024-09-20 00:52:03,637 INFO [train.py:1198] (1/2) Epoch 44, batch 1450, loss[loss=0.2205, simple_loss=0.2779, pruned_loss=0.0603, ctc_loss=0.1271, cr_loss=0.4242, over 34399.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2597, pruned_loss=0.05217, ctc_loss=0.1129, cr_loss=0.3863, over 6774230.52 frames. 
], batch size: 110, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:52:07,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=784616.0, ans=0.125 2024-09-20 00:52:18,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=784662.6666666666, ans=0.04949747468305833 2024-09-20 00:52:21,985 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=784662.6666666666, ans=0.1 2024-09-20 00:52:31,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=784662.6666666666, ans=0.125 2024-09-20 00:52:33,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=784662.6666666666, ans=0.1 2024-09-20 00:53:27,181 INFO [train.py:1198] (1/2) Epoch 44, batch 1500, loss[loss=0.2186, simple_loss=0.2731, pruned_loss=0.06091, ctc_loss=0.1274, cr_loss=0.4216, over 34426.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2601, pruned_loss=0.05231, ctc_loss=0.1133, cr_loss=0.3868, over 6773251.07 frames. ], batch size: 100, lr: 2.71e-03, grad_scale: 32.0 2024-09-20 00:53:34,209 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=784849.3333333334, ans=0.2 2024-09-20 00:53:35,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=784849.3333333334, ans=0.1 2024-09-20 00:53:46,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=784896.0, ans=0.2 2024-09-20 00:54:04,501 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.169e+02 2.487e+02 2.811e+02 3.384e+02 4.975e+02, threshold=5.622e+02, percent-clipped=0.0 2024-09-20 00:54:16,590 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=784942.6666666666, ans=0.0 2024-09-20 00:54:19,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=784989.3333333334, ans=0.2 2024-09-20 00:54:21,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=784989.3333333334, ans=0.125 2024-09-20 00:54:51,920 INFO [train.py:1198] (1/2) Epoch 44, batch 1550, loss[loss=0.2096, simple_loss=0.2656, pruned_loss=0.05653, ctc_loss=0.1206, cr_loss=0.4122, over 34406.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.26, pruned_loss=0.05246, ctc_loss=0.1136, cr_loss=0.3871, over 6746016.64 frames. 
], batch size: 105, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 00:54:54,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=785082.6666666666, ans=0.2 2024-09-20 00:55:10,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=785129.3333333334, ans=0.125 2024-09-20 00:55:14,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=785129.3333333334, ans=10.0 2024-09-20 00:55:18,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=785129.3333333334, ans=10.0 2024-09-20 00:55:27,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=785176.0, ans=0.125 2024-09-20 00:55:28,812 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.42 vs. limit=22.5 2024-09-20 00:55:42,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=785222.6666666666, ans=0.09899494936611666 2024-09-20 00:56:13,508 INFO [train.py:1198] (1/2) Epoch 44, batch 1600, loss[loss=0.2032, simple_loss=0.2644, pruned_loss=0.05218, ctc_loss=0.1129, cr_loss=0.376, over 34560.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2599, pruned_loss=0.05255, ctc_loss=0.1136, cr_loss=0.387, over 6724611.25 frames. ], batch size: 99, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 00:56:27,256 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 00:56:47,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785409.3333333334, ans=0.1 2024-09-20 00:56:49,976 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 2.591e+02 2.952e+02 3.473e+02 6.479e+02, threshold=5.903e+02, percent-clipped=4.0 2024-09-20 00:56:52,751 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.86 vs. limit=10.0 2024-09-20 00:56:53,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=785409.3333333334, ans=0.1 2024-09-20 00:57:18,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=785456.0, ans=0.125 2024-09-20 00:57:38,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=785549.3333333334, ans=22.5 2024-09-20 00:57:39,638 INFO [train.py:1198] (1/2) Epoch 44, batch 1650, loss[loss=0.2054, simple_loss=0.2706, pruned_loss=0.05083, ctc_loss=0.1139, cr_loss=0.3937, over 34389.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.26, pruned_loss=0.05263, ctc_loss=0.1136, cr_loss=0.3867, over 6716524.53 frames. ], batch size: 103, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 00:57:40,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.84 vs. 
limit=12.0 2024-09-20 00:57:41,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=785549.3333333334, ans=0.125 2024-09-20 00:57:48,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=785549.3333333334, ans=6.0 2024-09-20 00:57:54,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=785596.0, ans=0.1 2024-09-20 00:58:08,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=785596.0, ans=0.125 2024-09-20 00:58:21,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=785642.6666666666, ans=0.0 2024-09-20 00:58:27,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=785689.3333333334, ans=0.0 2024-09-20 00:58:34,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=785689.3333333334, ans=0.2 2024-09-20 00:58:39,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=785689.3333333334, ans=0.0 2024-09-20 00:59:01,952 INFO [train.py:1198] (1/2) Epoch 44, batch 1700, loss[loss=0.1756, simple_loss=0.2343, pruned_loss=0.0423, ctc_loss=0.09388, cr_loss=0.3374, over 34315.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2596, pruned_loss=0.05243, ctc_loss=0.1133, cr_loss=0.3862, over 6742327.26 frames. ], batch size: 80, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 00:59:36,763 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.148e+02 2.538e+02 2.883e+02 3.325e+02 7.092e+02, threshold=5.766e+02, percent-clipped=1.0 2024-09-20 00:59:38,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=785876.0, ans=0.0 2024-09-20 00:59:44,350 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.06 vs. limit=22.5 2024-09-20 00:59:47,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=785876.0, ans=0.0 2024-09-20 00:59:50,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0 2024-09-20 01:00:25,981 INFO [train.py:1198] (1/2) Epoch 44, batch 1750, loss[loss=0.1761, simple_loss=0.2349, pruned_loss=0.04254, ctc_loss=0.09457, cr_loss=0.3318, over 34216.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2592, pruned_loss=0.05211, ctc_loss=0.1128, cr_loss=0.3852, over 6751998.43 frames. ], batch size: 78, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:00:28,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.00 vs. 
limit=15.0 2024-09-20 01:00:39,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=786016.0, ans=0.0 2024-09-20 01:00:59,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=786109.3333333334, ans=0.95 2024-09-20 01:01:02,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=786109.3333333334, ans=0.1 2024-09-20 01:01:12,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=786109.3333333334, ans=0.125 2024-09-20 01:01:32,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=786202.6666666666, ans=0.125 2024-09-20 01:01:42,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=786202.6666666666, ans=0.5 2024-09-20 01:01:49,924 INFO [train.py:1198] (1/2) Epoch 44, batch 1800, loss[loss=0.2201, simple_loss=0.273, pruned_loss=0.0619, ctc_loss=0.1292, cr_loss=0.44, over 34675.00 frames. ], tot_loss[loss=0.2012, simple_loss=0.2597, pruned_loss=0.05227, ctc_loss=0.1132, cr_loss=0.3865, over 6755329.94 frames. ], batch size: 97, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:02:08,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. limit=6.0 2024-09-20 01:02:11,742 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:02:24,272 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.129e+02 2.618e+02 2.953e+02 3.625e+02 6.423e+02, threshold=5.907e+02, percent-clipped=3.0 2024-09-20 01:02:38,286 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.23 vs. limit=6.0 2024-09-20 01:02:49,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=786389.3333333334, ans=0.1 2024-09-20 01:03:04,308 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2024-09-20 01:03:11,894 INFO [train.py:1198] (1/2) Epoch 44, batch 1850, loss[loss=0.2217, simple_loss=0.2788, pruned_loss=0.06073, ctc_loss=0.1287, cr_loss=0.4346, over 34438.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2596, pruned_loss=0.05227, ctc_loss=0.1131, cr_loss=0.3862, over 6763259.91 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:03:17,607 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.42 vs. 
limit=15.0 2024-09-20 01:03:33,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=786529.3333333334, ans=0.125 2024-09-20 01:03:35,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=786529.3333333334, ans=0.0 2024-09-20 01:04:21,190 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.633e-02 2024-09-20 01:04:30,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=786669.3333333334, ans=0.125 2024-09-20 01:04:32,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=786669.3333333334, ans=0.0 2024-09-20 01:04:35,385 INFO [train.py:1198] (1/2) Epoch 44, batch 1900, loss[loss=0.2069, simple_loss=0.2676, pruned_loss=0.05326, ctc_loss=0.118, cr_loss=0.4048, over 34372.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.2603, pruned_loss=0.05246, ctc_loss=0.1136, cr_loss=0.3867, over 6772692.05 frames. ], batch size: 103, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:04:44,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=786716.0, ans=0.1 2024-09-20 01:04:50,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=786762.6666666666, ans=0.125 2024-09-20 01:05:11,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=786809.3333333334, ans=0.0 2024-09-20 01:05:12,256 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.107e+02 2.569e+02 2.907e+02 4.094e+02 6.826e+02, threshold=5.813e+02, percent-clipped=3.0 2024-09-20 01:05:20,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=786809.3333333334, ans=0.0 2024-09-20 01:05:42,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.60 vs. limit=15.0 2024-09-20 01:05:48,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=786902.6666666666, ans=0.0 2024-09-20 01:05:59,494 INFO [train.py:1198] (1/2) Epoch 44, batch 1950, loss[loss=0.1919, simple_loss=0.2519, pruned_loss=0.04822, ctc_loss=0.1056, cr_loss=0.36, over 34341.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.2613, pruned_loss=0.05272, ctc_loss=0.1141, cr_loss=0.388, over 6789327.44 frames. ], batch size: 91, lr: 2.70e-03, grad_scale: 16.0 2024-09-20 01:06:30,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0 2024-09-20 01:06:32,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=787042.6666666666, ans=0.125 2024-09-20 01:06:41,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=787042.6666666666, ans=0.025 2024-09-20 01:06:48,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. 
limit=12.0 2024-09-20 01:07:07,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=787136.0, ans=0.125 2024-09-20 01:07:21,658 INFO [train.py:1198] (1/2) Epoch 44, batch 2000, loss[loss=0.186, simple_loss=0.2436, pruned_loss=0.04703, ctc_loss=0.1034, cr_loss=0.3418, over 34139.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.2616, pruned_loss=0.05276, ctc_loss=0.1143, cr_loss=0.3885, over 6765412.99 frames. ], batch size: 78, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:07:22,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=787182.6666666666, ans=0.125 2024-09-20 01:07:27,842 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.56 vs. limit=10.0 2024-09-20 01:07:29,297 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.86 vs. limit=6.0 2024-09-20 01:07:41,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=787229.3333333334, ans=0.0 2024-09-20 01:07:45,032 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:07:59,588 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.290e+02 2.580e+02 2.935e+02 3.528e+02 7.848e+02, threshold=5.870e+02, percent-clipped=4.0 2024-09-20 01:08:00,036 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:08:08,971 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-09-20 01:08:32,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=787369.3333333334, ans=0.125 2024-09-20 01:08:47,394 INFO [train.py:1198] (1/2) Epoch 44, batch 2050, loss[loss=0.1858, simple_loss=0.2425, pruned_loss=0.04679, ctc_loss=0.1055, cr_loss=0.3611, over 34460.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.261, pruned_loss=0.05277, ctc_loss=0.1142, cr_loss=0.3883, over 6757298.92 frames. ], batch size: 82, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:09:51,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=787602.6666666666, ans=0.0 2024-09-20 01:10:09,768 INFO [train.py:1198] (1/2) Epoch 44, batch 2100, loss[loss=0.1944, simple_loss=0.2494, pruned_loss=0.05101, ctc_loss=0.1113, cr_loss=0.3751, over 34530.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2602, pruned_loss=0.05239, ctc_loss=0.1134, cr_loss=0.3859, over 6768541.89 frames. ], batch size: 94, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:10:19,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=787649.3333333334, ans=0.125 2024-09-20 01:10:20,472 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.14 vs. 
limit=15.0 2024-09-20 01:10:21,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=787649.3333333334, ans=0.125 2024-09-20 01:10:39,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=787696.0, ans=0.125 2024-09-20 01:10:41,357 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=787742.6666666666, ans=0.0 2024-09-20 01:10:45,969 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.170e+02 2.486e+02 2.831e+02 3.545e+02 6.306e+02, threshold=5.663e+02, percent-clipped=2.0 2024-09-20 01:11:06,093 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.80 vs. limit=12.0 2024-09-20 01:11:08,938 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=787789.3333333334, ans=0.125 2024-09-20 01:11:15,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=787836.0, ans=0.0 2024-09-20 01:11:22,859 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.11 vs. limit=10.0 2024-09-20 01:11:23,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=787836.0, ans=0.125 2024-09-20 01:11:33,297 INFO [train.py:1198] (1/2) Epoch 44, batch 2150, loss[loss=0.2033, simple_loss=0.2536, pruned_loss=0.05628, ctc_loss=0.1199, cr_loss=0.4126, over 34367.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2597, pruned_loss=0.05209, ctc_loss=0.1128, cr_loss=0.3848, over 6787348.88 frames. 
], batch size: 91, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:11:35,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=787882.6666666666, ans=0.125 2024-09-20 01:11:41,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=787882.6666666666, ans=0.125 2024-09-20 01:11:55,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=787929.3333333334, ans=0.0 2024-09-20 01:12:03,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=787929.3333333334, ans=0.125 2024-09-20 01:12:03,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=787929.3333333334, ans=0.125 2024-09-20 01:12:08,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=787976.0, ans=0.0 2024-09-20 01:12:19,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=787976.0, ans=0.2 2024-09-20 01:12:22,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=788022.6666666666, ans=0.1 2024-09-20 01:12:29,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=788022.6666666666, ans=0.0 2024-09-20 01:12:32,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=788022.6666666666, ans=0.125 2024-09-20 01:12:34,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.21 vs. limit=15.0 2024-09-20 01:12:57,577 INFO [train.py:1198] (1/2) Epoch 44, batch 2200, loss[loss=0.2009, simple_loss=0.2677, pruned_loss=0.04852, ctc_loss=0.1092, cr_loss=0.381, over 34418.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2597, pruned_loss=0.05223, ctc_loss=0.113, cr_loss=0.3848, over 6782675.67 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:13:11,349 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.70 vs. 
limit=10.0 2024-09-20 01:13:15,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=788162.6666666666, ans=0.1 2024-09-20 01:13:17,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=788162.6666666666, ans=0.125 2024-09-20 01:13:32,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=788209.3333333334, ans=0.125 2024-09-20 01:13:33,782 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.110e+02 2.548e+02 2.976e+02 3.971e+02 6.809e+02, threshold=5.953e+02, percent-clipped=3.0 2024-09-20 01:13:50,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=788256.0, ans=0.1 2024-09-20 01:14:18,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=788349.3333333334, ans=0.125 2024-09-20 01:14:19,777 INFO [train.py:1198] (1/2) Epoch 44, batch 2250, loss[loss=0.2085, simple_loss=0.2685, pruned_loss=0.05422, ctc_loss=0.1195, cr_loss=0.4025, over 34405.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2595, pruned_loss=0.05218, ctc_loss=0.1129, cr_loss=0.3845, over 6781273.86 frames. ], batch size: 95, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:14:23,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=788349.3333333334, ans=0.025 2024-09-20 01:15:29,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=788536.0, ans=0.07 2024-09-20 01:15:37,508 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:15:37,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=788536.0, ans=0.0 2024-09-20 01:15:42,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=788582.6666666666, ans=0.125 2024-09-20 01:15:43,701 INFO [train.py:1198] (1/2) Epoch 44, batch 2300, loss[loss=0.1765, simple_loss=0.233, pruned_loss=0.04327, ctc_loss=0.0967, cr_loss=0.3525, over 34294.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2584, pruned_loss=0.05182, ctc_loss=0.1121, cr_loss=0.3824, over 6766094.49 frames. ], batch size: 83, lr: 2.70e-03, grad_scale: 16.0 2024-09-20 01:16:21,705 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.030e+02 2.535e+02 3.056e+02 3.725e+02 6.422e+02, threshold=6.112e+02, percent-clipped=2.0 2024-09-20 01:16:51,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=788769.3333333334, ans=0.0 2024-09-20 01:17:02,515 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.86 vs. limit=12.0 2024-09-20 01:17:07,994 INFO [train.py:1198] (1/2) Epoch 44, batch 2350, loss[loss=0.2112, simple_loss=0.268, pruned_loss=0.05685, ctc_loss=0.1204, cr_loss=0.4156, over 34700.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2588, pruned_loss=0.05202, ctc_loss=0.1125, cr_loss=0.3838, over 6772755.15 frames. 
], batch size: 97, lr: 2.70e-03, grad_scale: 16.0 2024-09-20 01:17:09,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=788816.0, ans=0.0 2024-09-20 01:17:15,410 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.34 vs. limit=15.0 2024-09-20 01:17:34,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.71 vs. limit=15.0 2024-09-20 01:17:49,916 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=7.55 vs. limit=15.0 2024-09-20 01:17:57,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=788956.0, ans=0.1 2024-09-20 01:18:09,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=788956.0, ans=0.125 2024-09-20 01:18:10,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=788956.0, ans=0.1 2024-09-20 01:18:14,476 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.94 vs. limit=15.0 2024-09-20 01:18:30,216 INFO [train.py:1198] (1/2) Epoch 44, batch 2400, loss[loss=0.1964, simple_loss=0.2547, pruned_loss=0.05086, ctc_loss=0.1095, cr_loss=0.3641, over 34581.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2595, pruned_loss=0.05216, ctc_loss=0.1129, cr_loss=0.3848, over 6776448.12 frames. ], batch size: 89, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:18:45,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=789049.3333333334, ans=0.125 2024-09-20 01:19:09,855 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.272e+02 2.676e+02 3.013e+02 3.752e+02 7.692e+02, threshold=6.026e+02, percent-clipped=2.0 2024-09-20 01:19:28,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=789189.3333333334, ans=0.0 2024-09-20 01:19:31,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=789189.3333333334, ans=0.125 2024-09-20 01:19:41,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=789236.0, ans=0.125 2024-09-20 01:19:54,296 INFO [train.py:1198] (1/2) Epoch 44, batch 2450, loss[loss=0.2165, simple_loss=0.2746, pruned_loss=0.05843, ctc_loss=0.1236, cr_loss=0.4181, over 34435.00 frames. ], tot_loss[loss=0.2018, simple_loss=0.2604, pruned_loss=0.05251, ctc_loss=0.1135, cr_loss=0.3861, over 6750238.82 frames. 
], batch size: 95, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:20:09,514 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:20:30,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=789376.0, ans=15.0 2024-09-20 01:20:44,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=789422.6666666666, ans=0.125 2024-09-20 01:20:44,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=789422.6666666666, ans=0.0 2024-09-20 01:21:18,443 INFO [train.py:1198] (1/2) Epoch 44, batch 2500, loss[loss=0.2051, simple_loss=0.2657, pruned_loss=0.05263, ctc_loss=0.1146, cr_loss=0.4062, over 34444.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2604, pruned_loss=0.05266, ctc_loss=0.1138, cr_loss=0.3871, over 6761550.97 frames. ], batch size: 100, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:21:26,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=789516.0, ans=0.2 2024-09-20 01:21:41,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=789562.6666666666, ans=0.125 2024-09-20 01:21:56,787 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.141e+02 2.555e+02 2.848e+02 3.576e+02 5.271e+02, threshold=5.696e+02, percent-clipped=0.0 2024-09-20 01:21:58,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=789609.3333333334, ans=0.125 2024-09-20 01:22:05,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=789609.3333333334, ans=0.125 2024-09-20 01:22:27,071 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=789702.6666666666, ans=0.025 2024-09-20 01:22:28,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=789702.6666666666, ans=0.125 2024-09-20 01:22:43,099 INFO [train.py:1198] (1/2) Epoch 44, batch 2550, loss[loss=0.1669, simple_loss=0.227, pruned_loss=0.038, ctc_loss=0.08806, cr_loss=0.3294, over 34145.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.26, pruned_loss=0.05232, ctc_loss=0.1131, cr_loss=0.3858, over 6765829.46 frames. ], batch size: 78, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:22:53,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=15.0 2024-09-20 01:22:58,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=789796.0, ans=0.2 2024-09-20 01:23:42,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=789889.3333333334, ans=0.1 2024-09-20 01:24:07,292 INFO [train.py:1198] (1/2) Epoch 44, batch 2600, loss[loss=0.2073, simple_loss=0.2622, pruned_loss=0.05655, ctc_loss=0.118, cr_loss=0.3927, over 34728.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2605, pruned_loss=0.05261, ctc_loss=0.1136, cr_loss=0.387, over 6761553.43 frames. 
], batch size: 92, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:24:32,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.31 vs. limit=22.5 2024-09-20 01:24:44,986 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.261e+02 2.540e+02 2.944e+02 3.843e+02 6.262e+02, threshold=5.888e+02, percent-clipped=4.0 2024-09-20 01:25:03,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=790122.6666666666, ans=0.125 2024-09-20 01:25:09,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=790122.6666666666, ans=0.1 2024-09-20 01:25:21,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=790169.3333333334, ans=0.2 2024-09-20 01:25:29,022 INFO [train.py:1198] (1/2) Epoch 44, batch 2650, loss[loss=0.2007, simple_loss=0.2648, pruned_loss=0.05012, ctc_loss=0.1091, cr_loss=0.3645, over 34182.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2602, pruned_loss=0.05237, ctc_loss=0.1133, cr_loss=0.3861, over 6769743.78 frames. ], batch size: 117, lr: 2.70e-03, grad_scale: 32.0 2024-09-20 01:25:36,592 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.50 vs. limit=15.0 2024-09-20 01:26:04,411 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2024-09-20 01:26:12,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=790309.3333333334, ans=0.125 2024-09-20 01:26:28,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=790356.0, ans=0.1 2024-09-20 01:26:45,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=790402.6666666666, ans=0.025 2024-09-20 01:26:52,802 INFO [train.py:1198] (1/2) Epoch 44, batch 2700, loss[loss=0.2109, simple_loss=0.2727, pruned_loss=0.05497, ctc_loss=0.1175, cr_loss=0.3953, over 34630.00 frames. ], tot_loss[loss=0.2018, simple_loss=0.2604, pruned_loss=0.05251, ctc_loss=0.1136, cr_loss=0.3867, over 6763774.62 frames. ], batch size: 102, lr: 2.70e-03, grad_scale: 16.0 2024-09-20 01:27:00,415 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.66 vs. 
limit=22.5 2024-09-20 01:27:01,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=790449.3333333334, ans=0.0 2024-09-20 01:27:11,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=790496.0, ans=0.2 2024-09-20 01:27:17,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=790496.0, ans=0.0 2024-09-20 01:27:32,510 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.071e+02 2.649e+02 2.941e+02 3.920e+02 6.597e+02, threshold=5.882e+02, percent-clipped=2.0 2024-09-20 01:28:17,361 INFO [train.py:1198] (1/2) Epoch 44, batch 2750, loss[loss=0.2021, simple_loss=0.2522, pruned_loss=0.05552, ctc_loss=0.1209, cr_loss=0.4172, over 34648.00 frames. ], tot_loss[loss=0.2006, simple_loss=0.2591, pruned_loss=0.05211, ctc_loss=0.1128, cr_loss=0.3849, over 6760517.65 frames. ], batch size: 88, lr: 2.70e-03, grad_scale: 16.0 2024-09-20 01:28:17,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=790682.6666666666, ans=0.125 2024-09-20 01:28:41,044 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.99 vs. limit=15.0 2024-09-20 01:28:45,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=790729.3333333334, ans=0.1 2024-09-20 01:29:02,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.51 vs. limit=6.0 2024-09-20 01:29:35,698 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.94 vs. limit=6.0 2024-09-20 01:29:40,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=790916.0, ans=0.1 2024-09-20 01:29:41,623 INFO [train.py:1198] (1/2) Epoch 44, batch 2800, loss[loss=0.2223, simple_loss=0.2739, pruned_loss=0.06331, ctc_loss=0.1362, cr_loss=0.4219, over 23209.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2595, pruned_loss=0.05227, ctc_loss=0.1133, cr_loss=0.3861, over 6739433.76 frames. ], batch size: 244, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:30:03,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=790962.6666666666, ans=0.125 2024-09-20 01:30:05,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=790962.6666666666, ans=0.125 2024-09-20 01:30:13,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=791009.3333333334, ans=0.2 2024-09-20 01:30:18,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.25 vs. 
limit=15.0 2024-09-20 01:30:20,949 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.148e+02 2.552e+02 2.984e+02 3.403e+02 1.224e+03, threshold=5.967e+02, percent-clipped=2.0 2024-09-20 01:30:35,098 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.45 vs. limit=15.0 2024-09-20 01:30:41,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=791056.0, ans=0.0 2024-09-20 01:30:50,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=791102.6666666666, ans=0.125 2024-09-20 01:31:03,792 INFO [train.py:1198] (1/2) Epoch 44, batch 2850, loss[loss=0.189, simple_loss=0.2449, pruned_loss=0.04879, ctc_loss=0.1059, cr_loss=0.3577, over 34505.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.26, pruned_loss=0.05253, ctc_loss=0.1137, cr_loss=0.3869, over 6723860.86 frames. ], batch size: 90, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:31:14,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5 2024-09-20 01:31:19,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.57 vs. limit=10.0 2024-09-20 01:31:32,729 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:31:35,909 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=791196.0, ans=0.125 2024-09-20 01:31:42,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=791242.6666666666, ans=0.125 2024-09-20 01:31:42,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=791242.6666666666, ans=0.0 2024-09-20 01:31:50,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=791242.6666666666, ans=0.125 2024-09-20 01:32:07,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=791289.3333333334, ans=0.0 2024-09-20 01:32:08,557 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=791289.3333333334, ans=0.125 2024-09-20 01:32:17,804 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.12 vs. limit=15.0 2024-09-20 01:32:18,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=791336.0, ans=0.1 2024-09-20 01:32:18,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=791336.0, ans=0.0 2024-09-20 01:32:28,076 INFO [train.py:1198] (1/2) Epoch 44, batch 2900, loss[loss=0.1982, simple_loss=0.2599, pruned_loss=0.04994, ctc_loss=0.1099, cr_loss=0.3671, over 34516.00 frames. ], tot_loss[loss=0.2026, simple_loss=0.2611, pruned_loss=0.05287, ctc_loss=0.1143, cr_loss=0.3886, over 6754743.46 frames. 
], batch size: 94, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:32:38,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=791382.6666666666, ans=0.125 2024-09-20 01:32:51,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=791429.3333333334, ans=0.035 2024-09-20 01:33:07,947 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.119e+02 2.495e+02 3.243e+02 4.160e+02 6.518e+02, threshold=6.487e+02, percent-clipped=2.0 2024-09-20 01:33:24,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=791522.6666666666, ans=0.125 2024-09-20 01:33:30,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.90 vs. limit=15.0 2024-09-20 01:33:52,544 INFO [train.py:1198] (1/2) Epoch 44, batch 2950, loss[loss=0.2026, simple_loss=0.2571, pruned_loss=0.05472, ctc_loss=0.1153, cr_loss=0.3886, over 34660.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2599, pruned_loss=0.05239, ctc_loss=0.1133, cr_loss=0.3854, over 6749998.86 frames. ], batch size: 88, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:34:10,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=791662.6666666666, ans=0.0 2024-09-20 01:34:20,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=791662.6666666666, ans=0.0 2024-09-20 01:34:29,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=791709.3333333334, ans=0.125 2024-09-20 01:34:51,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=791756.0, ans=0.125 2024-09-20 01:35:06,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=791802.6666666666, ans=0.125 2024-09-20 01:35:07,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.65 vs. limit=15.0 2024-09-20 01:35:14,654 INFO [train.py:1198] (1/2) Epoch 44, batch 3000, loss[loss=0.197, simple_loss=0.2569, pruned_loss=0.0502, ctc_loss=0.1104, cr_loss=0.3645, over 34514.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2596, pruned_loss=0.0523, ctc_loss=0.1132, cr_loss=0.3852, over 6750628.47 frames. ], batch size: 94, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:35:14,655 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 01:35:32,283 INFO [train.py:1230] (1/2) Epoch 44, validation: loss=0.1484, simple_loss=0.2418, pruned_loss=0.02358, ctc_loss=0.03899, cr_loss=2.372e-14, over 944034.00 frames. 
2024-09-20 01:35:32,284 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 01:35:42,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=791849.3333333334, ans=0.0 2024-09-20 01:36:02,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=791896.0, ans=0.1 2024-09-20 01:36:11,701 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.126e+02 2.541e+02 3.140e+02 4.142e+02 1.093e+03, threshold=6.279e+02, percent-clipped=3.0 2024-09-20 01:36:53,807 INFO [train.py:1198] (1/2) Epoch 44, batch 3050, loss[loss=0.1966, simple_loss=0.2497, pruned_loss=0.05292, ctc_loss=0.1116, cr_loss=0.3816, over 34600.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2606, pruned_loss=0.05262, ctc_loss=0.1138, cr_loss=0.3867, over 6742686.15 frames. ], batch size: 89, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:36:55,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=792082.6666666666, ans=0.125 2024-09-20 01:37:03,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=792082.6666666666, ans=0.125 2024-09-20 01:37:07,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=792082.6666666666, ans=0.2 2024-09-20 01:37:10,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=792129.3333333334, ans=0.0 2024-09-20 01:37:25,217 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.24 vs. limit=10.0 2024-09-20 01:37:29,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=792176.0, ans=0.025 2024-09-20 01:37:44,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=792222.6666666666, ans=0.1 2024-09-20 01:37:48,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=792222.6666666666, ans=0.2 2024-09-20 01:38:03,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=792269.3333333334, ans=0.0 2024-09-20 01:38:05,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=792269.3333333334, ans=0.2 2024-09-20 01:38:07,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.42 vs. limit=10.0 2024-09-20 01:38:08,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=792269.3333333334, ans=0.0 2024-09-20 01:38:16,073 INFO [train.py:1198] (1/2) Epoch 44, batch 3100, loss[loss=0.2175, simple_loss=0.2773, pruned_loss=0.05817, ctc_loss=0.1243, cr_loss=0.4143, over 34258.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2605, pruned_loss=0.05259, ctc_loss=0.1138, cr_loss=0.3867, over 6741668.68 frames. 
], batch size: 117, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:38:16,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=792316.0, ans=0.125 2024-09-20 01:38:38,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=792362.6666666666, ans=0.05 2024-09-20 01:38:43,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=792362.6666666666, ans=0.125 2024-09-20 01:38:48,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=792409.3333333334, ans=0.0 2024-09-20 01:38:55,182 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.179e+02 2.591e+02 3.080e+02 3.890e+02 8.868e+02, threshold=6.159e+02, percent-clipped=3.0 2024-09-20 01:39:09,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=792456.0, ans=0.125 2024-09-20 01:39:27,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=792502.6666666666, ans=0.2 2024-09-20 01:39:37,058 INFO [train.py:1198] (1/2) Epoch 44, batch 3150, loss[loss=0.2245, simple_loss=0.2829, pruned_loss=0.06163, ctc_loss=0.1288, cr_loss=0.4268, over 33832.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2606, pruned_loss=0.05254, ctc_loss=0.1137, cr_loss=0.3867, over 6747097.96 frames. ], batch size: 122, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:39:56,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=792596.0, ans=0.125 2024-09-20 01:40:08,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=792642.6666666666, ans=0.025 2024-09-20 01:40:16,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=792642.6666666666, ans=0.2 2024-09-20 01:40:22,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=792642.6666666666, ans=0.1 2024-09-20 01:40:48,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=792736.0, ans=0.1 2024-09-20 01:40:57,933 INFO [train.py:1198] (1/2) Epoch 44, batch 3200, loss[loss=0.1985, simple_loss=0.2538, pruned_loss=0.05262, ctc_loss=0.1131, cr_loss=0.3829, over 34558.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.2603, pruned_loss=0.05249, ctc_loss=0.1136, cr_loss=0.3865, over 6759295.29 frames. 
], batch size: 94, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:41:07,951 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=792782.6666666666, ans=0.125 2024-09-20 01:41:25,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=792829.3333333334, ans=0.125 2024-09-20 01:41:38,478 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.188e+02 2.662e+02 3.192e+02 3.733e+02 5.750e+02, threshold=6.384e+02, percent-clipped=0.0 2024-09-20 01:41:58,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=792922.6666666666, ans=0.0 2024-09-20 01:42:04,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=792969.3333333334, ans=0.125 2024-09-20 01:42:15,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.62 vs. limit=22.5 2024-09-20 01:42:19,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=793016.0, ans=0.0 2024-09-20 01:42:20,539 INFO [train.py:1198] (1/2) Epoch 44, batch 3250, loss[loss=0.2067, simple_loss=0.2636, pruned_loss=0.05486, ctc_loss=0.1203, cr_loss=0.4012, over 34651.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2606, pruned_loss=0.05251, ctc_loss=0.1137, cr_loss=0.3867, over 6769173.60 frames. ], batch size: 98, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:42:33,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=793016.0, ans=0.025 2024-09-20 01:42:50,322 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.17 vs. limit=15.0 2024-09-20 01:43:09,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=793156.0, ans=0.025 2024-09-20 01:43:14,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=793156.0, ans=0.2 2024-09-20 01:43:23,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=793156.0, ans=0.0 2024-09-20 01:43:25,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=793202.6666666666, ans=0.125 2024-09-20 01:43:29,059 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5 2024-09-20 01:43:32,062 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.16 vs. limit=15.0 2024-09-20 01:43:36,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=793202.6666666666, ans=0.2 2024-09-20 01:43:39,646 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=793202.6666666666, ans=0.0 2024-09-20 01:43:42,504 INFO [train.py:1198] (1/2) Epoch 44, batch 3300, loss[loss=0.2118, simple_loss=0.2726, pruned_loss=0.05539, ctc_loss=0.1195, cr_loss=0.4061, over 33127.00 frames. 
], tot_loss[loss=0.2008, simple_loss=0.2594, pruned_loss=0.05213, ctc_loss=0.113, cr_loss=0.3848, over 6767598.30 frames. ], batch size: 130, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:43:57,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=793296.0, ans=10.0 2024-09-20 01:44:06,220 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.07 vs. limit=15.0 2024-09-20 01:44:14,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=793342.6666666666, ans=0.0 2024-09-20 01:44:21,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.259e+02 2.528e+02 2.830e+02 3.480e+02 7.080e+02, threshold=5.660e+02, percent-clipped=1.0 2024-09-20 01:44:46,079 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:44:49,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=793436.0, ans=0.1 2024-09-20 01:45:03,303 INFO [train.py:1198] (1/2) Epoch 44, batch 3350, loss[loss=0.2097, simple_loss=0.2704, pruned_loss=0.05465, ctc_loss=0.1183, cr_loss=0.3997, over 33892.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2601, pruned_loss=0.05247, ctc_loss=0.1137, cr_loss=0.3863, over 6742020.99 frames. ], batch size: 122, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:46:15,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=793669.3333333334, ans=0.0 2024-09-20 01:46:23,564 INFO [train.py:1198] (1/2) Epoch 44, batch 3400, loss[loss=0.1801, simple_loss=0.2364, pruned_loss=0.04505, ctc_loss=0.09811, cr_loss=0.3513, over 34145.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2598, pruned_loss=0.05246, ctc_loss=0.1136, cr_loss=0.3863, over 6731692.73 frames. ], batch size: 78, lr: 2.69e-03, grad_scale: 16.0 2024-09-20 01:46:29,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=793716.0, ans=0.2 2024-09-20 01:46:45,922 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=793762.6666666666, ans=0.0 2024-09-20 01:47:04,944 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.163e+02 2.554e+02 2.925e+02 3.664e+02 6.810e+02, threshold=5.849e+02, percent-clipped=4.0 2024-09-20 01:47:24,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=793856.0, ans=0.125 2024-09-20 01:47:29,325 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=793902.6666666666, ans=0.125 2024-09-20 01:47:45,125 INFO [train.py:1198] (1/2) Epoch 44, batch 3450, loss[loss=0.209, simple_loss=0.2693, pruned_loss=0.05456, ctc_loss=0.1181, cr_loss=0.3966, over 33041.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2603, pruned_loss=0.05265, ctc_loss=0.1138, cr_loss=0.3869, over 6744292.45 frames. 
], batch size: 130, lr: 2.69e-03, grad_scale: 16.0 2024-09-20 01:47:56,670 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=793949.3333333334, ans=0.125 2024-09-20 01:48:12,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.78 vs. limit=22.5 2024-09-20 01:48:19,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.88 vs. limit=15.0 2024-09-20 01:48:28,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=794042.6666666666, ans=0.125 2024-09-20 01:48:43,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=794089.3333333334, ans=0.125 2024-09-20 01:48:53,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=794136.0, ans=0.2 2024-09-20 01:49:05,942 INFO [train.py:1198] (1/2) Epoch 44, batch 3500, loss[loss=0.1796, simple_loss=0.2405, pruned_loss=0.04291, ctc_loss=0.09664, cr_loss=0.3384, over 34464.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2598, pruned_loss=0.05248, ctc_loss=0.1135, cr_loss=0.3863, over 6745632.03 frames. ], batch size: 85, lr: 2.69e-03, grad_scale: 16.0 2024-09-20 01:49:09,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=794182.6666666666, ans=0.125 2024-09-20 01:49:46,430 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.161e+02 2.584e+02 2.946e+02 3.826e+02 7.124e+02, threshold=5.893e+02, percent-clipped=3.0 2024-09-20 01:49:48,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=794276.0, ans=0.0 2024-09-20 01:49:51,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=794276.0, ans=0.0 2024-09-20 01:50:25,829 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.50 vs. limit=22.5 2024-09-20 01:50:26,340 INFO [train.py:1198] (1/2) Epoch 44, batch 3550, loss[loss=0.2083, simple_loss=0.2694, pruned_loss=0.05418, ctc_loss=0.115, cr_loss=0.3967, over 34390.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2602, pruned_loss=0.0527, ctc_loss=0.1139, cr_loss=0.3875, over 6756230.95 frames. ], batch size: 103, lr: 2.69e-03, grad_scale: 16.0 2024-09-20 01:51:10,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=794509.3333333334, ans=0.5 2024-09-20 01:51:23,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=794556.0, ans=0.0 2024-09-20 01:51:42,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=794602.6666666666, ans=0.125 2024-09-20 01:51:47,386 INFO [train.py:1198] (1/2) Epoch 44, batch 3600, loss[loss=0.1996, simple_loss=0.259, pruned_loss=0.0511, ctc_loss=0.1132, cr_loss=0.385, over 34463.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2602, pruned_loss=0.05268, ctc_loss=0.1139, cr_loss=0.3874, over 6765939.01 frames. 
], batch size: 90, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:52:01,361 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=794649.3333333334, ans=0.2 2024-09-20 01:52:04,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=794696.0, ans=0.125 2024-09-20 01:52:28,120 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.169e+02 2.694e+02 3.174e+02 4.328e+02 1.005e+03, threshold=6.349e+02, percent-clipped=7.0 2024-09-20 01:52:28,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=794742.6666666666, ans=0.0 2024-09-20 01:52:32,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.25 vs. limit=12.0 2024-09-20 01:53:08,384 INFO [train.py:1198] (1/2) Epoch 44, batch 3650, loss[loss=0.2273, simple_loss=0.2807, pruned_loss=0.06446, ctc_loss=0.1354, cr_loss=0.4483, over 34464.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2597, pruned_loss=0.05254, ctc_loss=0.1136, cr_loss=0.3871, over 6768784.44 frames. ], batch size: 110, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:53:42,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=794976.0, ans=0.2 2024-09-20 01:53:55,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=795022.6666666666, ans=0.125 2024-09-20 01:54:17,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=795069.3333333334, ans=0.0 2024-09-20 01:54:28,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.20 vs. limit=15.0 2024-09-20 01:54:29,253 INFO [train.py:1198] (1/2) Epoch 44, batch 3700, loss[loss=0.2038, simple_loss=0.2713, pruned_loss=0.04928, ctc_loss=0.1115, cr_loss=0.3847, over 34638.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2592, pruned_loss=0.05206, ctc_loss=0.1129, cr_loss=0.385, over 6783008.91 frames. ], batch size: 102, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:54:47,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795162.6666666666, ans=0.1 2024-09-20 01:55:09,734 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.137e+02 2.516e+02 2.737e+02 3.539e+02 6.134e+02, threshold=5.474e+02, percent-clipped=0.0 2024-09-20 01:55:36,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=795302.6666666666, ans=0.125 2024-09-20 01:55:50,931 INFO [train.py:1198] (1/2) Epoch 44, batch 3750, loss[loss=0.219, simple_loss=0.279, pruned_loss=0.05862, ctc_loss=0.1247, cr_loss=0.4199, over 34315.00 frames. ], tot_loss[loss=0.2037, simple_loss=0.2625, pruned_loss=0.05315, ctc_loss=0.1151, cr_loss=0.3909, over 6785581.61 frames. 
], batch size: 113, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:55:51,435 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 01:56:05,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=795396.0, ans=0.05 2024-09-20 01:56:15,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=795396.0, ans=0.125 2024-09-20 01:56:58,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=795536.0, ans=0.1 2024-09-20 01:57:09,284 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=795536.0, ans=0.125 2024-09-20 01:57:12,209 INFO [train.py:1198] (1/2) Epoch 44, batch 3800, loss[loss=0.2352, simple_loss=0.2824, pruned_loss=0.07008, ctc_loss=0.1459, cr_loss=0.4642, over 30410.00 frames. ], tot_loss[loss=0.2066, simple_loss=0.265, pruned_loss=0.05441, ctc_loss=0.1175, cr_loss=0.3966, over 6674738.31 frames. ], batch size: 175, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:57:54,016 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.244e+02 2.440e+02 2.566e+02 2.757e+02 3.437e+02, threshold=5.132e+02, percent-clipped=0.0 2024-09-20 01:57:56,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=795676.0, ans=0.125 2024-09-20 01:57:56,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=795676.0, ans=0.1 2024-09-20 01:57:59,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=795676.0, ans=0.125 2024-09-20 01:58:35,784 INFO [train.py:1198] (1/2) Epoch 44, batch 3850, loss[loss=0.2265, simple_loss=0.277, pruned_loss=0.06488, ctc_loss=0.1391, cr_loss=0.46, over 24046.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.2667, pruned_loss=0.05581, ctc_loss=0.1204, cr_loss=0.4006, over 6248635.73 frames. ], batch size: 245, lr: 2.69e-03, grad_scale: 32.0 2024-09-20 01:58:43,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=795816.0, ans=0.0 2024-09-20 01:58:46,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=795816.0, ans=0.2 2024-09-20 01:58:54,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=795862.6666666666, ans=0.0 2024-09-20 01:58:57,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=795862.6666666666, ans=0.1 2024-09-20 01:59:04,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=795862.6666666666, ans=0.0 2024-09-20 01:59:06,915 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.50 vs. limit=10.0 2024-09-20 02:00:03,619 INFO [train.py:1198] (1/2) Epoch 45, batch 0, loss[loss=0.1857, simple_loss=0.2463, pruned_loss=0.04547, ctc_loss=0.1003, cr_loss=0.3534, over 34496.00 frames. 
], tot_loss[loss=0.1857, simple_loss=0.2463, pruned_loss=0.04547, ctc_loss=0.1003, cr_loss=0.3534, over 34496.00 frames. ], batch size: 85, lr: 2.66e-03, grad_scale: 32.0 2024-09-20 02:00:03,620 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 02:00:20,466 INFO [train.py:1230] (1/2) Epoch 45, validation: loss=0.149, simple_loss=0.2428, pruned_loss=0.02369, ctc_loss=0.03916, cr_loss=2.283e-14, over 944034.00 frames. 2024-09-20 02:00:20,466 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 02:00:22,683 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.11 vs. limit=15.0 2024-09-20 02:00:52,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=796035.3333333334, ans=0.125 2024-09-20 02:00:55,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=796035.3333333334, ans=0.025 2024-09-20 02:00:58,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=796035.3333333334, ans=0.025 2024-09-20 02:01:04,503 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.04 vs. limit=15.0 2024-09-20 02:01:10,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0 2024-09-20 02:01:13,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=796082.0, ans=0.125 2024-09-20 02:01:28,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=796128.6666666666, ans=0.05 2024-09-20 02:01:35,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=796128.6666666666, ans=0.1 2024-09-20 02:01:38,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=796128.6666666666, ans=15.0 2024-09-20 02:01:39,735 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.690e+02 2.885e+02 3.343e+02 7.619e+02, threshold=5.769e+02, percent-clipped=5.0 2024-09-20 02:01:43,089 INFO [train.py:1198] (1/2) Epoch 45, batch 50, loss[loss=0.1816, simple_loss=0.238, pruned_loss=0.04552, ctc_loss=0.1003, cr_loss=0.3524, over 34486.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2605, pruned_loss=0.05226, ctc_loss=0.1136, cr_loss=0.3875, over 1480608.56 frames. 
], batch size: 82, lr: 2.66e-03, grad_scale: 32.0 2024-09-20 02:01:59,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=796222.0, ans=0.125 2024-09-20 02:02:01,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=796222.0, ans=0.125 2024-09-20 02:02:16,086 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=796268.6666666666, ans=0.125 2024-09-20 02:02:24,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796268.6666666666, ans=0.1 2024-09-20 02:02:36,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=796315.3333333334, ans=0.125 2024-09-20 02:02:50,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.65 vs. limit=22.5 2024-09-20 02:03:09,236 INFO [train.py:1198] (1/2) Epoch 45, batch 100, loss[loss=0.1951, simple_loss=0.2481, pruned_loss=0.052, ctc_loss=0.1108, cr_loss=0.3981, over 34584.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.2619, pruned_loss=0.05302, ctc_loss=0.1147, cr_loss=0.3906, over 2628836.04 frames. ], batch size: 89, lr: 2.66e-03, grad_scale: 32.0 2024-09-20 02:03:24,716 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.22 vs. limit=15.0 2024-09-20 02:03:30,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=796455.3333333334, ans=0.125 2024-09-20 02:03:43,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=796502.0, ans=0.035 2024-09-20 02:03:53,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=796502.0, ans=0.125 2024-09-20 02:04:12,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=796595.3333333334, ans=0.025 2024-09-20 02:04:18,032 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2024-09-20 02:04:22,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=796595.3333333334, ans=0.125 2024-09-20 02:04:26,978 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.570e+02 2.936e+02 3.864e+02 7.058e+02, threshold=5.871e+02, percent-clipped=4.0 2024-09-20 02:04:30,198 INFO [train.py:1198] (1/2) Epoch 45, batch 150, loss[loss=0.1831, simple_loss=0.2405, pruned_loss=0.04597, ctc_loss=0.09859, cr_loss=0.3497, over 34480.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.2599, pruned_loss=0.05192, ctc_loss=0.1128, cr_loss=0.3858, over 3556299.33 frames. ], batch size: 82, lr: 2.65e-03, grad_scale: 32.0 2024-09-20 02:04:33,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=796642.0, ans=0.125 2024-09-20 02:04:40,674 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.27 vs. 
limit=15.0 2024-09-20 02:04:48,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=796688.6666666666, ans=0.125 2024-09-20 02:04:51,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=796688.6666666666, ans=0.1 2024-09-20 02:05:00,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=796688.6666666666, ans=0.2 2024-09-20 02:05:39,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=796828.6666666666, ans=0.0 2024-09-20 02:05:52,090 INFO [train.py:1198] (1/2) Epoch 45, batch 200, loss[loss=0.2094, simple_loss=0.2683, pruned_loss=0.05519, ctc_loss=0.1207, cr_loss=0.3987, over 31986.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2589, pruned_loss=0.0518, ctc_loss=0.1124, cr_loss=0.3847, over 4270525.89 frames. ], batch size: 145, lr: 2.65e-03, grad_scale: 32.0 2024-09-20 02:06:06,089 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.80 vs. limit=15.0 2024-09-20 02:06:39,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=796968.6666666666, ans=0.0 2024-09-20 02:06:52,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=797015.3333333334, ans=0.125 2024-09-20 02:06:55,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=797015.3333333334, ans=0.125 2024-09-20 02:07:00,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=797062.0, ans=0.025 2024-09-20 02:07:05,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=797062.0, ans=0.125 2024-09-20 02:07:15,363 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.174e+02 2.519e+02 3.194e+02 4.922e+02 9.493e+02, threshold=6.388e+02, percent-clipped=12.0 2024-09-20 02:07:18,568 INFO [train.py:1198] (1/2) Epoch 45, batch 250, loss[loss=0.1959, simple_loss=0.26, pruned_loss=0.04779, ctc_loss=0.1057, cr_loss=0.3749, over 34278.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2593, pruned_loss=0.0518, ctc_loss=0.1125, cr_loss=0.3855, over 4831502.32 frames. ], batch size: 117, lr: 2.65e-03, grad_scale: 32.0 2024-09-20 02:07:32,435 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.21 vs. limit=15.0 2024-09-20 02:07:45,731 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.36 vs. limit=6.0 2024-09-20 02:08:09,809 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=797248.6666666666, ans=0.5 2024-09-20 02:08:40,545 INFO [train.py:1198] (1/2) Epoch 45, batch 300, loss[loss=0.227, simple_loss=0.2838, pruned_loss=0.06309, ctc_loss=0.1334, cr_loss=0.4334, over 34348.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2587, pruned_loss=0.0516, ctc_loss=0.112, cr_loss=0.384, over 5261611.74 frames. 
], batch size: 107, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:08:42,675 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:09:00,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=797388.6666666666, ans=0.07 2024-09-20 02:09:23,974 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2024-09-20 02:09:30,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=797482.0, ans=0.5 2024-09-20 02:09:38,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=797482.0, ans=0.1 2024-09-20 02:10:03,087 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.195e+02 2.546e+02 2.814e+02 3.286e+02 5.845e+02, threshold=5.629e+02, percent-clipped=0.0 2024-09-20 02:10:04,725 INFO [train.py:1198] (1/2) Epoch 45, batch 350, loss[loss=0.1702, simple_loss=0.2321, pruned_loss=0.0389, ctc_loss=0.08798, cr_loss=0.3235, over 34269.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.2595, pruned_loss=0.05201, ctc_loss=0.1128, cr_loss=0.3857, over 5597113.89 frames. ], batch size: 83, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:10:15,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0 2024-09-20 02:10:17,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=797575.3333333334, ans=0.0 2024-09-20 02:10:21,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=797622.0, ans=0.125 2024-09-20 02:10:25,169 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.92 vs. 
limit=22.5 2024-09-20 02:10:38,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=797668.6666666666, ans=0.125 2024-09-20 02:10:44,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=797668.6666666666, ans=0.125 2024-09-20 02:10:56,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=797715.3333333334, ans=0.125 2024-09-20 02:10:58,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=797715.3333333334, ans=0.0 2024-09-20 02:11:03,162 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=797715.3333333334, ans=0.0 2024-09-20 02:11:08,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=797715.3333333334, ans=0.05 2024-09-20 02:11:14,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=797762.0, ans=0.125 2024-09-20 02:11:22,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=797762.0, ans=0.125 2024-09-20 02:11:28,684 INFO [train.py:1198] (1/2) Epoch 45, batch 400, loss[loss=0.2049, simple_loss=0.2621, pruned_loss=0.05394, ctc_loss=0.117, cr_loss=0.4118, over 34408.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2593, pruned_loss=0.05177, ctc_loss=0.1125, cr_loss=0.3855, over 5862719.37 frames. ], batch size: 95, lr: 2.65e-03, grad_scale: 32.0 2024-09-20 02:11:38,800 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=797808.6666666666, ans=0.0 2024-09-20 02:11:40,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=797808.6666666666, ans=0.0 2024-09-20 02:11:45,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=797855.3333333334, ans=0.125 2024-09-20 02:11:57,100 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=797855.3333333334, ans=0.125 2024-09-20 02:12:02,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.19 vs. limit=10.0 2024-09-20 02:12:07,725 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.12 vs. 
limit=10.0 2024-09-20 02:12:08,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=797902.0, ans=0.2 2024-09-20 02:12:11,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=797902.0, ans=0.125 2024-09-20 02:12:18,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=797948.6666666666, ans=0.125 2024-09-20 02:12:50,052 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.118e+02 2.456e+02 2.982e+02 3.624e+02 5.777e+02, threshold=5.963e+02, percent-clipped=1.0 2024-09-20 02:12:51,667 INFO [train.py:1198] (1/2) Epoch 45, batch 450, loss[loss=0.2104, simple_loss=0.2712, pruned_loss=0.0549, ctc_loss=0.1191, cr_loss=0.4016, over 34703.00 frames. ], tot_loss[loss=0.2006, simple_loss=0.2594, pruned_loss=0.05191, ctc_loss=0.1127, cr_loss=0.3853, over 6052533.79 frames. ], batch size: 97, lr: 2.65e-03, grad_scale: 32.0 2024-09-20 02:12:53,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=798042.0, ans=0.1 2024-09-20 02:12:59,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=798042.0, ans=0.125 2024-09-20 02:13:00,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=798042.0, ans=0.125 2024-09-20 02:13:05,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=798042.0, ans=0.0 2024-09-20 02:13:08,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=798088.6666666666, ans=0.125 2024-09-20 02:13:13,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=798088.6666666666, ans=0.0 2024-09-20 02:13:21,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=798088.6666666666, ans=0.2 2024-09-20 02:13:48,222 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=798182.0, ans=0.125 2024-09-20 02:14:14,619 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.079e-02 2024-09-20 02:14:15,785 INFO [train.py:1198] (1/2) Epoch 45, batch 500, loss[loss=0.2315, simple_loss=0.2838, pruned_loss=0.06649, ctc_loss=0.1391, cr_loss=0.4593, over 34443.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2587, pruned_loss=0.05177, ctc_loss=0.1124, cr_loss=0.3847, over 6218740.83 frames. 
], batch size: 110, lr: 2.65e-03, grad_scale: 32.0 2024-09-20 02:14:36,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=798322.0, ans=0.05 2024-09-20 02:15:38,427 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.031e+02 2.543e+02 2.924e+02 3.440e+02 7.016e+02, threshold=5.848e+02, percent-clipped=1.0 2024-09-20 02:15:38,900 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:15:38,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=798508.6666666666, ans=0.125 2024-09-20 02:15:40,105 INFO [train.py:1198] (1/2) Epoch 45, batch 550, loss[loss=0.2082, simple_loss=0.2693, pruned_loss=0.05408, ctc_loss=0.1175, cr_loss=0.3868, over 33736.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2584, pruned_loss=0.05173, ctc_loss=0.1122, cr_loss=0.3842, over 6326408.61 frames. ], batch size: 122, lr: 2.65e-03, grad_scale: 32.0 2024-09-20 02:15:45,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=798508.6666666666, ans=0.1 2024-09-20 02:15:47,750 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.17 vs. limit=22.5 2024-09-20 02:15:56,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=798555.3333333334, ans=0.125 2024-09-20 02:16:03,233 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=798555.3333333334, ans=0.2 2024-09-20 02:16:15,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.59 vs. limit=22.5 2024-09-20 02:17:02,166 INFO [train.py:1198] (1/2) Epoch 45, batch 600, loss[loss=0.2153, simple_loss=0.2764, pruned_loss=0.05663, ctc_loss=0.1226, cr_loss=0.4109, over 34207.00 frames. ], tot_loss[loss=0.2001, simple_loss=0.2588, pruned_loss=0.05176, ctc_loss=0.1124, cr_loss=0.3847, over 6429735.56 frames. ], batch size: 117, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:17:19,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.42 vs. limit=22.5 2024-09-20 02:17:41,923 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=798835.3333333334, ans=0.0 2024-09-20 02:17:46,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=798835.3333333334, ans=0.125 2024-09-20 02:17:47,431 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.14 vs. 
limit=15.0 2024-09-20 02:17:49,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=798835.3333333334, ans=0.2 2024-09-20 02:18:00,983 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=798882.0, ans=0.0 2024-09-20 02:18:02,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=798882.0, ans=0.125 2024-09-20 02:18:07,574 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=798928.6666666666, ans=0.0 2024-09-20 02:18:07,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=798928.6666666666, ans=0.0 2024-09-20 02:18:27,698 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.121e+02 2.544e+02 2.910e+02 3.313e+02 8.269e+02, threshold=5.819e+02, percent-clipped=3.0 2024-09-20 02:18:27,726 INFO [train.py:1198] (1/2) Epoch 45, batch 650, loss[loss=0.205, simple_loss=0.2668, pruned_loss=0.05199, ctc_loss=0.1166, cr_loss=0.3995, over 34519.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2582, pruned_loss=0.05141, ctc_loss=0.1118, cr_loss=0.3833, over 6521900.68 frames. ], batch size: 94, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:18:38,424 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=14.25 vs. limit=22.5 2024-09-20 02:18:44,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=799022.0, ans=0.125 2024-09-20 02:18:47,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=799022.0, ans=0.125 2024-09-20 02:18:59,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=799068.6666666666, ans=0.2 2024-09-20 02:19:04,190 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=799068.6666666666, ans=0.125 2024-09-20 02:19:19,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.79 vs. limit=15.0 2024-09-20 02:19:30,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=799115.3333333334, ans=0.125 2024-09-20 02:19:50,079 INFO [train.py:1198] (1/2) Epoch 45, batch 700, loss[loss=0.2012, simple_loss=0.256, pruned_loss=0.05431, ctc_loss=0.1119, cr_loss=0.3832, over 34598.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2589, pruned_loss=0.05157, ctc_loss=0.1121, cr_loss=0.3841, over 6578776.65 frames. 
], batch size: 89, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:20:03,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=799208.6666666666, ans=0.125 2024-09-20 02:20:16,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=799255.3333333334, ans=0.2 2024-09-20 02:20:20,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=799255.3333333334, ans=0.1 2024-09-20 02:20:21,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=799302.0, ans=0.125 2024-09-20 02:20:32,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.95 vs. limit=6.0 2024-09-20 02:20:49,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=799348.6666666666, ans=0.125 2024-09-20 02:20:53,063 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=799348.6666666666, ans=0.0 2024-09-20 02:20:54,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=799395.3333333334, ans=0.0 2024-09-20 02:20:58,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=799395.3333333334, ans=0.125 2024-09-20 02:21:01,827 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.64 vs. limit=15.0 2024-09-20 02:21:12,327 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.087e+02 2.569e+02 2.990e+02 3.641e+02 6.801e+02, threshold=5.980e+02, percent-clipped=4.0 2024-09-20 02:21:12,353 INFO [train.py:1198] (1/2) Epoch 45, batch 750, loss[loss=0.2172, simple_loss=0.2749, pruned_loss=0.05893, ctc_loss=0.1259, cr_loss=0.4141, over 34425.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2584, pruned_loss=0.05153, ctc_loss=0.1121, cr_loss=0.3838, over 6621400.76 frames. ], batch size: 95, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:21:43,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=799488.6666666666, ans=0.125 2024-09-20 02:21:47,045 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=799535.3333333334, ans=0.04949747468305833 2024-09-20 02:22:13,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=799582.0, ans=0.125 2024-09-20 02:22:16,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=799582.0, ans=0.125 2024-09-20 02:22:17,170 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=799582.0, ans=0.0 2024-09-20 02:22:37,892 INFO [train.py:1198] (1/2) Epoch 45, batch 800, loss[loss=0.1807, simple_loss=0.2385, pruned_loss=0.0449, ctc_loss=0.09728, cr_loss=0.3421, over 34440.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2585, pruned_loss=0.05154, ctc_loss=0.112, cr_loss=0.3835, over 6658011.44 frames. 
], batch size: 85, lr: 2.65e-03, grad_scale: 32.0 2024-09-20 02:22:50,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2024-09-20 02:23:14,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=799768.6666666666, ans=0.0 2024-09-20 02:23:21,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=799768.6666666666, ans=0.125 2024-09-20 02:23:42,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=799862.0, ans=0.2 2024-09-20 02:24:00,180 INFO [train.py:1198] (1/2) Epoch 45, batch 850, loss[loss=0.2031, simple_loss=0.2693, pruned_loss=0.04986, ctc_loss=0.1118, cr_loss=0.3724, over 34384.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2586, pruned_loss=0.05159, ctc_loss=0.112, cr_loss=0.3836, over 6690572.74 frames. ], batch size: 103, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:24:01,735 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.163e+02 2.533e+02 2.962e+02 3.510e+02 1.214e+03, threshold=5.925e+02, percent-clipped=2.0 2024-09-20 02:24:34,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=800002.0, ans=0.2 2024-09-20 02:24:43,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=800002.0, ans=0.125 2024-09-20 02:25:08,081 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=800095.3333333334, ans=0.1 2024-09-20 02:25:22,687 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=800142.0, ans=0.1 2024-09-20 02:25:23,930 INFO [train.py:1198] (1/2) Epoch 45, batch 900, loss[loss=0.1805, simple_loss=0.2418, pruned_loss=0.04275, ctc_loss=0.09642, cr_loss=0.3596, over 34465.00 frames. ], tot_loss[loss=0.2003, simple_loss=0.2591, pruned_loss=0.05182, ctc_loss=0.1125, cr_loss=0.3846, over 6695990.56 frames. ], batch size: 85, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:25:31,372 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.02 vs. limit=22.5 2024-09-20 02:25:40,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=800188.6666666666, ans=0.025 2024-09-20 02:25:56,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.10 vs. 
limit=15.0 2024-09-20 02:26:04,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=800235.3333333334, ans=6.0 2024-09-20 02:26:17,358 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:26:27,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=800282.0, ans=0.125 2024-09-20 02:26:28,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=800282.0, ans=0.125 2024-09-20 02:26:38,416 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=800328.6666666666, ans=0.05 2024-09-20 02:26:47,946 INFO [train.py:1198] (1/2) Epoch 45, batch 950, loss[loss=0.1932, simple_loss=0.2482, pruned_loss=0.05031, ctc_loss=0.1089, cr_loss=0.3949, over 34672.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.259, pruned_loss=0.05175, ctc_loss=0.1123, cr_loss=0.3843, over 6696863.04 frames. ], batch size: 87, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:26:49,583 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.168e+02 2.621e+02 2.962e+02 3.692e+02 5.265e+02, threshold=5.925e+02, percent-clipped=0.0 2024-09-20 02:26:58,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=800375.3333333334, ans=0.5 2024-09-20 02:27:31,680 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.04 vs. limit=15.0 2024-09-20 02:27:43,281 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=22.5 2024-09-20 02:27:45,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=800515.3333333334, ans=0.0 2024-09-20 02:27:49,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=800515.3333333334, ans=0.125 2024-09-20 02:28:00,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=800562.0, ans=0.025 2024-09-20 02:28:09,730 INFO [train.py:1198] (1/2) Epoch 45, batch 1000, loss[loss=0.1936, simple_loss=0.2505, pruned_loss=0.05002, ctc_loss=0.1084, cr_loss=0.3746, over 34501.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2596, pruned_loss=0.05214, ctc_loss=0.113, cr_loss=0.3858, over 6691886.94 frames. ], batch size: 90, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:28:31,679 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2024-09-20 02:29:00,347 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.92 vs. 
limit=15.0 2024-09-20 02:29:10,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=800748.6666666666, ans=0.1 2024-09-20 02:29:15,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=800795.3333333334, ans=0.125 2024-09-20 02:29:22,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=800795.3333333334, ans=0.07 2024-09-20 02:29:36,001 INFO [train.py:1198] (1/2) Epoch 45, batch 1050, loss[loss=0.213, simple_loss=0.2793, pruned_loss=0.05409, ctc_loss=0.1141, cr_loss=0.3907, over 34581.00 frames. ], tot_loss[loss=0.2006, simple_loss=0.2591, pruned_loss=0.05203, ctc_loss=0.1128, cr_loss=0.3853, over 6701708.19 frames. ], batch size: 99, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:29:37,574 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.120e+02 2.565e+02 2.903e+02 3.310e+02 5.014e+02, threshold=5.807e+02, percent-clipped=0.0 2024-09-20 02:29:44,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=800842.0, ans=0.1 2024-09-20 02:29:58,452 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.03 vs. limit=15.0 2024-09-20 02:30:29,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=800982.0, ans=0.2 2024-09-20 02:30:33,243 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2024-09-20 02:30:36,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.02 vs. limit=15.0 2024-09-20 02:30:45,747 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=801028.6666666666, ans=0.0 2024-09-20 02:30:52,919 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.51 vs. limit=15.0 2024-09-20 02:30:53,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=801028.6666666666, ans=0.0 2024-09-20 02:30:54,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=801028.6666666666, ans=0.125 2024-09-20 02:30:58,431 INFO [train.py:1198] (1/2) Epoch 45, batch 1100, loss[loss=0.2048, simple_loss=0.2618, pruned_loss=0.05413, ctc_loss=0.1172, cr_loss=0.403, over 34349.00 frames. ], tot_loss[loss=0.2003, simple_loss=0.2588, pruned_loss=0.05195, ctc_loss=0.1126, cr_loss=0.3852, over 6716122.28 frames. 
], batch size: 91, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:31:30,166 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=801168.6666666666, ans=0.125 2024-09-20 02:31:36,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=801168.6666666666, ans=0.125 2024-09-20 02:31:46,772 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:31:56,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=801215.3333333334, ans=0.0 2024-09-20 02:32:03,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=801262.0, ans=0.2 2024-09-20 02:32:14,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801262.0, ans=0.1 2024-09-20 02:32:16,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=801262.0, ans=0.125 2024-09-20 02:32:16,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=801262.0, ans=0.125 2024-09-20 02:32:22,737 INFO [train.py:1198] (1/2) Epoch 45, batch 1150, loss[loss=0.1992, simple_loss=0.2549, pruned_loss=0.05249, ctc_loss=0.1146, cr_loss=0.3878, over 34349.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2588, pruned_loss=0.05204, ctc_loss=0.1128, cr_loss=0.3853, over 6715094.12 frames. ], batch size: 91, lr: 2.65e-03, grad_scale: 8.0 2024-09-20 02:32:25,993 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.125e+02 2.562e+02 2.892e+02 3.403e+02 5.473e+02, threshold=5.784e+02, percent-clipped=0.0 2024-09-20 02:32:31,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801308.6666666666, ans=0.1 2024-09-20 02:32:33,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=801308.6666666666, ans=0.125 2024-09-20 02:32:37,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=801355.3333333334, ans=0.125 2024-09-20 02:33:11,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=801448.6666666666, ans=0.1 2024-09-20 02:33:12,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=801448.6666666666, ans=0.0 2024-09-20 02:33:16,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=801448.6666666666, ans=0.0 2024-09-20 02:33:46,924 INFO [train.py:1198] (1/2) Epoch 45, batch 1200, loss[loss=0.2092, simple_loss=0.2671, pruned_loss=0.05547, ctc_loss=0.1186, cr_loss=0.4137, over 34557.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2596, pruned_loss=0.05223, ctc_loss=0.1133, cr_loss=0.3867, over 6707918.50 frames. 
], batch size: 99, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:33:55,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=801542.0, ans=0.1 2024-09-20 02:34:15,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=801588.6666666666, ans=0.125 2024-09-20 02:34:33,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=801635.3333333334, ans=0.0 2024-09-20 02:35:03,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=801728.6666666666, ans=0.1 2024-09-20 02:35:06,894 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:35:09,589 INFO [train.py:1198] (1/2) Epoch 45, batch 1250, loss[loss=0.2241, simple_loss=0.282, pruned_loss=0.06142, ctc_loss=0.1293, cr_loss=0.4363, over 34339.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2602, pruned_loss=0.05232, ctc_loss=0.1134, cr_loss=0.3877, over 6741040.16 frames. ], batch size: 107, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:35:12,849 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.127e+02 2.510e+02 2.800e+02 3.466e+02 6.035e+02, threshold=5.600e+02, percent-clipped=1.0 2024-09-20 02:35:36,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=801822.0, ans=0.125 2024-09-20 02:35:43,029 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:35:46,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=801868.6666666666, ans=0.0 2024-09-20 02:35:52,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=801868.6666666666, ans=0.125 2024-09-20 02:36:14,452 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=801915.3333333334, ans=0.1 2024-09-20 02:36:33,718 INFO [train.py:1198] (1/2) Epoch 45, batch 1300, loss[loss=0.2096, simple_loss=0.2712, pruned_loss=0.05387, ctc_loss=0.1203, cr_loss=0.4051, over 32945.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2598, pruned_loss=0.05216, ctc_loss=0.113, cr_loss=0.3866, over 6745517.32 frames. 
], batch size: 130, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:36:53,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=802055.3333333334, ans=0.125 2024-09-20 02:36:58,731 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=802055.3333333334, ans=0.5 2024-09-20 02:37:26,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=802148.6666666666, ans=0.0 2024-09-20 02:37:40,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=802195.3333333334, ans=0.07 2024-09-20 02:37:41,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=802195.3333333334, ans=0.1 2024-09-20 02:37:48,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=22.5 2024-09-20 02:37:57,796 INFO [train.py:1198] (1/2) Epoch 45, batch 1350, loss[loss=0.2017, simple_loss=0.2616, pruned_loss=0.05135, ctc_loss=0.1142, cr_loss=0.4082, over 34537.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2596, pruned_loss=0.05213, ctc_loss=0.1129, cr_loss=0.3861, over 6764411.32 frames. ], batch size: 94, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:38:00,977 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.181e+02 2.684e+02 3.301e+02 4.481e+02 7.367e+02, threshold=6.601e+02, percent-clipped=8.0 2024-09-20 02:38:21,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.62 vs. limit=15.0 2024-09-20 02:38:25,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=802288.6666666666, ans=0.0 2024-09-20 02:38:27,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=802288.6666666666, ans=0.025 2024-09-20 02:38:38,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=802335.3333333334, ans=0.125 2024-09-20 02:38:40,060 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=802335.3333333334, ans=0.025 2024-09-20 02:38:48,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.55 vs. limit=15.0 2024-09-20 02:38:56,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=802382.0, ans=0.2 2024-09-20 02:39:17,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.55 vs. limit=15.0 2024-09-20 02:39:19,493 INFO [train.py:1198] (1/2) Epoch 45, batch 1400, loss[loss=0.1593, simple_loss=0.2171, pruned_loss=0.03641, ctc_loss=0.08261, cr_loss=0.3032, over 34265.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2592, pruned_loss=0.0519, ctc_loss=0.1124, cr_loss=0.3849, over 6776741.30 frames. 
], batch size: 80, lr: 2.65e-03, grad_scale: 16.0 2024-09-20 02:39:23,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=802475.3333333334, ans=0.1 2024-09-20 02:39:25,272 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.71 vs. limit=10.0 2024-09-20 02:39:26,505 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=802475.3333333334, ans=0.125 2024-09-20 02:39:28,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=802475.3333333334, ans=0.0 2024-09-20 02:39:28,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=802475.3333333334, ans=0.0 2024-09-20 02:39:42,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=802522.0, ans=0.125 2024-09-20 02:39:42,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=802522.0, ans=0.125 2024-09-20 02:39:47,620 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=802522.0, ans=0.0 2024-09-20 02:39:47,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=802522.0, ans=0.2 2024-09-20 02:40:03,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2024-09-20 02:40:10,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=802615.3333333334, ans=10.0 2024-09-20 02:40:37,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=802662.0, ans=0.0 2024-09-20 02:40:43,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=802662.0, ans=0.0 2024-09-20 02:40:47,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=802662.0, ans=0.0 2024-09-20 02:40:50,277 INFO [train.py:1198] (1/2) Epoch 45, batch 1450, loss[loss=0.2204, simple_loss=0.2758, pruned_loss=0.06098, ctc_loss=0.1293, cr_loss=0.4325, over 34476.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.2597, pruned_loss=0.05198, ctc_loss=0.1126, cr_loss=0.386, over 6773371.76 frames. 
], batch size: 110, lr: 2.64e-03, grad_scale: 16.0 2024-09-20 02:40:50,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=802708.6666666666, ans=0.95 2024-09-20 02:40:52,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=802708.6666666666, ans=0.125 2024-09-20 02:40:53,504 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.119e+02 2.591e+02 2.932e+02 3.693e+02 6.496e+02, threshold=5.864e+02, percent-clipped=0.0 2024-09-20 02:40:55,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=802708.6666666666, ans=0.125 2024-09-20 02:40:57,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=802708.6666666666, ans=0.125 2024-09-20 02:42:12,481 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=802942.0, ans=0.09899494936611666 2024-09-20 02:42:13,730 INFO [train.py:1198] (1/2) Epoch 45, batch 1500, loss[loss=0.2046, simple_loss=0.2652, pruned_loss=0.05254, ctc_loss=0.1139, cr_loss=0.4024, over 34448.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2598, pruned_loss=0.05198, ctc_loss=0.1126, cr_loss=0.386, over 6774189.54 frames. ], batch size: 100, lr: 2.64e-03, grad_scale: 16.0 2024-09-20 02:42:17,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=802942.0, ans=0.0 2024-09-20 02:42:19,013 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 02:42:19,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=802942.0, ans=0.0 2024-09-20 02:42:27,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=802942.0, ans=0.1 2024-09-20 02:42:27,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=802942.0, ans=0.0 2024-09-20 02:42:32,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=802988.6666666666, ans=0.125 2024-09-20 02:43:04,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=803082.0, ans=0.0 2024-09-20 02:43:10,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=803082.0, ans=0.125 2024-09-20 02:43:28,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=803128.6666666666, ans=0.0 2024-09-20 02:43:37,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=803175.3333333334, ans=0.125 2024-09-20 02:43:38,217 INFO [train.py:1198] (1/2) Epoch 45, batch 1550, loss[loss=0.2261, simple_loss=0.2835, pruned_loss=0.062, ctc_loss=0.1349, cr_loss=0.4408, over 34446.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.26, pruned_loss=0.05224, ctc_loss=0.1131, cr_loss=0.3868, over 6745576.60 frames. 
], batch size: 105, lr: 2.64e-03, grad_scale: 16.0 2024-09-20 02:43:40,212 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=803175.3333333334, ans=0.1 2024-09-20 02:43:41,520 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.586e+02 2.941e+02 3.413e+02 6.440e+02, threshold=5.883e+02, percent-clipped=2.0 2024-09-20 02:43:46,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=803175.3333333334, ans=0.2 2024-09-20 02:44:16,648 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.31 vs. limit=6.0 2024-09-20 02:44:26,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=9.40 vs. limit=12.0 2024-09-20 02:44:32,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=803315.3333333334, ans=0.2 2024-09-20 02:44:33,518 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.77 vs. limit=5.0 2024-09-20 02:44:47,773 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803362.0, ans=0.1 2024-09-20 02:44:58,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2024-09-20 02:44:59,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=803362.0, ans=0.125 2024-09-20 02:45:02,029 INFO [train.py:1198] (1/2) Epoch 45, batch 1600, loss[loss=0.2033, simple_loss=0.2653, pruned_loss=0.05185, ctc_loss=0.1108, cr_loss=0.384, over 34555.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2595, pruned_loss=0.05209, ctc_loss=0.1129, cr_loss=0.3859, over 6726200.49 frames. ], batch size: 99, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:45:29,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=803455.3333333334, ans=15.0 2024-09-20 02:45:38,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=803502.0, ans=0.0 2024-09-20 02:45:59,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=803548.6666666666, ans=0.125 2024-09-20 02:46:04,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=803548.6666666666, ans=0.125 2024-09-20 02:46:13,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=803595.3333333334, ans=0.125 2024-09-20 02:46:14,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=803595.3333333334, ans=0.125 2024-09-20 02:46:18,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=803595.3333333334, ans=0.125 2024-09-20 02:46:24,212 INFO [train.py:1198] (1/2) Epoch 45, batch 1650, loss[loss=0.219, simple_loss=0.2781, pruned_loss=0.05903, ctc_loss=0.1252, cr_loss=0.4206, over 34418.00 frames. 
], tot_loss[loss=0.2009, simple_loss=0.2596, pruned_loss=0.05212, ctc_loss=0.113, cr_loss=0.3852, over 6720573.73 frames. ], batch size: 103, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:46:26,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=803642.0, ans=0.125 2024-09-20 02:46:27,533 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.541e+02 2.989e+02 3.584e+02 6.955e+02, threshold=5.979e+02, percent-clipped=1.0 2024-09-20 02:46:28,018 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=803642.0, ans=0.125 2024-09-20 02:46:31,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=803642.0, ans=0.125 2024-09-20 02:46:41,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=803688.6666666666, ans=0.125 2024-09-20 02:46:44,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=803688.6666666666, ans=0.125 2024-09-20 02:47:14,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=803782.0, ans=0.0 2024-09-20 02:47:21,666 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.86 vs. limit=22.5 2024-09-20 02:47:38,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=803828.6666666666, ans=0.025 2024-09-20 02:47:42,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.26 vs. limit=15.0 2024-09-20 02:47:48,105 INFO [train.py:1198] (1/2) Epoch 45, batch 1700, loss[loss=0.1722, simple_loss=0.2303, pruned_loss=0.04087, ctc_loss=0.09358, cr_loss=0.3414, over 34272.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2592, pruned_loss=0.05184, ctc_loss=0.1124, cr_loss=0.3845, over 6745407.01 frames. ], batch size: 80, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:47:57,126 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.11 vs. limit=10.0 2024-09-20 02:48:01,642 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=803875.3333333334, ans=0.125 2024-09-20 02:48:01,677 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=803875.3333333334, ans=0.1 2024-09-20 02:48:19,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=803968.6666666666, ans=0.0 2024-09-20 02:49:12,309 INFO [train.py:1198] (1/2) Epoch 45, batch 1750, loss[loss=0.1696, simple_loss=0.226, pruned_loss=0.04094, ctc_loss=0.09014, cr_loss=0.3299, over 34114.00 frames. ], tot_loss[loss=0.2001, simple_loss=0.2589, pruned_loss=0.05174, ctc_loss=0.1122, cr_loss=0.3843, over 6754708.80 frames. 
], batch size: 78, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:49:15,544 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.179e+02 2.462e+02 2.790e+02 3.308e+02 6.963e+02, threshold=5.579e+02, percent-clipped=1.0 2024-09-20 02:49:29,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=804155.3333333334, ans=0.0 2024-09-20 02:49:31,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=804155.3333333334, ans=10.0 2024-09-20 02:49:37,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=804155.3333333334, ans=0.05 2024-09-20 02:49:45,715 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=804202.0, ans=0.95 2024-09-20 02:50:16,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=804295.3333333334, ans=0.0 2024-09-20 02:50:18,568 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2024-09-20 02:50:27,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=804295.3333333334, ans=0.035 2024-09-20 02:50:34,353 INFO [train.py:1198] (1/2) Epoch 45, batch 1800, loss[loss=0.215, simple_loss=0.2721, pruned_loss=0.05856, ctc_loss=0.122, cr_loss=0.4082, over 34695.00 frames. ], tot_loss[loss=0.2006, simple_loss=0.2593, pruned_loss=0.05195, ctc_loss=0.1126, cr_loss=0.385, over 6757517.25 frames. ], batch size: 97, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:51:24,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=804482.0, ans=0.2 2024-09-20 02:51:41,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=804528.6666666666, ans=0.0 2024-09-20 02:51:44,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=804528.6666666666, ans=0.0 2024-09-20 02:51:50,157 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=15.0 2024-09-20 02:51:59,097 INFO [train.py:1198] (1/2) Epoch 45, batch 1850, loss[loss=0.2091, simple_loss=0.2697, pruned_loss=0.05425, ctc_loss=0.1189, cr_loss=0.4047, over 34440.00 frames. ], tot_loss[loss=0.2006, simple_loss=0.2592, pruned_loss=0.05204, ctc_loss=0.1127, cr_loss=0.3851, over 6764888.36 frames. ], batch size: 100, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:52:02,380 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.180e+02 2.670e+02 3.194e+02 4.379e+02 7.103e+02, threshold=6.388e+02, percent-clipped=8.0 2024-09-20 02:52:06,377 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.93 vs. 
limit=6.0 2024-09-20 02:52:26,244 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer_na.min_abs, batch_count=804622.0, ans=0.02 2024-09-20 02:52:57,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=804715.3333333334, ans=0.125 2024-09-20 02:53:04,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=804715.3333333334, ans=0.0 2024-09-20 02:53:08,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=804762.0, ans=0.1 2024-09-20 02:53:23,243 INFO [train.py:1198] (1/2) Epoch 45, batch 1900, loss[loss=0.205, simple_loss=0.2654, pruned_loss=0.05306, ctc_loss=0.1154, cr_loss=0.3839, over 34380.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.26, pruned_loss=0.05232, ctc_loss=0.1132, cr_loss=0.3863, over 6774239.38 frames. ], batch size: 103, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:53:25,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=804808.6666666666, ans=0.1 2024-09-20 02:53:40,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=804855.3333333334, ans=0.0 2024-09-20 02:53:45,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=804855.3333333334, ans=0.025 2024-09-20 02:54:24,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=804948.6666666666, ans=0.0 2024-09-20 02:54:47,497 INFO [train.py:1198] (1/2) Epoch 45, batch 1950, loss[loss=0.2074, simple_loss=0.2633, pruned_loss=0.05578, ctc_loss=0.121, cr_loss=0.3936, over 34374.00 frames. ], tot_loss[loss=0.2024, simple_loss=0.2612, pruned_loss=0.05266, ctc_loss=0.1139, cr_loss=0.3884, over 6790854.03 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:54:48,309 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.79 vs. limit=15.0 2024-09-20 02:54:50,877 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.547e+02 2.872e+02 3.891e+02 6.418e+02, threshold=5.744e+02, percent-clipped=1.0 2024-09-20 02:55:33,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=805135.3333333334, ans=0.0 2024-09-20 02:55:33,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=805135.3333333334, ans=0.125 2024-09-20 02:55:48,833 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2024-09-20 02:56:05,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=805228.6666666666, ans=0.0 2024-09-20 02:56:11,606 INFO [train.py:1198] (1/2) Epoch 45, batch 2000, loss[loss=0.1838, simple_loss=0.2378, pruned_loss=0.04731, ctc_loss=0.1036, cr_loss=0.3625, over 34188.00 frames. ], tot_loss[loss=0.2025, simple_loss=0.2613, pruned_loss=0.05265, ctc_loss=0.114, cr_loss=0.3884, over 6765196.25 frames. 
], batch size: 78, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:56:27,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=805322.0, ans=0.125 2024-09-20 02:56:33,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=805322.0, ans=0.0 2024-09-20 02:56:51,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=805368.6666666666, ans=0.09899494936611666 2024-09-20 02:57:01,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=805415.3333333334, ans=0.0 2024-09-20 02:57:17,933 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=805462.0, ans=0.0 2024-09-20 02:57:24,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=805462.0, ans=0.0 2024-09-20 02:57:34,297 INFO [train.py:1198] (1/2) Epoch 45, batch 2050, loss[loss=0.1705, simple_loss=0.2285, pruned_loss=0.04037, ctc_loss=0.09151, cr_loss=0.3355, over 34481.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2603, pruned_loss=0.05234, ctc_loss=0.1134, cr_loss=0.3874, over 6756605.78 frames. ], batch size: 82, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:57:37,576 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.249e+02 2.668e+02 2.974e+02 3.732e+02 7.871e+02, threshold=5.948e+02, percent-clipped=5.0 2024-09-20 02:58:02,928 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0 2024-09-20 02:58:53,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=805695.3333333334, ans=0.0 2024-09-20 02:58:58,268 INFO [train.py:1198] (1/2) Epoch 45, batch 2100, loss[loss=0.1977, simple_loss=0.2555, pruned_loss=0.05122, ctc_loss=0.11, cr_loss=0.3843, over 34533.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2595, pruned_loss=0.05192, ctc_loss=0.1127, cr_loss=0.3856, over 6770499.78 frames. ], batch size: 94, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 02:59:01,945 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=805742.0, ans=0.0 2024-09-20 02:59:03,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=805742.0, ans=0.07 2024-09-20 02:59:06,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.82 vs. 
limit=15.0 2024-09-20 02:59:10,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=805742.0, ans=0.1 2024-09-20 02:59:18,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=805788.6666666666, ans=0.025 2024-09-20 02:59:23,137 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=805788.6666666666, ans=0.125 2024-09-20 02:59:23,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=805788.6666666666, ans=0.0 2024-09-20 02:59:36,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=805835.3333333334, ans=0.125 2024-09-20 03:00:05,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=805928.6666666666, ans=0.07 2024-09-20 03:00:13,061 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.90 vs. limit=6.0 2024-09-20 03:00:21,610 INFO [train.py:1198] (1/2) Epoch 45, batch 2150, loss[loss=0.2029, simple_loss=0.2609, pruned_loss=0.05342, ctc_loss=0.1148, cr_loss=0.378, over 34352.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2588, pruned_loss=0.05157, ctc_loss=0.112, cr_loss=0.3844, over 6789942.69 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 03:00:24,822 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.514e+02 2.979e+02 3.915e+02 7.087e+02, threshold=5.957e+02, percent-clipped=5.0 2024-09-20 03:00:25,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=805975.3333333334, ans=0.2 2024-09-20 03:00:48,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=806022.0, ans=0.125 2024-09-20 03:01:09,850 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:01:29,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=806162.0, ans=0.1 2024-09-20 03:01:43,929 INFO [train.py:1198] (1/2) Epoch 45, batch 2200, loss[loss=0.2051, simple_loss=0.2697, pruned_loss=0.05131, ctc_loss=0.1135, cr_loss=0.3804, over 34452.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2592, pruned_loss=0.05184, ctc_loss=0.1124, cr_loss=0.3848, over 6783569.42 frames. 
], batch size: 100, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 03:02:05,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=806255.3333333334, ans=0.05 2024-09-20 03:02:09,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=806255.3333333334, ans=0.125 2024-09-20 03:02:12,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=806255.3333333334, ans=0.0 2024-09-20 03:02:20,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=806302.0, ans=0.09899494936611666 2024-09-20 03:02:31,057 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=15.0 2024-09-20 03:02:36,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=806348.6666666666, ans=0.0 2024-09-20 03:03:08,131 INFO [train.py:1198] (1/2) Epoch 45, batch 2250, loss[loss=0.2042, simple_loss=0.2621, pruned_loss=0.05362, ctc_loss=0.115, cr_loss=0.402, over 34420.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2587, pruned_loss=0.05169, ctc_loss=0.1122, cr_loss=0.384, over 6780482.85 frames. ], batch size: 95, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 03:03:11,233 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.573e+02 3.008e+02 3.546e+02 5.479e+02, threshold=6.015e+02, percent-clipped=0.0 2024-09-20 03:03:16,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=806442.0, ans=0.125 2024-09-20 03:03:29,508 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=806488.6666666666, ans=0.2 2024-09-20 03:03:29,670 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:03:41,835 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2024-09-20 03:03:44,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=806535.3333333334, ans=0.125 2024-09-20 03:04:00,870 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:04:15,439 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=806628.6666666666, ans=0.2 2024-09-20 03:04:31,703 INFO [train.py:1198] (1/2) Epoch 45, batch 2300, loss[loss=0.1879, simple_loss=0.2432, pruned_loss=0.04867, ctc_loss=0.1028, cr_loss=0.3662, over 34301.00 frames. ], tot_loss[loss=0.1989, simple_loss=0.2576, pruned_loss=0.05129, ctc_loss=0.1114, cr_loss=0.3818, over 6766141.58 frames. ], batch size: 83, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 03:04:37,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.19 vs. 
limit=15.0 2024-09-20 03:04:56,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=806722.0, ans=0.0 2024-09-20 03:05:04,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=806768.6666666666, ans=0.2 2024-09-20 03:05:08,893 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=17.62 vs. limit=22.5 2024-09-20 03:05:11,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=806768.6666666666, ans=0.2 2024-09-20 03:05:36,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=806862.0, ans=0.05 2024-09-20 03:05:37,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=806862.0, ans=0.025 2024-09-20 03:05:45,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.56 vs. limit=12.0 2024-09-20 03:05:55,605 INFO [train.py:1198] (1/2) Epoch 45, batch 2350, loss[loss=0.2039, simple_loss=0.2675, pruned_loss=0.05121, ctc_loss=0.1139, cr_loss=0.3747, over 34718.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2581, pruned_loss=0.05149, ctc_loss=0.1117, cr_loss=0.3828, over 6772454.03 frames. ], batch size: 97, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 03:05:57,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=806908.6666666666, ans=0.2 2024-09-20 03:05:59,001 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.131e+02 2.590e+02 2.764e+02 3.384e+02 6.094e+02, threshold=5.527e+02, percent-clipped=1.0 2024-09-20 03:06:04,113 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=806908.6666666666, ans=0.1 2024-09-20 03:06:07,937 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.69 vs. limit=15.0 2024-09-20 03:06:20,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=806955.3333333334, ans=0.0 2024-09-20 03:07:10,309 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=807095.3333333334, ans=0.0 2024-09-20 03:07:13,451 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=807095.3333333334, ans=0.0 2024-09-20 03:07:18,080 INFO [train.py:1198] (1/2) Epoch 45, batch 2400, loss[loss=0.2057, simple_loss=0.2605, pruned_loss=0.05542, ctc_loss=0.1186, cr_loss=0.4107, over 34600.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2588, pruned_loss=0.05189, ctc_loss=0.1125, cr_loss=0.3848, over 6776335.64 frames. ], batch size: 89, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 03:07:24,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=807142.0, ans=0.125 2024-09-20 03:07:57,265 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=11.37 vs. 
limit=15.0 2024-09-20 03:08:11,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=22.5 2024-09-20 03:08:14,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=807282.0, ans=0.07 2024-09-20 03:08:20,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.85 vs. limit=10.0 2024-09-20 03:08:34,968 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=807328.6666666666, ans=0.125 2024-09-20 03:08:41,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=807375.3333333334, ans=0.0 2024-09-20 03:08:42,825 INFO [train.py:1198] (1/2) Epoch 45, batch 2450, loss[loss=0.2037, simple_loss=0.2618, pruned_loss=0.05331, ctc_loss=0.1162, cr_loss=0.3935, over 34423.00 frames. ], tot_loss[loss=0.2012, simple_loss=0.2598, pruned_loss=0.05227, ctc_loss=0.1133, cr_loss=0.3867, over 6751154.40 frames. ], batch size: 95, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 03:08:47,540 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.195e+02 2.633e+02 3.065e+02 4.031e+02 8.627e+02, threshold=6.130e+02, percent-clipped=3.0 2024-09-20 03:08:56,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=807375.3333333334, ans=0.125 2024-09-20 03:09:20,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=807468.6666666666, ans=0.035 2024-09-20 03:10:06,483 INFO [train.py:1198] (1/2) Epoch 45, batch 2500, loss[loss=0.2055, simple_loss=0.2709, pruned_loss=0.05102, ctc_loss=0.1134, cr_loss=0.3814, over 34477.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2596, pruned_loss=0.05219, ctc_loss=0.1131, cr_loss=0.3864, over 6762945.87 frames. ], batch size: 100, lr: 2.64e-03, grad_scale: 32.0 2024-09-20 03:10:11,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=807608.6666666666, ans=0.125 2024-09-20 03:10:44,746 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=807702.0, ans=0.125 2024-09-20 03:10:51,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=807702.0, ans=0.1 2024-09-20 03:11:01,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=807748.6666666666, ans=0.0 2024-09-20 03:11:08,526 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.27 vs. limit=22.5 2024-09-20 03:11:08,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.96 vs. limit=15.0 2024-09-20 03:11:19,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=807795.3333333334, ans=0.2 2024-09-20 03:11:30,962 INFO [train.py:1198] (1/2) Epoch 45, batch 2550, loss[loss=0.1673, simple_loss=0.2218, pruned_loss=0.04067, ctc_loss=0.09188, cr_loss=0.3239, over 34165.00 frames. 
], tot_loss[loss=0.2009, simple_loss=0.2595, pruned_loss=0.05216, ctc_loss=0.113, cr_loss=0.3861, over 6766356.54 frames. ], batch size: 78, lr: 2.64e-03, grad_scale: 16.0 2024-09-20 03:11:31,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=807842.0, ans=0.1 2024-09-20 03:11:37,262 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.033e+02 2.508e+02 2.824e+02 3.513e+02 7.319e+02, threshold=5.648e+02, percent-clipped=3.0 2024-09-20 03:11:41,586 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.56 vs. limit=22.5 2024-09-20 03:12:03,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=807935.3333333334, ans=0.0 2024-09-20 03:12:18,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=807982.0, ans=0.0 2024-09-20 03:12:25,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=807982.0, ans=0.125 2024-09-20 03:12:41,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=808028.6666666666, ans=0.0 2024-09-20 03:12:53,088 INFO [train.py:1198] (1/2) Epoch 45, batch 2600, loss[loss=0.2012, simple_loss=0.262, pruned_loss=0.05139, ctc_loss=0.1115, cr_loss=0.3841, over 34370.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.26, pruned_loss=0.05239, ctc_loss=0.1134, cr_loss=0.387, over 6762002.06 frames. ], batch size: 91, lr: 2.64e-03, grad_scale: 16.0 2024-09-20 03:13:04,046 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.57 vs. limit=12.0 2024-09-20 03:13:18,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=808122.0, ans=0.2 2024-09-20 03:14:16,661 INFO [train.py:1198] (1/2) Epoch 45, batch 2650, loss[loss=0.2196, simple_loss=0.2779, pruned_loss=0.05976, ctc_loss=0.1257, cr_loss=0.4163, over 34202.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2603, pruned_loss=0.05233, ctc_loss=0.1133, cr_loss=0.3872, over 6769244.96 frames. 
], batch size: 117, lr: 2.64e-03, grad_scale: 16.0 2024-09-20 03:14:20,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=808308.6666666666, ans=0.125 2024-09-20 03:14:21,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=808308.6666666666, ans=0.0 2024-09-20 03:14:23,063 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.197e+02 2.567e+02 2.785e+02 3.510e+02 5.177e+02, threshold=5.570e+02, percent-clipped=0.0 2024-09-20 03:14:41,871 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=808355.3333333334, ans=6.0 2024-09-20 03:14:44,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=808355.3333333334, ans=0.125 2024-09-20 03:14:47,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=808402.0, ans=0.125 2024-09-20 03:14:59,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=808402.0, ans=0.05 2024-09-20 03:14:59,200 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=808402.0, ans=0.1 2024-09-20 03:15:28,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=808495.3333333334, ans=0.2 2024-09-20 03:15:37,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=808495.3333333334, ans=0.025 2024-09-20 03:15:40,027 INFO [train.py:1198] (1/2) Epoch 45, batch 2700, loss[loss=0.209, simple_loss=0.2727, pruned_loss=0.05345, ctc_loss=0.1146, cr_loss=0.3854, over 34616.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2608, pruned_loss=0.05247, ctc_loss=0.1136, cr_loss=0.3877, over 6764815.60 frames. ], batch size: 102, lr: 2.64e-03, grad_scale: 16.0 2024-09-20 03:15:45,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=808542.0, ans=0.125 2024-09-20 03:15:45,842 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=808542.0, ans=15.0 2024-09-20 03:15:53,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=808542.0, ans=0.125 2024-09-20 03:16:06,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=808588.6666666666, ans=0.125 2024-09-20 03:16:13,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=808635.3333333334, ans=0.125 2024-09-20 03:16:41,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808682.0, ans=0.1 2024-09-20 03:16:44,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=808728.6666666666, ans=0.0 2024-09-20 03:16:53,367 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.75 vs. 
limit=15.0 2024-09-20 03:17:02,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=808775.3333333334, ans=0.09899494936611666 2024-09-20 03:17:04,184 INFO [train.py:1198] (1/2) Epoch 45, batch 2750, loss[loss=0.2057, simple_loss=0.2561, pruned_loss=0.05793, ctc_loss=0.1175, cr_loss=0.3983, over 34628.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2598, pruned_loss=0.05219, ctc_loss=0.113, cr_loss=0.386, over 6762117.90 frames. ], batch size: 88, lr: 2.63e-03, grad_scale: 16.0 2024-09-20 03:17:08,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=808775.3333333334, ans=0.1 2024-09-20 03:17:09,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=808775.3333333334, ans=0.125 2024-09-20 03:17:10,739 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.136e+02 2.527e+02 2.895e+02 3.589e+02 5.153e+02, threshold=5.791e+02, percent-clipped=0.0 2024-09-20 03:17:25,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=808822.0, ans=0.0 2024-09-20 03:18:26,485 INFO [train.py:1198] (1/2) Epoch 45, batch 2800, loss[loss=0.2275, simple_loss=0.2787, pruned_loss=0.06556, ctc_loss=0.1415, cr_loss=0.4209, over 23721.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2601, pruned_loss=0.05243, ctc_loss=0.1134, cr_loss=0.3869, over 6740408.53 frames. ], batch size: 244, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:19:05,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=809102.0, ans=0.125 2024-09-20 03:19:06,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=809102.0, ans=0.2 2024-09-20 03:19:49,201 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.50 vs. limit=15.0 2024-09-20 03:19:51,242 INFO [train.py:1198] (1/2) Epoch 45, batch 2850, loss[loss=0.1973, simple_loss=0.2527, pruned_loss=0.05236, ctc_loss=0.112, cr_loss=0.3726, over 34477.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2602, pruned_loss=0.05243, ctc_loss=0.1135, cr_loss=0.387, over 6724289.12 frames. ], batch size: 90, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:19:57,644 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.122e+02 2.683e+02 3.174e+02 3.689e+02 6.821e+02, threshold=6.347e+02, percent-clipped=1.0 2024-09-20 03:20:37,615 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=809335.3333333334, ans=0.0 2024-09-20 03:20:42,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=809382.0, ans=10.0 2024-09-20 03:20:47,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=809382.0, ans=0.1 2024-09-20 03:21:14,709 INFO [train.py:1198] (1/2) Epoch 45, batch 2900, loss[loss=0.2064, simple_loss=0.262, pruned_loss=0.05526, ctc_loss=0.1189, cr_loss=0.414, over 34514.00 frames. ], tot_loss[loss=0.2027, simple_loss=0.2614, pruned_loss=0.05276, ctc_loss=0.1142, cr_loss=0.3891, over 6754927.02 frames. 
], batch size: 94, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:21:14,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=809475.3333333334, ans=0.125 2024-09-20 03:21:18,720 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.90 vs. limit=15.0 2024-09-20 03:21:38,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.54 vs. limit=12.0 2024-09-20 03:21:45,391 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=18.59 vs. limit=22.5 2024-09-20 03:21:59,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=809568.6666666666, ans=0.125 2024-09-20 03:22:36,974 INFO [train.py:1198] (1/2) Epoch 45, batch 2950, loss[loss=0.2017, simple_loss=0.2556, pruned_loss=0.0544, ctc_loss=0.1155, cr_loss=0.3963, over 34627.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2601, pruned_loss=0.0524, ctc_loss=0.1135, cr_loss=0.3864, over 6748989.18 frames. ], batch size: 88, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:22:42,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=809708.6666666666, ans=0.125 2024-09-20 03:22:43,513 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=15.0 2024-09-20 03:22:45,579 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.676e+02 3.060e+02 3.776e+02 6.541e+02, threshold=6.119e+02, percent-clipped=1.0 2024-09-20 03:23:22,083 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=809802.0, ans=0.0 2024-09-20 03:23:40,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=809848.6666666666, ans=0.05 2024-09-20 03:23:43,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=809895.3333333334, ans=0.1 2024-09-20 03:23:46,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=809895.3333333334, ans=0.125 2024-09-20 03:23:48,397 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff2_skip_rate, batch_count=809895.3333333334, ans=0.0 2024-09-20 03:24:01,098 INFO [train.py:1198] (1/2) Epoch 45, batch 3000, loss[loss=0.201, simple_loss=0.2609, pruned_loss=0.05122, ctc_loss=0.1134, cr_loss=0.3983, over 34533.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.2597, pruned_loss=0.05235, ctc_loss=0.1134, cr_loss=0.3865, over 6750668.16 frames. ], batch size: 94, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:24:01,098 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 03:24:18,261 INFO [train.py:1230] (1/2) Epoch 45, validation: loss=0.1484, simple_loss=0.2416, pruned_loss=0.02375, ctc_loss=0.03882, cr_loss=2.267e-14, over 944034.00 frames. 
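Three kinds of entries dominate the log around this point; brief illustrative sketches of what each one reports follow. All of them are minimal reconstructions for the reader, not excerpts of the actual icefall code.

Most lines are "ScheduledFloat: name=..., batch_count=..., ans=..." entries: module hyper-parameters (dropout probabilities, balancer probs, skip rates, scale minimums) whose values follow a schedule over batch_count, with ans the value currently in effect. A minimal sketch of such a schedule, assuming plain piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints in the usage comment are invented for illustration:

def scheduled_float(batch_count, points):
    # points: ascending (batch_count, value) breakpoints. Clamp outside
    # the covered range, interpolate linearly inside it.
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return points[-1][1]

# e.g. scheduled_float(808402.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1,
# consistent with late-training dropout_p entries sitting at ans=0.1.

The "Whitening: name=..., num_groups=..., num_channels=..., metric=M vs. limit=L" entries compare a per-module whiteness statistic against its scheduled limit; an entry is logged when the metric exceeds the limit. One natural statistic with the right behaviour, and plausibly what is reported here, is E[lambda^2]/E[lambda]^2 over the eigenvalues of the per-group feature covariance: it equals 1.0 exactly when the activations are white (covariance proportional to the identity) and grows as the covariance departs from a scaled identity. The function below is an independent sketch of that statistic, not a copy of icefall's scaling.py:

import torch

def whitening_metric(x, num_groups=1):
    # x: (..., num_channels) activations; returns mean(eig^2) / mean(eig)^2
    # of the per-group covariance, which is >= 1.0 with equality iff white.
    x = x.float().reshape(-1, x.shape[-1])            # (frames, channels)
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups                  # channels per group
    x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)               # center each group
    covar = torch.matmul(x.transpose(1, 2), x)        # (groups, cpg, cpg)
    mean_eig = covar.diagonal(dim1=1, dim2=2).mean()          # trace(C)/dim
    mean_eig_sq = (covar ** 2).sum() / (num_groups * cpg)     # trace(C^2)/dim
    return (mean_eig_sq / mean_eig.clamp(min=1e-20) ** 2).item()

Finally, the "WARNING [optim.py:487] Clipping_scale=2.0, grad-norm quartiles ... threshold=..., percent-clipped=..." entries give five order statistics (min, 25%, median, 75%, max) of recent total gradient norms; in every such entry in this section the threshold equals Clipping_scale times the median, up to rounding in the printed values (e.g. threshold=5.570e+02 against a 2.785e+02 median just above). The tracker below reproduces only that bookkeeping; the class name, window size and percent-clipped accounting are assumptions, not icefall's actual optimizer code, and a real optimizer would also rescale gradients whenever the norm exceeds the threshold:

from collections import deque

import torch

class GradNormTracker:
    # Illustrative reconstruction of the quartile/threshold numbers in the
    # optim.py WARNING entries above; statistics only, no actual clipping.

    def __init__(self, clipping_scale=2.0, window=128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # recent total grad norms
        self.num_clipped = 0
        self.num_seen = 0

    def update(self, parameters):
        # Total 2-norm over all parameter gradients for this step
        # (assumes at least one parameter has a gradient).
        grads = [p.grad.norm(2) for p in parameters if p.grad is not None]
        total = torch.norm(torch.stack(grads)).item()
        self.norms.append(total)
        q = torch.quantile(                       # min/25%/median/75%/max
            torch.tensor(list(self.norms), dtype=torch.float32),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * q[2].item()   # scale x median
        self.num_seen += 1
        self.num_clipped += int(total > threshold)
        print(
            f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
            + " ".join(f"{v:.3e}" for v in q.tolist())
            + f", threshold={threshold:.3e}"
            + f", percent-clipped={100.0 * self.num_clipped / self.num_seen:.1f}"
        )
        return threshold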
2024-09-20 03:24:18,261 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 03:24:44,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=809988.6666666666, ans=0.0 2024-09-20 03:25:09,789 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.20 vs. limit=15.0 2024-09-20 03:25:16,551 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=12.0 2024-09-20 03:25:22,286 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:25:23,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=810128.6666666666, ans=0.125 2024-09-20 03:25:33,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=810128.6666666666, ans=0.04949747468305833 2024-09-20 03:25:40,031 INFO [train.py:1198] (1/2) Epoch 45, batch 3050, loss[loss=0.1914, simple_loss=0.2495, pruned_loss=0.04846, ctc_loss=0.1065, cr_loss=0.3758, over 34585.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2602, pruned_loss=0.05235, ctc_loss=0.1134, cr_loss=0.3864, over 6742406.13 frames. ], batch size: 89, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:25:46,548 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.435e+02 2.766e+02 3.321e+02 5.595e+02, threshold=5.532e+02, percent-clipped=0.0 2024-09-20 03:25:50,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=810175.3333333334, ans=0.2 2024-09-20 03:26:04,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=810222.0, ans=0.125 2024-09-20 03:26:17,402 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=810268.6666666666, ans=0.05 2024-09-20 03:26:24,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=810268.6666666666, ans=0.2 2024-09-20 03:26:36,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=810315.3333333334, ans=0.1 2024-09-20 03:26:38,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=810315.3333333334, ans=0.0 2024-09-20 03:26:54,934 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.48 vs. limit=15.0 2024-09-20 03:27:00,409 INFO [train.py:1198] (1/2) Epoch 45, batch 3100, loss[loss=0.2158, simple_loss=0.2796, pruned_loss=0.05567, ctc_loss=0.1221, cr_loss=0.407, over 34222.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2602, pruned_loss=0.05242, ctc_loss=0.1137, cr_loss=0.3868, over 6741884.73 frames. 
], batch size: 117, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:27:05,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=810408.6666666666, ans=0.95 2024-09-20 03:27:46,871 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.24 vs. limit=6.0 2024-09-20 03:28:04,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=810548.6666666666, ans=0.0 2024-09-20 03:28:04,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=810548.6666666666, ans=0.125 2024-09-20 03:28:05,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=810595.3333333334, ans=0.125 2024-09-20 03:28:15,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=810595.3333333334, ans=0.0 2024-09-20 03:28:23,303 INFO [train.py:1198] (1/2) Epoch 45, batch 3150, loss[loss=0.2097, simple_loss=0.2705, pruned_loss=0.05471, ctc_loss=0.119, cr_loss=0.3909, over 33757.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.26, pruned_loss=0.05235, ctc_loss=0.1136, cr_loss=0.3866, over 6748266.93 frames. ], batch size: 122, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:28:29,724 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.098e+02 2.636e+02 3.009e+02 3.518e+02 6.479e+02, threshold=6.018e+02, percent-clipped=5.0 2024-09-20 03:28:39,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=810688.6666666666, ans=0.0 2024-09-20 03:28:47,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=810688.6666666666, ans=0.125 2024-09-20 03:29:02,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=810735.3333333334, ans=0.07 2024-09-20 03:29:26,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=810828.6666666666, ans=0.125 2024-09-20 03:29:28,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=810828.6666666666, ans=0.2 2024-09-20 03:29:43,951 INFO [train.py:1198] (1/2) Epoch 45, batch 3200, loss[loss=0.1983, simple_loss=0.2607, pruned_loss=0.04957, ctc_loss=0.1074, cr_loss=0.3793, over 34528.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2595, pruned_loss=0.05208, ctc_loss=0.113, cr_loss=0.3856, over 6762131.06 frames. ], batch size: 94, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:29:46,569 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.42 vs. 
limit=15.0 2024-09-20 03:29:48,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=810875.3333333334, ans=0.0 2024-09-20 03:29:50,597 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=810875.3333333334, ans=0.125 2024-09-20 03:30:14,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=810922.0, ans=0.0 2024-09-20 03:30:35,808 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=811015.3333333334, ans=0.025 2024-09-20 03:30:43,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=811015.3333333334, ans=0.0 2024-09-20 03:30:45,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=811015.3333333334, ans=0.125 2024-09-20 03:30:58,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=811062.0, ans=0.1 2024-09-20 03:31:00,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=811062.0, ans=0.0 2024-09-20 03:31:06,204 INFO [train.py:1198] (1/2) Epoch 45, batch 3250, loss[loss=0.2057, simple_loss=0.2675, pruned_loss=0.05259, ctc_loss=0.1144, cr_loss=0.3957, over 34654.00 frames. ], tot_loss[loss=0.2013, simple_loss=0.26, pruned_loss=0.05224, ctc_loss=0.1133, cr_loss=0.3865, over 6771507.44 frames. ], batch size: 98, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:31:06,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=811108.6666666666, ans=0.1 2024-09-20 03:31:10,097 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.36 vs. limit=10.0 2024-09-20 03:31:12,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.211e+02 2.585e+02 2.870e+02 3.525e+02 5.797e+02, threshold=5.741e+02, percent-clipped=0.0 2024-09-20 03:31:12,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=811108.6666666666, ans=0.0 2024-09-20 03:31:23,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=811155.3333333334, ans=0.1 2024-09-20 03:31:40,777 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.34 vs. limit=15.0 2024-09-20 03:32:13,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=811295.3333333334, ans=0.025 2024-09-20 03:32:21,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=811295.3333333334, ans=0.2 2024-09-20 03:32:22,316 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2024-09-20 03:32:26,444 INFO [train.py:1198] (1/2) Epoch 45, batch 3300, loss[loss=0.2003, simple_loss=0.2688, pruned_loss=0.04789, ctc_loss=0.1063, cr_loss=0.369, over 33156.00 frames. 
], tot_loss[loss=0.2002, simple_loss=0.2589, pruned_loss=0.05188, ctc_loss=0.1126, cr_loss=0.3843, over 6769102.14 frames. ], batch size: 130, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:32:49,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=811388.6666666666, ans=0.0 2024-09-20 03:33:48,026 INFO [train.py:1198] (1/2) Epoch 45, batch 3350, loss[loss=0.2122, simple_loss=0.2717, pruned_loss=0.05623, ctc_loss=0.1202, cr_loss=0.4077, over 33874.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2596, pruned_loss=0.05211, ctc_loss=0.113, cr_loss=0.3857, over 6743592.65 frames. ], batch size: 122, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:33:51,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=811575.3333333334, ans=0.0 2024-09-20 03:33:54,468 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.202e+02 2.498e+02 2.730e+02 3.113e+02 5.442e+02, threshold=5.461e+02, percent-clipped=0.0 2024-09-20 03:34:07,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=811622.0, ans=0.125 2024-09-20 03:34:17,913 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=7.62 vs. limit=15.0 2024-09-20 03:34:22,340 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=811668.6666666666, ans=0.125 2024-09-20 03:34:36,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=811715.3333333334, ans=0.125 2024-09-20 03:34:49,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=811715.3333333334, ans=0.125 2024-09-20 03:35:09,729 INFO [train.py:1198] (1/2) Epoch 45, batch 3400, loss[loss=0.1847, simple_loss=0.2409, pruned_loss=0.04739, ctc_loss=0.09923, cr_loss=0.3477, over 34151.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2596, pruned_loss=0.05212, ctc_loss=0.1131, cr_loss=0.3855, over 6733206.10 frames. ], batch size: 78, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:35:39,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.91 vs. 
limit=10.0 2024-09-20 03:35:46,941 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:35:46,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=811902.0, ans=0.0 2024-09-20 03:35:53,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=811902.0, ans=0.125 2024-09-20 03:36:03,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=811948.6666666666, ans=0.2 2024-09-20 03:36:09,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=811948.6666666666, ans=0.07 2024-09-20 03:36:11,179 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=811948.6666666666, ans=0.125 2024-09-20 03:36:29,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.40 vs. limit=12.0 2024-09-20 03:36:30,468 INFO [train.py:1198] (1/2) Epoch 45, batch 3450, loss[loss=0.2123, simple_loss=0.2766, pruned_loss=0.05403, ctc_loss=0.1189, cr_loss=0.404, over 33102.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2602, pruned_loss=0.0523, ctc_loss=0.1134, cr_loss=0.3866, over 6745390.76 frames. ], batch size: 130, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:36:34,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.39 vs. limit=15.0 2024-09-20 03:36:36,838 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.140e+02 2.500e+02 2.781e+02 3.480e+02 6.806e+02, threshold=5.562e+02, percent-clipped=2.0 2024-09-20 03:36:54,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=812088.6666666666, ans=0.0 2024-09-20 03:37:09,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=812135.3333333334, ans=0.125 2024-09-20 03:37:10,726 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=812135.3333333334, ans=0.0 2024-09-20 03:37:30,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=812182.0, ans=0.0 2024-09-20 03:37:50,362 INFO [train.py:1198] (1/2) Epoch 45, batch 3500, loss[loss=0.1785, simple_loss=0.2378, pruned_loss=0.0431, ctc_loss=0.09569, cr_loss=0.3453, over 34469.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2596, pruned_loss=0.05214, ctc_loss=0.1132, cr_loss=0.3858, over 6747373.03 frames. ], batch size: 85, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:38:04,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=812275.3333333334, ans=0.125 2024-09-20 03:38:34,378 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.88 vs. 
limit=15.0 2024-09-20 03:38:56,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=812462.0, ans=0.0 2024-09-20 03:39:01,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=812462.0, ans=0.0 2024-09-20 03:39:04,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=812462.0, ans=0.125 2024-09-20 03:39:11,652 INFO [train.py:1198] (1/2) Epoch 45, batch 3550, loss[loss=0.2059, simple_loss=0.27, pruned_loss=0.05153, ctc_loss=0.1143, cr_loss=0.3951, over 34397.00 frames. ], tot_loss[loss=0.2012, simple_loss=0.2598, pruned_loss=0.05224, ctc_loss=0.1132, cr_loss=0.3861, over 6756909.04 frames. ], batch size: 103, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:39:16,185 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:39:18,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.194e+02 2.538e+02 3.253e+02 4.283e+02 8.049e+02, threshold=6.506e+02, percent-clipped=6.0 2024-09-20 03:39:47,020 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.15 vs. limit=6.0 2024-09-20 03:39:48,436 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.68 vs. limit=22.5 2024-09-20 03:39:54,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=812602.0, ans=0.125 2024-09-20 03:40:03,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=812648.6666666666, ans=0.2 2024-09-20 03:40:07,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=812648.6666666666, ans=0.0 2024-09-20 03:40:15,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.63 vs. limit=15.0 2024-09-20 03:40:31,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=812742.0, ans=0.1 2024-09-20 03:40:32,582 INFO [train.py:1198] (1/2) Epoch 45, batch 3600, loss[loss=0.1821, simple_loss=0.2434, pruned_loss=0.04348, ctc_loss=0.09954, cr_loss=0.3499, over 34470.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2601, pruned_loss=0.05228, ctc_loss=0.1133, cr_loss=0.3866, over 6766631.84 frames. ], batch size: 90, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:40:32,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=812742.0, ans=0.0 2024-09-20 03:40:48,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.71 vs. 
limit=10.0 2024-09-20 03:41:00,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=812788.6666666666, ans=0.2 2024-09-20 03:41:21,236 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=812882.0, ans=0.0 2024-09-20 03:41:24,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=812882.0, ans=0.025 2024-09-20 03:41:42,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=812928.6666666666, ans=0.0 2024-09-20 03:41:54,163 INFO [train.py:1198] (1/2) Epoch 45, batch 3650, loss[loss=0.2067, simple_loss=0.2662, pruned_loss=0.05336, ctc_loss=0.1186, cr_loss=0.4159, over 34439.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2591, pruned_loss=0.05169, ctc_loss=0.1123, cr_loss=0.3843, over 6769193.76 frames. ], batch size: 110, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:42:00,458 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.094e+02 2.533e+02 2.949e+02 3.756e+02 8.173e+02, threshold=5.897e+02, percent-clipped=4.0 2024-09-20 03:42:16,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=813022.0, ans=0.125 2024-09-20 03:42:16,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=813022.0, ans=0.125 2024-09-20 03:42:28,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=813068.6666666666, ans=0.0 2024-09-20 03:42:43,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=813115.3333333334, ans=0.0 2024-09-20 03:42:47,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=813115.3333333334, ans=0.1 2024-09-20 03:43:11,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=813162.0, ans=0.125 2024-09-20 03:43:14,584 INFO [train.py:1198] (1/2) Epoch 45, batch 3700, loss[loss=0.1913, simple_loss=0.2534, pruned_loss=0.04665, ctc_loss=0.106, cr_loss=0.3669, over 34622.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2591, pruned_loss=0.05153, ctc_loss=0.1121, cr_loss=0.3839, over 6782916.19 frames. ], batch size: 102, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:43:22,337 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.51 vs. 
limit=22.5 2024-09-20 03:43:23,047 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=813208.6666666666, ans=0.2 2024-09-20 03:43:23,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=813208.6666666666, ans=0.125 2024-09-20 03:43:43,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=813255.3333333334, ans=0.125 2024-09-20 03:43:58,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=813302.0, ans=0.125 2024-09-20 03:44:08,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=813348.6666666666, ans=0.05 2024-09-20 03:44:25,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=813395.3333333334, ans=0.125 2024-09-20 03:44:35,039 INFO [train.py:1198] (1/2) Epoch 45, batch 3750, loss[loss=0.2123, simple_loss=0.2715, pruned_loss=0.05636, ctc_loss=0.1232, cr_loss=0.3959, over 34406.00 frames. ], tot_loss[loss=0.2031, simple_loss=0.2623, pruned_loss=0.05275, ctc_loss=0.1143, cr_loss=0.3894, over 6785396.84 frames. ], batch size: 113, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:44:37,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=813442.0, ans=0.2 2024-09-20 03:44:41,341 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.452e+02 2.613e+02 2.932e+02 8.705e+02, threshold=5.226e+02, percent-clipped=1.0 2024-09-20 03:45:00,747 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.59 vs. limit=15.0 2024-09-20 03:45:01,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward2.hidden_balancer.prob, batch_count=813488.6666666666, ans=0.125 2024-09-20 03:45:05,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=813488.6666666666, ans=0.0 2024-09-20 03:45:11,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=813535.3333333334, ans=0.125 2024-09-20 03:45:13,039 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=813535.3333333334, ans=0.1 2024-09-20 03:45:16,294 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=813535.3333333334, ans=0.07 2024-09-20 03:45:23,235 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=813582.0, ans=0.125 2024-09-20 03:45:29,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=813582.0, ans=0.125 2024-09-20 03:45:56,595 INFO [train.py:1198] (1/2) Epoch 45, batch 3800, loss[loss=0.2242, simple_loss=0.2732, pruned_loss=0.06498, ctc_loss=0.1414, cr_loss=0.4248, over 29940.00 frames. ], tot_loss[loss=0.2059, simple_loss=0.2647, pruned_loss=0.05395, ctc_loss=0.1167, cr_loss=0.395, over 6676684.02 frames. 
], batch size: 175, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:46:16,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=813722.0, ans=0.1 2024-09-20 03:46:29,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=813768.6666666666, ans=0.125 2024-09-20 03:46:35,754 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.27 vs. limit=15.0 2024-09-20 03:46:50,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=813815.3333333334, ans=0.05 2024-09-20 03:46:51,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=813815.3333333334, ans=0.2 2024-09-20 03:47:13,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=813862.0, ans=0.125 2024-09-20 03:47:15,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=813862.0, ans=0.1 2024-09-20 03:47:19,893 INFO [train.py:1198] (1/2) Epoch 45, batch 3850, loss[loss=0.2238, simple_loss=0.2779, pruned_loss=0.06272, ctc_loss=0.1372, cr_loss=0.4215, over 23972.00 frames. ], tot_loss[loss=0.2086, simple_loss=0.2665, pruned_loss=0.05539, ctc_loss=0.1197, cr_loss=0.3986, over 6251590.05 frames. ], batch size: 244, lr: 2.63e-03, grad_scale: 32.0 2024-09-20 03:47:26,544 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.219e+02 2.473e+02 2.726e+02 2.930e+02 8.261e+02, threshold=5.453e+02, percent-clipped=1.0 2024-09-20 03:47:42,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.69 vs. limit=22.5 2024-09-20 03:47:50,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=813955.3333333334, ans=0.125 2024-09-20 03:48:35,288 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:48:52,032 INFO [train.py:1198] (1/2) Epoch 46, batch 0, loss[loss=0.1816, simple_loss=0.2403, pruned_loss=0.04471, ctc_loss=0.09689, cr_loss=0.3515, over 34466.00 frames. ], tot_loss[loss=0.1816, simple_loss=0.2403, pruned_loss=0.04471, ctc_loss=0.09689, cr_loss=0.3515, over 34466.00 frames. ], batch size: 85, lr: 2.60e-03, grad_scale: 32.0 2024-09-20 03:48:52,032 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 03:49:08,800 INFO [train.py:1230] (1/2) Epoch 46, validation: loss=0.1484, simple_loss=0.2426, pruned_loss=0.02326, ctc_loss=0.03851, cr_loss=2.273e-14, over 944034.00 frames. 2024-09-20 03:49:08,800 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 03:49:52,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=814128.0, ans=0.125 2024-09-20 03:50:35,272 INFO [train.py:1198] (1/2) Epoch 46, batch 50, loss[loss=0.1657, simple_loss=0.2256, pruned_loss=0.03779, ctc_loss=0.08726, cr_loss=0.3203, over 34489.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2602, pruned_loss=0.05227, ctc_loss=0.1132, cr_loss=0.388, over 1481053.41 frames. 
], batch size: 82, lr: 2.60e-03, grad_scale: 32.0 2024-09-20 03:50:48,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=814268.0, ans=0.0 2024-09-20 03:50:49,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2024-09-20 03:51:19,759 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.232e+02 2.607e+02 2.872e+02 3.150e+02 5.659e+02, threshold=5.743e+02, percent-clipped=1.0 2024-09-20 03:51:21,853 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=814361.3333333334, ans=0.125 2024-09-20 03:51:35,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=814408.0, ans=0.125 2024-09-20 03:51:38,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=814408.0, ans=0.2 2024-09-20 03:51:49,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=814454.6666666666, ans=0.1 2024-09-20 03:51:54,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=814454.6666666666, ans=0.2 2024-09-20 03:51:56,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=814501.3333333334, ans=0.0 2024-09-20 03:51:57,359 INFO [train.py:1198] (1/2) Epoch 46, batch 100, loss[loss=0.1943, simple_loss=0.2533, pruned_loss=0.04982, ctc_loss=0.1068, cr_loss=0.3562, over 34606.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2623, pruned_loss=0.05307, ctc_loss=0.1148, cr_loss=0.3917, over 2630896.24 frames. ], batch size: 89, lr: 2.60e-03, grad_scale: 32.0 2024-09-20 03:52:04,042 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=814501.3333333334, ans=0.0 2024-09-20 03:52:13,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=814548.0, ans=0.125 2024-09-20 03:52:30,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=814594.6666666666, ans=0.0 2024-09-20 03:52:36,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=814594.6666666666, ans=0.125 2024-09-20 03:52:38,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=814594.6666666666, ans=0.04949747468305833 2024-09-20 03:52:45,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=814641.3333333334, ans=0.125 2024-09-20 03:53:19,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=814734.6666666666, ans=0.1 2024-09-20 03:53:20,427 INFO [train.py:1198] (1/2) Epoch 46, batch 150, loss[loss=0.1902, simple_loss=0.2426, pruned_loss=0.05059, ctc_loss=0.1052, cr_loss=0.3883, over 34518.00 frames. ], tot_loss[loss=0.2015, simple_loss=0.2605, pruned_loss=0.0522, ctc_loss=0.1131, cr_loss=0.3874, over 3558520.69 frames. 
], batch size: 82, lr: 2.60e-03, grad_scale: 32.0 2024-09-20 03:53:30,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=814734.6666666666, ans=0.0 2024-09-20 03:53:42,089 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=814781.3333333334, ans=0.1 2024-09-20 03:53:44,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814781.3333333334, ans=0.1 2024-09-20 03:53:44,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=814781.3333333334, ans=0.0 2024-09-20 03:53:47,868 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.63 vs. limit=10.0 2024-09-20 03:53:52,075 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0 2024-09-20 03:54:00,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=814828.0, ans=0.125 2024-09-20 03:54:02,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=814828.0, ans=0.125 2024-09-20 03:54:07,184 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.578e+02 3.001e+02 3.563e+02 6.673e+02, threshold=6.002e+02, percent-clipped=3.0 2024-09-20 03:54:17,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=814874.6666666666, ans=0.2 2024-09-20 03:54:22,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=814874.6666666666, ans=0.025 2024-09-20 03:54:33,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=814921.3333333334, ans=0.1 2024-09-20 03:54:36,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=814921.3333333334, ans=0.025 2024-09-20 03:54:44,484 INFO [train.py:1198] (1/2) Epoch 46, batch 200, loss[loss=0.2089, simple_loss=0.2658, pruned_loss=0.05622, ctc_loss=0.1202, cr_loss=0.3892, over 32184.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2592, pruned_loss=0.05185, ctc_loss=0.1124, cr_loss=0.3855, over 4271286.63 frames. ], batch size: 145, lr: 2.60e-03, grad_scale: 32.0 2024-09-20 03:54:55,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.62 vs. 
limit=15.0 2024-09-20 03:55:06,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=815014.6666666666, ans=0.125 2024-09-20 03:55:06,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=815014.6666666666, ans=0.125 2024-09-20 03:55:14,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=815014.6666666666, ans=0.2 2024-09-20 03:55:19,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=815061.3333333334, ans=0.125 2024-09-20 03:55:40,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=815108.0, ans=0.2 2024-09-20 03:55:50,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=815154.6666666666, ans=0.125 2024-09-20 03:55:55,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=815154.6666666666, ans=0.0 2024-09-20 03:56:02,403 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=22.5 2024-09-20 03:56:06,461 INFO [train.py:1198] (1/2) Epoch 46, batch 250, loss[loss=0.214, simple_loss=0.2739, pruned_loss=0.05664, ctc_loss=0.1243, cr_loss=0.3997, over 34219.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2592, pruned_loss=0.05172, ctc_loss=0.1122, cr_loss=0.3849, over 4833407.28 frames. ], batch size: 117, lr: 2.60e-03, grad_scale: 32.0 2024-09-20 03:56:31,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=815248.0, ans=0.125 2024-09-20 03:56:43,082 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=815294.6666666666, ans=0.0 2024-09-20 03:56:43,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=815294.6666666666, ans=0.125 2024-09-20 03:56:50,912 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.116e+02 2.568e+02 3.204e+02 4.262e+02 8.634e+02, threshold=6.407e+02, percent-clipped=5.0 2024-09-20 03:56:52,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=815294.6666666666, ans=0.125 2024-09-20 03:57:07,981 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:57:30,380 INFO [train.py:1198] (1/2) Epoch 46, batch 300, loss[loss=0.2118, simple_loss=0.2738, pruned_loss=0.055, ctc_loss=0.1197, cr_loss=0.396, over 34332.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2588, pruned_loss=0.0517, ctc_loss=0.1121, cr_loss=0.3846, over 5261298.70 frames. ], batch size: 107, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 03:57:32,381 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 03:58:16,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.04 vs. 
limit=15.0 2024-09-20 03:58:26,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.04 vs. limit=15.0 2024-09-20 03:58:51,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=815621.3333333334, ans=0.125 2024-09-20 03:58:54,669 INFO [train.py:1198] (1/2) Epoch 46, batch 350, loss[loss=0.1729, simple_loss=0.2347, pruned_loss=0.03979, ctc_loss=0.09212, cr_loss=0.3281, over 34265.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2592, pruned_loss=0.05185, ctc_loss=0.1123, cr_loss=0.3856, over 5596792.87 frames. ], batch size: 83, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 03:59:04,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=815668.0, ans=0.125 2024-09-20 03:59:11,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=815714.6666666666, ans=0.07 2024-09-20 03:59:21,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=815714.6666666666, ans=0.2 2024-09-20 03:59:39,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.022e+02 2.525e+02 2.828e+02 3.444e+02 5.050e+02, threshold=5.656e+02, percent-clipped=0.0 2024-09-20 03:59:56,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=815808.0, ans=0.0 2024-09-20 03:59:59,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=815854.6666666666, ans=0.125 2024-09-20 04:00:16,936 INFO [train.py:1198] (1/2) Epoch 46, batch 400, loss[loss=0.2118, simple_loss=0.2714, pruned_loss=0.05566, ctc_loss=0.1223, cr_loss=0.4126, over 34405.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2587, pruned_loss=0.0515, ctc_loss=0.1118, cr_loss=0.3846, over 5863767.04 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:00:37,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=815948.0, ans=0.125 2024-09-20 04:01:34,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=816088.0, ans=0.1 2024-09-20 04:01:42,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0 2024-09-20 04:01:44,326 INFO [train.py:1198] (1/2) Epoch 46, batch 450, loss[loss=0.2033, simple_loss=0.2642, pruned_loss=0.05203, ctc_loss=0.1138, cr_loss=0.3909, over 34696.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2588, pruned_loss=0.05156, ctc_loss=0.112, cr_loss=0.3845, over 6054151.14 frames. ], batch size: 97, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:01:49,948 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.51 vs. 
limit=12.0 2024-09-20 04:01:59,445 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=816181.3333333334, ans=0.0 2024-09-20 04:01:59,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=816181.3333333334, ans=0.125 2024-09-20 04:02:20,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=816228.0, ans=0.0 2024-09-20 04:02:22,541 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=816228.0, ans=0.2 2024-09-20 04:02:28,823 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.107e+02 2.543e+02 2.888e+02 3.759e+02 5.838e+02, threshold=5.775e+02, percent-clipped=1.0 2024-09-20 04:02:57,435 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=816321.3333333334, ans=0.0 2024-09-20 04:03:03,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=816321.3333333334, ans=0.0 2024-09-20 04:03:06,855 INFO [train.py:1198] (1/2) Epoch 46, batch 500, loss[loss=0.2197, simple_loss=0.2786, pruned_loss=0.05966, ctc_loss=0.125, cr_loss=0.4131, over 34455.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2583, pruned_loss=0.05146, ctc_loss=0.1117, cr_loss=0.3838, over 6221164.15 frames. ], batch size: 110, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:03:07,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=816368.0, ans=0.125 2024-09-20 04:03:13,828 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=816368.0, ans=0.0 2024-09-20 04:03:16,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=816368.0, ans=0.0 2024-09-20 04:03:28,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=816414.6666666666, ans=0.2 2024-09-20 04:03:35,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.18 vs. limit=22.5 2024-09-20 04:03:42,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=816461.3333333334, ans=0.07 2024-09-20 04:03:51,363 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=816461.3333333334, ans=0.0 2024-09-20 04:03:59,959 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.64 vs. 
limit=6.0 2024-09-20 04:04:09,593 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816508.0, ans=0.1 2024-09-20 04:04:15,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=816554.6666666666, ans=0.125 2024-09-20 04:04:22,560 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=816554.6666666666, ans=0.0 2024-09-20 04:04:30,692 INFO [train.py:1198] (1/2) Epoch 46, batch 550, loss[loss=0.2139, simple_loss=0.279, pruned_loss=0.0546, ctc_loss=0.1201, cr_loss=0.3913, over 33935.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2585, pruned_loss=0.05146, ctc_loss=0.1118, cr_loss=0.3842, over 6328711.63 frames. ], batch size: 122, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:04:33,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.84 vs. limit=10.0 2024-09-20 04:04:40,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=816601.3333333334, ans=0.0 2024-09-20 04:04:45,971 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=816648.0, ans=0.0 2024-09-20 04:05:10,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=816694.6666666666, ans=0.1 2024-09-20 04:05:15,231 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.302e+02 2.579e+02 2.752e+02 3.373e+02 6.536e+02, threshold=5.504e+02, percent-clipped=2.0 2024-09-20 04:05:17,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=816694.6666666666, ans=10.0 2024-09-20 04:05:39,020 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=816788.0, ans=0.125 2024-09-20 04:05:47,383 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=816788.0, ans=0.125 2024-09-20 04:05:54,988 INFO [train.py:1198] (1/2) Epoch 46, batch 600, loss[loss=0.2129, simple_loss=0.2713, pruned_loss=0.05716, ctc_loss=0.1188, cr_loss=0.4086, over 34244.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2588, pruned_loss=0.05155, ctc_loss=0.112, cr_loss=0.3849, over 6432456.74 frames. ], batch size: 117, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:05:58,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=816834.6666666666, ans=0.125 2024-09-20 04:06:13,301 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=816881.3333333334, ans=0.125 2024-09-20 04:06:27,143 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.49 vs. 
limit=15.0 2024-09-20 04:07:00,567 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=817021.3333333334, ans=0.0 2024-09-20 04:07:05,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=817021.3333333334, ans=0.125 2024-09-20 04:07:13,491 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=817021.3333333334, ans=0.0 2024-09-20 04:07:14,380 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-09-20 04:07:16,479 INFO [train.py:1198] (1/2) Epoch 46, batch 650, loss[loss=0.1826, simple_loss=0.2418, pruned_loss=0.04462, ctc_loss=0.1, cr_loss=0.3566, over 34523.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2582, pruned_loss=0.05126, ctc_loss=0.1115, cr_loss=0.383, over 6523643.36 frames. ], batch size: 94, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:07:26,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817068.0, ans=0.1 2024-09-20 04:07:26,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=817068.0, ans=0.125 2024-09-20 04:07:42,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.64 vs. limit=15.0 2024-09-20 04:07:44,759 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=817114.6666666666, ans=0.125 2024-09-20 04:08:00,738 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.128e+02 2.484e+02 2.803e+02 3.398e+02 8.407e+02, threshold=5.606e+02, percent-clipped=4.0 2024-09-20 04:08:06,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=817208.0, ans=0.0 2024-09-20 04:08:20,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=817208.0, ans=0.1 2024-09-20 04:08:40,434 INFO [train.py:1198] (1/2) Epoch 46, batch 700, loss[loss=0.2043, simple_loss=0.2617, pruned_loss=0.05391, ctc_loss=0.1146, cr_loss=0.4029, over 34589.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2587, pruned_loss=0.05139, ctc_loss=0.1118, cr_loss=0.384, over 6580015.00 frames. ], batch size: 89, lr: 2.59e-03, grad_scale: 64.0 2024-09-20 04:08:47,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=817301.3333333334, ans=0.125 2024-09-20 04:09:15,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=817394.6666666666, ans=0.0 2024-09-20 04:09:17,722 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=768, metric=17.06 vs. 
limit=22.5 2024-09-20 04:09:28,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=817394.6666666666, ans=0.125 2024-09-20 04:09:40,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=817441.3333333334, ans=0.0 2024-09-20 04:09:42,123 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:09:46,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=817488.0, ans=0.125 2024-09-20 04:09:52,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.61 vs. limit=15.0 2024-09-20 04:10:01,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=817488.0, ans=0.1 2024-09-20 04:10:04,449 INFO [train.py:1198] (1/2) Epoch 46, batch 750, loss[loss=0.1898, simple_loss=0.2556, pruned_loss=0.04516, ctc_loss=0.09918, cr_loss=0.3432, over 34401.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.2585, pruned_loss=0.05126, ctc_loss=0.1115, cr_loss=0.3835, over 6624452.77 frames. ], batch size: 95, lr: 2.59e-03, grad_scale: 64.0 2024-09-20 04:10:08,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=817534.6666666666, ans=0.0 2024-09-20 04:10:43,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=817628.0, ans=0.0 2024-09-20 04:10:50,255 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.248e+02 2.556e+02 2.971e+02 3.687e+02 6.499e+02, threshold=5.943e+02, percent-clipped=2.0 2024-09-20 04:10:59,615 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.64 vs. limit=12.0 2024-09-20 04:11:20,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=817721.3333333334, ans=0.0 2024-09-20 04:11:26,400 INFO [train.py:1198] (1/2) Epoch 46, batch 800, loss[loss=0.1769, simple_loss=0.2372, pruned_loss=0.04215, ctc_loss=0.09475, cr_loss=0.3364, over 34486.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2586, pruned_loss=0.05141, ctc_loss=0.1118, cr_loss=0.3843, over 6659808.20 frames. 
], batch size: 85, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:11:31,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=817768.0, ans=0.1 2024-09-20 04:11:41,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=817814.6666666666, ans=0.125 2024-09-20 04:12:25,754 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=817908.0, ans=0.09899494936611666 2024-09-20 04:12:27,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=817908.0, ans=0.015 2024-09-20 04:12:30,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=817908.0, ans=0.2 2024-09-20 04:12:35,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=817954.6666666666, ans=0.0 2024-09-20 04:12:35,710 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:12:40,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=817954.6666666666, ans=0.0 2024-09-20 04:12:41,039 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.91 vs. limit=15.0 2024-09-20 04:12:49,945 INFO [train.py:1198] (1/2) Epoch 46, batch 850, loss[loss=0.2028, simple_loss=0.2686, pruned_loss=0.04986, ctc_loss=0.1093, cr_loss=0.3843, over 34394.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.2583, pruned_loss=0.05128, ctc_loss=0.1117, cr_loss=0.3838, over 6691909.17 frames. ], batch size: 103, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:13:05,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=818001.3333333334, ans=0.125 2024-09-20 04:13:10,851 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.62 vs. limit=15.0 2024-09-20 04:13:23,704 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=818094.6666666666, ans=0.1 2024-09-20 04:13:26,902 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=818094.6666666666, ans=0.1 2024-09-20 04:13:38,157 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.664e+02 3.064e+02 3.572e+02 5.151e+02, threshold=6.127e+02, percent-clipped=0.0 2024-09-20 04:14:06,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=818188.0, ans=0.1 2024-09-20 04:14:10,055 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=818188.0, ans=0.125 2024-09-20 04:14:14,679 INFO [train.py:1198] (1/2) Epoch 46, batch 900, loss[loss=0.1744, simple_loss=0.2323, pruned_loss=0.04235, ctc_loss=0.09428, cr_loss=0.3214, over 34465.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.2583, pruned_loss=0.05133, ctc_loss=0.1118, cr_loss=0.3839, over 6698970.82 frames. 
], batch size: 85, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:14:31,432 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=818281.3333333334, ans=0.125 2024-09-20 04:14:38,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.71 vs. limit=15.0 2024-09-20 04:15:25,314 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=818421.3333333334, ans=0.125 2024-09-20 04:15:30,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=818421.3333333334, ans=0.125 2024-09-20 04:15:36,381 INFO [train.py:1198] (1/2) Epoch 46, batch 950, loss[loss=0.1819, simple_loss=0.2384, pruned_loss=0.04551, ctc_loss=0.1004, cr_loss=0.3555, over 34730.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2582, pruned_loss=0.05116, ctc_loss=0.1114, cr_loss=0.3825, over 6703123.92 frames. ], batch size: 87, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:15:38,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=818468.0, ans=0.0 2024-09-20 04:15:58,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=818514.6666666666, ans=0.0 2024-09-20 04:16:18,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=818561.3333333334, ans=0.0 2024-09-20 04:16:24,355 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.723e+02 3.115e+02 3.969e+02 5.966e+02, threshold=6.230e+02, percent-clipped=0.0 2024-09-20 04:16:34,689 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=818608.0, ans=0.125 2024-09-20 04:17:02,388 INFO [train.py:1198] (1/2) Epoch 46, batch 1000, loss[loss=0.1957, simple_loss=0.2493, pruned_loss=0.05225, ctc_loss=0.1115, cr_loss=0.3824, over 34479.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2588, pruned_loss=0.05163, ctc_loss=0.1123, cr_loss=0.3849, over 6696451.70 frames. ], batch size: 90, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:17:12,688 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:17:34,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=818794.6666666666, ans=0.1 2024-09-20 04:18:02,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=818841.3333333334, ans=0.125 2024-09-20 04:18:24,935 INFO [train.py:1198] (1/2) Epoch 46, batch 1050, loss[loss=0.1987, simple_loss=0.2597, pruned_loss=0.05016, ctc_loss=0.1113, cr_loss=0.3777, over 34562.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2583, pruned_loss=0.05161, ctc_loss=0.1122, cr_loss=0.3851, over 6704097.85 frames. 
], batch size: 99, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:18:38,131 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=818934.6666666666, ans=0.2 2024-09-20 04:19:04,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=819028.0, ans=0.2 2024-09-20 04:19:05,324 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.47 vs. limit=6.0 2024-09-20 04:19:10,961 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.273e+02 2.544e+02 2.789e+02 3.411e+02 5.291e+02, threshold=5.577e+02, percent-clipped=0.0 2024-09-20 04:19:22,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=819074.6666666666, ans=0.125 2024-09-20 04:19:26,624 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.98 vs. limit=15.0 2024-09-20 04:19:42,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=819121.3333333334, ans=0.125 2024-09-20 04:19:48,804 INFO [train.py:1198] (1/2) Epoch 46, batch 1100, loss[loss=0.2066, simple_loss=0.2631, pruned_loss=0.05526, ctc_loss=0.1174, cr_loss=0.4012, over 34357.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2579, pruned_loss=0.05143, ctc_loss=0.1118, cr_loss=0.3843, over 6715738.85 frames. ], batch size: 91, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:20:30,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0 2024-09-20 04:20:46,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=819308.0, ans=0.1 2024-09-20 04:20:48,386 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=819308.0, ans=0.125 2024-09-20 04:20:51,825 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=819308.0, ans=0.125 2024-09-20 04:21:05,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=819354.6666666666, ans=0.0 2024-09-20 04:21:13,060 INFO [train.py:1198] (1/2) Epoch 46, batch 1150, loss[loss=0.1993, simple_loss=0.2588, pruned_loss=0.0507, ctc_loss=0.1123, cr_loss=0.3953, over 34347.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2578, pruned_loss=0.05143, ctc_loss=0.1117, cr_loss=0.3836, over 6714371.45 frames. 
], batch size: 91, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:21:53,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=819494.6666666666, ans=0.0 2024-09-20 04:21:57,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=819494.6666666666, ans=0.09899494936611666 2024-09-20 04:21:59,851 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.121e+02 2.545e+02 2.803e+02 3.318e+02 8.872e+02, threshold=5.606e+02, percent-clipped=2.0 2024-09-20 04:22:34,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=819634.6666666666, ans=0.1 2024-09-20 04:22:34,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=819634.6666666666, ans=0.0 2024-09-20 04:22:36,092 INFO [train.py:1198] (1/2) Epoch 46, batch 1200, loss[loss=0.2113, simple_loss=0.2724, pruned_loss=0.05467, ctc_loss=0.1214, cr_loss=0.4114, over 34560.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2587, pruned_loss=0.05169, ctc_loss=0.1123, cr_loss=0.3849, over 6708131.49 frames. ], batch size: 99, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:22:44,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=819634.6666666666, ans=0.035 2024-09-20 04:23:20,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=819728.0, ans=0.1 2024-09-20 04:23:59,801 INFO [train.py:1198] (1/2) Epoch 46, batch 1250, loss[loss=0.2134, simple_loss=0.2704, pruned_loss=0.05771, ctc_loss=0.1225, cr_loss=0.4135, over 34307.00 frames. ], tot_loss[loss=0.2011, simple_loss=0.2597, pruned_loss=0.05219, ctc_loss=0.1132, cr_loss=0.3874, over 6741499.93 frames. ], batch size: 107, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:24:00,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=819868.0, ans=0.0 2024-09-20 04:24:11,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=819868.0, ans=0.0 2024-09-20 04:24:23,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=819914.6666666666, ans=0.025 2024-09-20 04:24:28,617 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=819914.6666666666, ans=0.1 2024-09-20 04:24:31,967 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=819914.6666666666, ans=0.125 2024-09-20 04:24:34,146 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.32 vs. 
limit=15.0 2024-09-20 04:24:48,002 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.572e+02 2.977e+02 3.630e+02 8.428e+02, threshold=5.953e+02, percent-clipped=2.0 2024-09-20 04:24:51,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=820008.0, ans=0.125 2024-09-20 04:24:59,754 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:24:59,818 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=820008.0, ans=0.1 2024-09-20 04:25:05,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=820008.0, ans=0.0 2024-09-20 04:25:24,274 INFO [train.py:1198] (1/2) Epoch 46, batch 1300, loss[loss=0.2155, simple_loss=0.2788, pruned_loss=0.05579, ctc_loss=0.1205, cr_loss=0.4116, over 33048.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.2592, pruned_loss=0.05192, ctc_loss=0.1126, cr_loss=0.3859, over 6744618.38 frames. ], batch size: 130, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:25:28,095 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=820101.3333333334, ans=0.125 2024-09-20 04:25:44,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=820148.0, ans=0.125 2024-09-20 04:25:51,665 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.09 vs. limit=22.5 2024-09-20 04:26:05,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=820194.6666666666, ans=0.2 2024-09-20 04:26:19,078 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=820241.3333333334, ans=0.125 2024-09-20 04:26:41,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=820288.0, ans=0.0 2024-09-20 04:26:46,350 INFO [train.py:1198] (1/2) Epoch 46, batch 1350, loss[loss=0.2013, simple_loss=0.2611, pruned_loss=0.05193, ctc_loss=0.1115, cr_loss=0.3863, over 34520.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2589, pruned_loss=0.05184, ctc_loss=0.1125, cr_loss=0.3853, over 6763299.27 frames. 
], batch size: 94, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:26:49,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=820334.6666666666, ans=0.2 2024-09-20 04:26:56,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=820334.6666666666, ans=0.05 2024-09-20 04:27:07,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=820381.3333333334, ans=0.025 2024-09-20 04:27:27,745 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=820428.0, ans=0.1 2024-09-20 04:27:33,795 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.234e+02 2.585e+02 3.102e+02 4.090e+02 6.336e+02, threshold=6.205e+02, percent-clipped=3.0 2024-09-20 04:27:40,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=820474.6666666666, ans=0.125 2024-09-20 04:27:56,572 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.96 vs. limit=15.0 2024-09-20 04:28:07,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=820521.3333333334, ans=0.05 2024-09-20 04:28:10,356 INFO [train.py:1198] (1/2) Epoch 46, batch 1400, loss[loss=0.1871, simple_loss=0.2372, pruned_loss=0.05021, ctc_loss=0.1069, cr_loss=0.3818, over 34319.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.259, pruned_loss=0.05199, ctc_loss=0.1126, cr_loss=0.3856, over 6775882.57 frames. ], batch size: 80, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:28:24,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=820568.0, ans=0.035 2024-09-20 04:28:33,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=820614.6666666666, ans=0.125 2024-09-20 04:28:45,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=820661.3333333334, ans=0.125 2024-09-20 04:28:53,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=820661.3333333334, ans=0.1 2024-09-20 04:29:14,172 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=10.07 vs. limit=15.0 2024-09-20 04:29:23,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=820754.6666666666, ans=0.2 2024-09-20 04:29:33,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=820801.3333333334, ans=0.125 2024-09-20 04:29:34,601 INFO [train.py:1198] (1/2) Epoch 46, batch 1450, loss[loss=0.2114, simple_loss=0.2748, pruned_loss=0.05407, ctc_loss=0.118, cr_loss=0.4084, over 34461.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.2595, pruned_loss=0.05205, ctc_loss=0.1128, cr_loss=0.3862, over 6773366.24 frames. 
], batch size: 110, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:29:36,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=820801.3333333334, ans=0.1 2024-09-20 04:29:40,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2024-09-20 04:29:50,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=768, metric=2.62 vs. limit=15.0 2024-09-20 04:30:01,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=820848.0, ans=0.125 2024-09-20 04:30:12,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=820894.6666666666, ans=0.025 2024-09-20 04:30:20,454 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 2.529e+02 2.794e+02 3.273e+02 4.802e+02, threshold=5.589e+02, percent-clipped=0.0 2024-09-20 04:30:45,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=820988.0, ans=0.2 2024-09-20 04:30:50,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=820988.0, ans=0.2 2024-09-20 04:30:52,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=820988.0, ans=0.5 2024-09-20 04:30:54,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=820988.0, ans=0.125 2024-09-20 04:30:58,679 INFO [train.py:1198] (1/2) Epoch 46, batch 1500, loss[loss=0.1974, simple_loss=0.2643, pruned_loss=0.04698, ctc_loss=0.1076, cr_loss=0.3735, over 34462.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2597, pruned_loss=0.05184, ctc_loss=0.1126, cr_loss=0.3858, over 6774316.29 frames. ], batch size: 100, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:31:00,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=821034.6666666666, ans=0.125 2024-09-20 04:31:04,693 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.90 vs. limit=10.0 2024-09-20 04:31:13,054 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=12.0 2024-09-20 04:31:17,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=821081.3333333334, ans=0.0 2024-09-20 04:31:37,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=821128.0, ans=0.0 2024-09-20 04:31:42,203 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:32:15,140 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=821221.3333333334, ans=0.0 2024-09-20 04:32:22,999 INFO [train.py:1198] (1/2) Epoch 46, batch 1550, loss[loss=0.2151, simple_loss=0.2768, pruned_loss=0.0567, ctc_loss=0.1196, cr_loss=0.4008, over 34421.00 frames. 
], tot_loss[loss=0.2007, simple_loss=0.2595, pruned_loss=0.0519, ctc_loss=0.1128, cr_loss=0.3863, over 6746290.08 frames. ], batch size: 105, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:32:38,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=821314.6666666666, ans=0.07 2024-09-20 04:32:38,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=821314.6666666666, ans=0.025 2024-09-20 04:32:52,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=821314.6666666666, ans=0.125 2024-09-20 04:33:15,326 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.622e+02 3.108e+02 3.785e+02 6.082e+02, threshold=6.216e+02, percent-clipped=2.0 2024-09-20 04:33:15,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=821361.3333333334, ans=0.1 2024-09-20 04:33:43,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=821454.6666666666, ans=0.125 2024-09-20 04:33:43,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=821454.6666666666, ans=0.2 2024-09-20 04:33:48,575 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:33:51,551 INFO [train.py:1198] (1/2) Epoch 46, batch 1600, loss[loss=0.2125, simple_loss=0.2777, pruned_loss=0.05419, ctc_loss=0.1168, cr_loss=0.3894, over 34567.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2591, pruned_loss=0.0517, ctc_loss=0.1125, cr_loss=0.3851, over 6724977.10 frames. ], batch size: 99, lr: 2.59e-03, grad_scale: 32.0 2024-09-20 04:34:15,043 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.78 vs. limit=15.0 2024-09-20 04:34:42,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=821641.3333333334, ans=0.125 2024-09-20 04:34:46,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=821641.3333333334, ans=0.125 2024-09-20 04:35:14,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=821734.6666666666, ans=0.1 2024-09-20 04:35:15,295 INFO [train.py:1198] (1/2) Epoch 46, batch 1650, loss[loss=0.207, simple_loss=0.269, pruned_loss=0.05303, ctc_loss=0.1159, cr_loss=0.3933, over 34377.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.259, pruned_loss=0.0517, ctc_loss=0.1125, cr_loss=0.385, over 6715953.36 frames. 
], batch size: 103, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:35:17,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=821734.6666666666, ans=0.07 2024-09-20 04:35:33,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=821781.3333333334, ans=0.125 2024-09-20 04:35:39,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=821781.3333333334, ans=0.125 2024-09-20 04:35:56,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=821828.0, ans=0.025 2024-09-20 04:36:02,999 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.161e+02 2.552e+02 2.907e+02 3.672e+02 8.381e+02, threshold=5.814e+02, percent-clipped=2.0 2024-09-20 04:36:03,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=821828.0, ans=0.125 2024-09-20 04:36:03,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=821828.0, ans=0.125 2024-09-20 04:36:13,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=821874.6666666666, ans=0.0 2024-09-20 04:36:29,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=821921.3333333334, ans=0.09899494936611666 2024-09-20 04:36:31,848 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.16 vs. limit=15.0 2024-09-20 04:36:38,976 INFO [train.py:1198] (1/2) Epoch 46, batch 1700, loss[loss=0.1746, simple_loss=0.2311, pruned_loss=0.04268, ctc_loss=0.0943, cr_loss=0.3459, over 34308.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2591, pruned_loss=0.05152, ctc_loss=0.112, cr_loss=0.3845, over 6742389.56 frames. ], batch size: 80, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:37:12,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2024-09-20 04:37:22,203 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:37:43,270 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:37:46,504 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=822154.6666666666, ans=0.125 2024-09-20 04:37:46,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=822154.6666666666, ans=0.125 2024-09-20 04:37:54,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=822154.6666666666, ans=0.125 2024-09-20 04:38:02,634 INFO [train.py:1198] (1/2) Epoch 46, batch 1750, loss[loss=0.1824, simple_loss=0.2351, pruned_loss=0.04745, ctc_loss=0.1048, cr_loss=0.3468, over 34143.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2589, pruned_loss=0.0515, ctc_loss=0.112, cr_loss=0.3839, over 6752359.50 frames. 
], batch size: 78, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:38:17,415 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=822248.0, ans=0.125 2024-09-20 04:38:24,306 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=22.5 2024-09-20 04:38:39,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=822294.6666666666, ans=0.125 2024-09-20 04:38:48,691 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.550e+02 2.809e+02 3.306e+02 6.052e+02, threshold=5.618e+02, percent-clipped=1.0 2024-09-20 04:38:52,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=822341.3333333334, ans=0.125 2024-09-20 04:38:57,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=822341.3333333334, ans=0.0 2024-09-20 04:39:21,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=822388.0, ans=0.025 2024-09-20 04:39:24,626 INFO [train.py:1198] (1/2) Epoch 46, batch 1800, loss[loss=0.2172, simple_loss=0.2756, pruned_loss=0.05899, ctc_loss=0.1223, cr_loss=0.4114, over 34711.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2591, pruned_loss=0.05137, ctc_loss=0.1118, cr_loss=0.3841, over 6755423.78 frames. ], batch size: 97, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:39:32,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.10 vs. limit=6.0 2024-09-20 04:40:03,519 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.30 vs. limit=15.0 2024-09-20 04:40:04,583 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=822528.0, ans=0.125 2024-09-20 04:40:16,357 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.07 vs. limit=10.0 2024-09-20 04:40:27,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=822574.6666666666, ans=0.09899494936611666 2024-09-20 04:40:30,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=822621.3333333334, ans=0.2 2024-09-20 04:40:43,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=822621.3333333334, ans=0.125 2024-09-20 04:40:48,468 INFO [train.py:1198] (1/2) Epoch 46, batch 1850, loss[loss=0.2059, simple_loss=0.2678, pruned_loss=0.05236, ctc_loss=0.1171, cr_loss=0.3968, over 34417.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2589, pruned_loss=0.05144, ctc_loss=0.1118, cr_loss=0.3844, over 6763239.06 frames. 
], batch size: 100, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:41:36,106 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.171e+02 2.677e+02 3.313e+02 4.232e+02 6.032e+02, threshold=6.627e+02, percent-clipped=4.0 2024-09-20 04:41:44,972 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:41:55,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=822854.6666666666, ans=15.0 2024-09-20 04:42:07,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=822854.6666666666, ans=0.0 2024-09-20 04:42:12,212 INFO [train.py:1198] (1/2) Epoch 46, batch 1900, loss[loss=0.1911, simple_loss=0.2549, pruned_loss=0.04613, ctc_loss=0.1032, cr_loss=0.3619, over 34381.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2592, pruned_loss=0.05165, ctc_loss=0.1123, cr_loss=0.3856, over 6772013.04 frames. ], batch size: 103, lr: 2.58e-03, grad_scale: 16.0 2024-09-20 04:42:17,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=822901.3333333334, ans=0.125 2024-09-20 04:42:28,478 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=15.0 2024-09-20 04:42:40,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=822948.0, ans=0.2 2024-09-20 04:42:57,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=822994.6666666666, ans=0.0 2024-09-20 04:42:57,891 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.whiten, num_groups=1, num_channels=768, metric=3.74 vs. limit=12.0 2024-09-20 04:42:58,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=822994.6666666666, ans=0.125 2024-09-20 04:43:00,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=823041.3333333334, ans=0.125 2024-09-20 04:43:00,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=823041.3333333334, ans=0.0 2024-09-20 04:43:25,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=823088.0, ans=0.125 2024-09-20 04:43:36,710 INFO [train.py:1198] (1/2) Epoch 46, batch 1950, loss[loss=0.1991, simple_loss=0.2581, pruned_loss=0.05123, ctc_loss=0.1105, cr_loss=0.3914, over 34339.00 frames. ], tot_loss[loss=0.2012, simple_loss=0.2604, pruned_loss=0.05197, ctc_loss=0.113, cr_loss=0.3871, over 6789311.80 frames. 
], batch size: 91, lr: 2.58e-03, grad_scale: 16.0 2024-09-20 04:43:38,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=823134.6666666666, ans=0.125 2024-09-20 04:44:13,904 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=823228.0, ans=0.125 2024-09-20 04:44:25,096 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.139e+02 2.546e+02 2.795e+02 3.142e+02 4.408e+02, threshold=5.590e+02, percent-clipped=0.0 2024-09-20 04:44:30,404 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:44:41,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=823321.3333333334, ans=0.125 2024-09-20 04:44:59,527 INFO [train.py:1198] (1/2) Epoch 46, batch 2000, loss[loss=0.1755, simple_loss=0.2294, pruned_loss=0.04449, ctc_loss=0.09492, cr_loss=0.3377, over 34142.00 frames. ], tot_loss[loss=0.2018, simple_loss=0.2609, pruned_loss=0.05229, ctc_loss=0.1134, cr_loss=0.3874, over 6764336.95 frames. ], batch size: 78, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:45:01,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=823368.0, ans=0.125 2024-09-20 04:45:31,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=823414.6666666666, ans=0.1 2024-09-20 04:45:33,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=823461.3333333334, ans=0.1 2024-09-20 04:45:37,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=823461.3333333334, ans=0.125 2024-09-20 04:46:23,455 INFO [train.py:1198] (1/2) Epoch 46, batch 2050, loss[loss=0.1802, simple_loss=0.2379, pruned_loss=0.04457, ctc_loss=0.09745, cr_loss=0.3438, over 34488.00 frames. ], tot_loss[loss=0.201, simple_loss=0.26, pruned_loss=0.05201, ctc_loss=0.1127, cr_loss=0.3861, over 6754610.73 frames. ], batch size: 82, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:46:27,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=823601.3333333334, ans=0.1 2024-09-20 04:46:56,927 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=823694.6666666666, ans=0.0 2024-09-20 04:47:00,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=823694.6666666666, ans=0.1 2024-09-20 04:47:08,790 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=823694.6666666666, ans=0.125 2024-09-20 04:47:13,249 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.313e+02 2.670e+02 3.078e+02 4.131e+02 8.607e+02, threshold=6.155e+02, percent-clipped=7.0 2024-09-20 04:47:17,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.80 vs. 
limit=15.0 2024-09-20 04:47:18,676 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=6.403e-02 2024-09-20 04:47:47,651 INFO [train.py:1198] (1/2) Epoch 46, batch 2100, loss[loss=0.1979, simple_loss=0.257, pruned_loss=0.05072, ctc_loss=0.1104, cr_loss=0.3832, over 34550.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2594, pruned_loss=0.05174, ctc_loss=0.1122, cr_loss=0.3851, over 6768354.15 frames. ], batch size: 94, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:47:54,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=22.5 2024-09-20 04:47:57,819 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=823834.6666666666, ans=0.0 2024-09-20 04:48:01,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.18 vs. limit=15.0 2024-09-20 04:48:07,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=823881.3333333334, ans=0.1 2024-09-20 04:48:15,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=823881.3333333334, ans=0.07 2024-09-20 04:48:33,033 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.46 vs. limit=15.0 2024-09-20 04:49:03,495 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.097e-02 2024-09-20 04:49:11,355 INFO [train.py:1198] (1/2) Epoch 46, batch 2150, loss[loss=0.1904, simple_loss=0.2492, pruned_loss=0.04755, ctc_loss=0.107, cr_loss=0.3791, over 34369.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2585, pruned_loss=0.0513, ctc_loss=0.1115, cr_loss=0.3835, over 6788492.93 frames. ], batch size: 91, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:49:36,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=824114.6666666666, ans=0.07 2024-09-20 04:49:44,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=824161.3333333334, ans=0.0 2024-09-20 04:49:59,404 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.150e+02 2.632e+02 3.351e+02 4.156e+02 6.089e+02, threshold=6.703e+02, percent-clipped=0.0 2024-09-20 04:50:20,532 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.30 vs. limit=10.0 2024-09-20 04:50:23,926 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.81 vs. limit=12.0 2024-09-20 04:50:34,092 INFO [train.py:1198] (1/2) Epoch 46, batch 2200, loss[loss=0.2031, simple_loss=0.2681, pruned_loss=0.05075, ctc_loss=0.1092, cr_loss=0.3696, over 34450.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2588, pruned_loss=0.05147, ctc_loss=0.1116, cr_loss=0.3838, over 6783102.02 frames. 
], batch size: 100, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:50:37,658 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=824301.3333333334, ans=0.035 2024-09-20 04:50:44,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=824301.3333333334, ans=0.025 2024-09-20 04:50:55,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=824348.0, ans=0.0 2024-09-20 04:50:59,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=824348.0, ans=15.0 2024-09-20 04:51:12,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=824394.6666666666, ans=0.0 2024-09-20 04:51:34,195 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:51:35,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=824441.3333333334, ans=0.1 2024-09-20 04:51:40,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=824488.0, ans=0.0 2024-09-20 04:51:42,513 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824488.0, ans=0.1 2024-09-20 04:51:44,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=824488.0, ans=0.1 2024-09-20 04:51:49,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=824488.0, ans=0.1 2024-09-20 04:51:58,630 INFO [train.py:1198] (1/2) Epoch 46, batch 2250, loss[loss=0.1958, simple_loss=0.2576, pruned_loss=0.04932, ctc_loss=0.1053, cr_loss=0.3594, over 34428.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2589, pruned_loss=0.05143, ctc_loss=0.1116, cr_loss=0.3837, over 6781094.04 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:52:02,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=824534.6666666666, ans=0.0 2024-09-20 04:52:43,301 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 04:52:46,110 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.175e+02 2.637e+02 2.943e+02 3.669e+02 5.931e+02, threshold=5.885e+02, percent-clipped=0.0 2024-09-20 04:52:49,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=824674.6666666666, ans=10.0 2024-09-20 04:52:56,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=824674.6666666666, ans=0.2 2024-09-20 04:53:01,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.89 vs. limit=6.0 2024-09-20 04:53:01,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. 
limit=15.0 2024-09-20 04:53:22,435 INFO [train.py:1198] (1/2) Epoch 46, batch 2300, loss[loss=0.1957, simple_loss=0.2453, pruned_loss=0.0538, ctc_loss=0.1139, cr_loss=0.3933, over 34693.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2578, pruned_loss=0.05114, ctc_loss=0.111, cr_loss=0.3818, over 6767241.60 frames. ], batch size: 84, lr: 2.58e-03, grad_scale: 16.0 2024-09-20 04:53:22,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=824768.0, ans=0.07 2024-09-20 04:54:12,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=824908.0, ans=0.025 2024-09-20 04:54:26,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=824954.6666666666, ans=0.0 2024-09-20 04:54:44,640 INFO [train.py:1198] (1/2) Epoch 46, batch 2350, loss[loss=0.2163, simple_loss=0.2777, pruned_loss=0.05676, ctc_loss=0.1242, cr_loss=0.4141, over 34689.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2581, pruned_loss=0.05132, ctc_loss=0.1115, cr_loss=0.3831, over 6773395.66 frames. ], batch size: 97, lr: 2.58e-03, grad_scale: 16.0 2024-09-20 04:55:20,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=825094.6666666666, ans=0.125 2024-09-20 04:55:36,476 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.532e+02 2.850e+02 3.555e+02 5.564e+02, threshold=5.699e+02, percent-clipped=0.0 2024-09-20 04:55:45,102 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=825141.3333333334, ans=0.2 2024-09-20 04:55:55,040 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=825188.0, ans=0.125 2024-09-20 04:56:09,413 INFO [train.py:1198] (1/2) Epoch 46, batch 2400, loss[loss=0.2, simple_loss=0.256, pruned_loss=0.05289, ctc_loss=0.1143, cr_loss=0.3852, over 34605.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2587, pruned_loss=0.0515, ctc_loss=0.1119, cr_loss=0.384, over 6777439.58 frames. 
], batch size: 89, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:56:16,143 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=825234.6666666666, ans=10.0 2024-09-20 04:56:17,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=825234.6666666666, ans=0.2 2024-09-20 04:56:27,765 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=825281.3333333334, ans=0.0 2024-09-20 04:56:29,407 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=825281.3333333334, ans=0.035 2024-09-20 04:56:29,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=825281.3333333334, ans=0.1 2024-09-20 04:56:34,712 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=825281.3333333334, ans=0.2 2024-09-20 04:56:36,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=825281.3333333334, ans=0.125 2024-09-20 04:57:02,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=825374.6666666666, ans=0.95 2024-09-20 04:57:10,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=825374.6666666666, ans=0.0 2024-09-20 04:57:20,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=825421.3333333334, ans=0.125 2024-09-20 04:57:24,737 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.35 vs. limit=15.0 2024-09-20 04:57:25,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=825421.3333333334, ans=0.125 2024-09-20 04:57:33,773 INFO [train.py:1198] (1/2) Epoch 46, batch 2450, loss[loss=0.202, simple_loss=0.2617, pruned_loss=0.05261, ctc_loss=0.1095, cr_loss=0.3798, over 34407.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2593, pruned_loss=0.0518, ctc_loss=0.1125, cr_loss=0.3857, over 6750468.97 frames. ], batch size: 95, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:57:56,120 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.52 vs. 
limit=15.0 2024-09-20 04:57:57,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=825514.6666666666, ans=0.0 2024-09-20 04:58:20,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=825561.3333333334, ans=0.125 2024-09-20 04:58:22,929 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.255e+02 2.553e+02 2.956e+02 3.716e+02 5.792e+02, threshold=5.911e+02, percent-clipped=2.0 2024-09-20 04:58:35,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=825608.0, ans=0.0 2024-09-20 04:58:44,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=825654.6666666666, ans=0.2 2024-09-20 04:58:48,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=825654.6666666666, ans=0.125 2024-09-20 04:58:57,701 INFO [train.py:1198] (1/2) Epoch 46, batch 2500, loss[loss=0.2001, simple_loss=0.2613, pruned_loss=0.05049, ctc_loss=0.1126, cr_loss=0.3872, over 34449.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2595, pruned_loss=0.05191, ctc_loss=0.1128, cr_loss=0.3859, over 6761657.31 frames. ], batch size: 100, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 04:59:11,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=825701.3333333334, ans=0.125 2024-09-20 04:59:16,240 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=825748.0, ans=0.0 2024-09-20 04:59:34,371 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=825794.6666666666, ans=0.125 2024-09-20 04:59:34,441 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=825794.6666666666, ans=0.125 2024-09-20 04:59:42,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=825794.6666666666, ans=0.0 2024-09-20 04:59:59,888 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.20 vs. limit=15.0 2024-09-20 05:00:21,676 INFO [train.py:1198] (1/2) Epoch 46, batch 2550, loss[loss=0.1726, simple_loss=0.2287, pruned_loss=0.04199, ctc_loss=0.09482, cr_loss=0.336, over 34166.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2594, pruned_loss=0.05176, ctc_loss=0.1125, cr_loss=0.3852, over 6765537.50 frames. ], batch size: 78, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 05:00:22,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.19 vs. limit=15.0 2024-09-20 05:00:47,228 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.16 vs. 
limit=15.0 2024-09-20 05:00:58,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=826028.0, ans=0.09899494936611666 2024-09-20 05:01:11,265 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.206e+02 2.537e+02 2.962e+02 3.697e+02 5.529e+02, threshold=5.923e+02, percent-clipped=0.0 2024-09-20 05:01:25,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=826074.6666666666, ans=0.125 2024-09-20 05:01:44,527 INFO [train.py:1198] (1/2) Epoch 46, batch 2600, loss[loss=0.2003, simple_loss=0.2525, pruned_loss=0.05404, ctc_loss=0.1183, cr_loss=0.408, over 34388.00 frames. ], tot_loss[loss=0.2007, simple_loss=0.2598, pruned_loss=0.05179, ctc_loss=0.1126, cr_loss=0.3857, over 6760920.83 frames. ], batch size: 91, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 05:01:48,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=826168.0, ans=0.025 2024-09-20 05:02:11,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=826214.6666666666, ans=10.0 2024-09-20 05:02:25,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=826261.3333333334, ans=0.2 2024-09-20 05:02:53,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=826354.6666666666, ans=0.0 2024-09-20 05:03:05,315 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=826354.6666666666, ans=0.2 2024-09-20 05:03:08,057 INFO [train.py:1198] (1/2) Epoch 46, batch 2650, loss[loss=0.2093, simple_loss=0.2703, pruned_loss=0.05458, ctc_loss=0.1165, cr_loss=0.395, over 34224.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2601, pruned_loss=0.05184, ctc_loss=0.1128, cr_loss=0.3864, over 6768594.38 frames. ], batch size: 117, lr: 2.58e-03, grad_scale: 16.0 2024-09-20 05:03:08,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=826401.3333333334, ans=0.0 2024-09-20 05:03:18,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=826401.3333333334, ans=0.125 2024-09-20 05:03:39,746 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5 2024-09-20 05:03:40,027 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.19 vs. 
limit=6.0 2024-09-20 05:03:44,447 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=826494.6666666666, ans=0.025 2024-09-20 05:03:47,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=826494.6666666666, ans=0.125 2024-09-20 05:03:58,522 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.199e+02 2.569e+02 2.760e+02 3.432e+02 6.502e+02, threshold=5.520e+02, percent-clipped=1.0 2024-09-20 05:04:03,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=826541.3333333334, ans=0.04949747468305833 2024-09-20 05:04:07,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2024-09-20 05:04:31,803 INFO [train.py:1198] (1/2) Epoch 46, batch 2700, loss[loss=0.2008, simple_loss=0.2666, pruned_loss=0.049, ctc_loss=0.1102, cr_loss=0.3734, over 34649.00 frames. ], tot_loss[loss=0.2012, simple_loss=0.2603, pruned_loss=0.05196, ctc_loss=0.1129, cr_loss=0.3868, over 6763113.73 frames. ], batch size: 102, lr: 2.58e-03, grad_scale: 16.0 2024-09-20 05:05:03,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=826728.0, ans=0.025 2024-09-20 05:05:16,431 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=826728.0, ans=0.2 2024-09-20 05:05:18,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=826728.0, ans=0.125 2024-09-20 05:05:31,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=826774.6666666666, ans=0.125 2024-09-20 05:05:37,815 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=826821.3333333334, ans=0.125 2024-09-20 05:05:47,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=826821.3333333334, ans=0.125 2024-09-20 05:05:52,540 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=826868.0, ans=0.125 2024-09-20 05:05:52,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=826868.0, ans=0.1 2024-09-20 05:05:53,904 INFO [train.py:1198] (1/2) Epoch 46, batch 2750, loss[loss=0.1913, simple_loss=0.2452, pruned_loss=0.0506, ctc_loss=0.1064, cr_loss=0.3742, over 34634.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2589, pruned_loss=0.05153, ctc_loss=0.112, cr_loss=0.3844, over 6760663.63 frames. ], batch size: 88, lr: 2.58e-03, grad_scale: 16.0 2024-09-20 05:06:37,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=826961.3333333334, ans=0.0 2024-09-20 05:06:44,530 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.05 vs. 
limit=15.0 2024-09-20 05:06:46,905 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.243e+02 2.713e+02 3.117e+02 3.757e+02 6.069e+02, threshold=6.233e+02, percent-clipped=4.0 2024-09-20 05:07:10,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.20 vs. limit=15.0 2024-09-20 05:07:18,577 INFO [train.py:1198] (1/2) Epoch 46, batch 2800, loss[loss=0.2252, simple_loss=0.2777, pruned_loss=0.06409, ctc_loss=0.1383, cr_loss=0.4204, over 23618.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2592, pruned_loss=0.0518, ctc_loss=0.1125, cr_loss=0.3853, over 6738316.20 frames. ], batch size: 245, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 05:07:31,962 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=827101.3333333334, ans=0.0 2024-09-20 05:08:00,034 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=827194.6666666666, ans=0.0 2024-09-20 05:08:06,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=827194.6666666666, ans=0.125 2024-09-20 05:08:11,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=827241.3333333334, ans=0.125 2024-09-20 05:08:25,994 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=827288.0, ans=0.0 2024-09-20 05:08:32,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=827288.0, ans=0.0 2024-09-20 05:08:42,118 INFO [train.py:1198] (1/2) Epoch 46, batch 2850, loss[loss=0.2089, simple_loss=0.2586, pruned_loss=0.0588, ctc_loss=0.1254, cr_loss=0.4144, over 34455.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.2596, pruned_loss=0.05209, ctc_loss=0.1131, cr_loss=0.3867, over 6722917.37 frames. ], batch size: 90, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 05:08:47,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=827334.6666666666, ans=0.1 2024-09-20 05:09:12,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=827381.3333333334, ans=0.125 2024-09-20 05:09:14,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.16 vs. limit=10.0 2024-09-20 05:09:33,190 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.260e+02 2.616e+02 3.008e+02 3.729e+02 6.440e+02, threshold=6.015e+02, percent-clipped=1.0 2024-09-20 05:09:53,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=827521.3333333334, ans=0.025 2024-09-20 05:10:06,403 INFO [train.py:1198] (1/2) Epoch 46, batch 2900, loss[loss=0.1943, simple_loss=0.2554, pruned_loss=0.04875, ctc_loss=0.1065, cr_loss=0.3608, over 34550.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2604, pruned_loss=0.05216, ctc_loss=0.1132, cr_loss=0.3866, over 6753382.56 frames. 
], batch size: 94, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 05:10:08,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=827568.0, ans=0.95 2024-09-20 05:10:10,776 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.54 vs. limit=15.0 2024-09-20 05:10:30,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=827614.6666666666, ans=0.0 2024-09-20 05:10:33,598 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=827614.6666666666, ans=0.0 2024-09-20 05:10:35,033 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=827614.6666666666, ans=0.1 2024-09-20 05:10:36,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=827614.6666666666, ans=0.2 2024-09-20 05:10:41,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=827661.3333333334, ans=0.125 2024-09-20 05:10:56,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=827708.0, ans=0.2 2024-09-20 05:11:31,145 INFO [train.py:1198] (1/2) Epoch 46, batch 2950, loss[loss=0.1951, simple_loss=0.2512, pruned_loss=0.05061, ctc_loss=0.1109, cr_loss=0.3922, over 34632.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.259, pruned_loss=0.05159, ctc_loss=0.112, cr_loss=0.3836, over 6747225.48 frames. ], batch size: 88, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 05:11:50,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=827848.0, ans=0.025 2024-09-20 05:12:11,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=827894.6666666666, ans=0.07 2024-09-20 05:12:20,017 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=15.0 2024-09-20 05:12:22,081 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.920e+02 2.470e+02 2.803e+02 3.571e+02 5.109e+02, threshold=5.606e+02, percent-clipped=0.0 2024-09-20 05:12:22,269 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=827941.3333333334, ans=0.125 2024-09-20 05:12:22,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=827941.3333333334, ans=0.2 2024-09-20 05:12:23,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=827941.3333333334, ans=0.125 2024-09-20 05:12:27,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=827941.3333333334, ans=0.125 2024-09-20 05:12:45,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=827988.0, ans=0.1 2024-09-20 05:12:53,592 INFO [train.py:1198] (1/2) Epoch 46, batch 3000, loss[loss=0.1985, simple_loss=0.2598, pruned_loss=0.0502, ctc_loss=0.1086, cr_loss=0.3747, over 34512.00 frames. 
], tot_loss[loss=0.1998, simple_loss=0.2588, pruned_loss=0.05151, ctc_loss=0.1118, cr_loss=0.3834, over 6748931.64 frames. ], batch size: 94, lr: 2.58e-03, grad_scale: 32.0 2024-09-20 05:12:53,592 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 05:13:03,328 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.8548, 3.4310, 3.6403, 3.3905, 3.1664, 3.5316, 2.9770, 2.9015], device='cuda:1') 2024-09-20 05:13:10,387 INFO [train.py:1230] (1/2) Epoch 46, validation: loss=0.1487, simple_loss=0.2418, pruned_loss=0.0239, ctc_loss=0.03867, cr_loss=2.285e-14, over 944034.00 frames. 2024-09-20 05:13:10,387 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 05:13:19,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=828034.6666666666, ans=0.125 2024-09-20 05:13:22,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=828034.6666666666, ans=0.125 2024-09-20 05:13:35,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=828081.3333333334, ans=0.125 2024-09-20 05:13:42,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=828081.3333333334, ans=0.125 2024-09-20 05:14:19,286 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=828221.3333333334, ans=0.125 2024-09-20 05:14:33,758 INFO [train.py:1198] (1/2) Epoch 46, batch 3050, loss[loss=0.1898, simple_loss=0.2493, pruned_loss=0.04754, ctc_loss=0.1038, cr_loss=0.3595, over 34593.00 frames. ], tot_loss[loss=0.2006, simple_loss=0.2596, pruned_loss=0.05186, ctc_loss=0.1125, cr_loss=0.385, over 6742024.91 frames. ], batch size: 89, lr: 2.57e-03, grad_scale: 32.0 2024-09-20 05:14:48,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=828314.6666666666, ans=0.2 2024-09-20 05:14:52,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=828314.6666666666, ans=0.0 2024-09-20 05:14:53,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=828314.6666666666, ans=0.0 2024-09-20 05:14:54,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.50 vs. 
limit=22.5 2024-09-20 05:14:58,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=828314.6666666666, ans=0.125 2024-09-20 05:15:12,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=828361.3333333334, ans=0.025 2024-09-20 05:15:14,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=828361.3333333334, ans=0.0 2024-09-20 05:15:15,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=828361.3333333334, ans=0.1 2024-09-20 05:15:23,810 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.103e+02 2.645e+02 2.900e+02 3.782e+02 6.438e+02, threshold=5.800e+02, percent-clipped=2.0 2024-09-20 05:15:48,905 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.77 vs. limit=15.0 2024-09-20 05:15:54,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=828501.3333333334, ans=0.2 2024-09-20 05:15:55,966 INFO [train.py:1198] (1/2) Epoch 46, batch 3100, loss[loss=0.2142, simple_loss=0.273, pruned_loss=0.05687, ctc_loss=0.1243, cr_loss=0.4194, over 34164.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.2594, pruned_loss=0.05186, ctc_loss=0.1124, cr_loss=0.3846, over 6742459.22 frames. ], batch size: 117, lr: 2.57e-03, grad_scale: 32.0 2024-09-20 05:15:56,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=828501.3333333334, ans=0.125 2024-09-20 05:16:12,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=828548.0, ans=0.0 2024-09-20 05:16:21,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=828548.0, ans=0.07 2024-09-20 05:16:33,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=828594.6666666666, ans=0.125 2024-09-20 05:16:41,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=828594.6666666666, ans=0.125 2024-09-20 05:16:51,272 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:16:52,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=828641.3333333334, ans=0.125 2024-09-20 05:17:16,974 INFO [train.py:1198] (1/2) Epoch 46, batch 3150, loss[loss=0.2148, simple_loss=0.2757, pruned_loss=0.05665, ctc_loss=0.1213, cr_loss=0.4106, over 33827.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.2594, pruned_loss=0.05188, ctc_loss=0.1124, cr_loss=0.3845, over 6748750.44 frames. 
], batch size: 122, lr: 2.57e-03, grad_scale: 32.0 2024-09-20 05:17:46,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=828781.3333333334, ans=0.0 2024-09-20 05:17:51,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=828828.0, ans=0.125 2024-09-20 05:17:51,545 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=828828.0, ans=0.025 2024-09-20 05:17:58,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.00 vs. limit=22.5 2024-09-20 05:18:07,517 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.098e+02 2.550e+02 2.974e+02 3.526e+02 6.084e+02, threshold=5.949e+02, percent-clipped=2.0 2024-09-20 05:18:08,233 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0 2024-09-20 05:18:35,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=828921.3333333334, ans=0.125 2024-09-20 05:18:38,064 INFO [train.py:1198] (1/2) Epoch 46, batch 3200, loss[loss=0.2034, simple_loss=0.2579, pruned_loss=0.05506, ctc_loss=0.1164, cr_loss=0.389, over 34519.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2589, pruned_loss=0.05166, ctc_loss=0.112, cr_loss=0.3834, over 6762589.79 frames. ], batch size: 94, lr: 2.57e-03, grad_scale: 32.0 2024-09-20 05:18:40,056 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=828968.0, ans=0.125 2024-09-20 05:18:55,895 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=829014.6666666666, ans=0.125 2024-09-20 05:19:00,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=829014.6666666666, ans=0.125 2024-09-20 05:19:02,348 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=829014.6666666666, ans=0.125 2024-09-20 05:19:03,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=829014.6666666666, ans=0.125 2024-09-20 05:19:05,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829014.6666666666, ans=0.1 2024-09-20 05:19:26,799 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=829108.0, ans=0.125 2024-09-20 05:19:31,711 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=829108.0, ans=0.125 2024-09-20 05:19:47,662 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=829154.6666666666, ans=0.2 2024-09-20 05:20:00,408 INFO [train.py:1198] (1/2) Epoch 46, batch 3250, loss[loss=0.2028, simple_loss=0.2661, pruned_loss=0.05113, ctc_loss=0.1119, cr_loss=0.3708, over 34666.00 frames. ], tot_loss[loss=0.2006, simple_loss=0.2596, pruned_loss=0.05182, ctc_loss=0.1124, cr_loss=0.3848, over 6771685.59 frames. 
], batch size: 98, lr: 2.57e-03, grad_scale: 16.0 2024-09-20 05:20:50,414 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=829341.3333333334, ans=0.5 2024-09-20 05:20:51,690 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.113e+02 2.606e+02 3.109e+02 3.736e+02 5.948e+02, threshold=6.219e+02, percent-clipped=0.0 2024-09-20 05:20:53,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=829341.3333333334, ans=0.125 2024-09-20 05:20:55,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=829341.3333333334, ans=0.125 2024-09-20 05:21:03,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=22.5 2024-09-20 05:21:15,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=829388.0, ans=0.125 2024-09-20 05:21:20,655 INFO [train.py:1198] (1/2) Epoch 46, batch 3300, loss[loss=0.2118, simple_loss=0.2768, pruned_loss=0.05396, ctc_loss=0.1169, cr_loss=0.3902, over 33111.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2586, pruned_loss=0.05139, ctc_loss=0.1116, cr_loss=0.3828, over 6768976.97 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 16.0 2024-09-20 05:21:46,637 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=829481.3333333334, ans=0.125 2024-09-20 05:21:49,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=829481.3333333334, ans=0.125 2024-09-20 05:21:55,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=829528.0, ans=0.025 2024-09-20 05:22:02,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=829528.0, ans=0.2 2024-09-20 05:22:03,241 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=15.0 2024-09-20 05:22:05,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=829528.0, ans=0.1 2024-09-20 05:22:08,785 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=829574.6666666666, ans=0.2 2024-09-20 05:22:10,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=829574.6666666666, ans=0.0 2024-09-20 05:22:15,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=829574.6666666666, ans=0.2 2024-09-20 05:22:23,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.22 vs. limit=6.0 2024-09-20 05:22:37,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=829621.3333333334, ans=0.125 2024-09-20 05:22:42,391 INFO [train.py:1198] (1/2) Epoch 46, batch 3350, loss[loss=0.1958, simple_loss=0.2623, pruned_loss=0.04648, ctc_loss=0.106, cr_loss=0.3793, over 33871.00 frames. 
], tot_loss[loss=0.2003, simple_loss=0.2594, pruned_loss=0.0517, ctc_loss=0.1121, cr_loss=0.3839, over 6742468.18 frames. ], batch size: 122, lr: 2.57e-03, grad_scale: 16.0 2024-09-20 05:22:47,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=829668.0, ans=0.125 2024-09-20 05:22:48,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.35 vs. limit=15.0 2024-09-20 05:23:15,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=829761.3333333334, ans=0.125 2024-09-20 05:23:31,634 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=829808.0, ans=0.2 2024-09-20 05:23:34,423 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.195e+02 2.612e+02 2.919e+02 3.619e+02 5.936e+02, threshold=5.839e+02, percent-clipped=0.0 2024-09-20 05:24:03,245 INFO [train.py:1198] (1/2) Epoch 46, batch 3400, loss[loss=0.168, simple_loss=0.224, pruned_loss=0.04012, ctc_loss=0.0914, cr_loss=0.3339, over 34189.00 frames. ], tot_loss[loss=0.2001, simple_loss=0.259, pruned_loss=0.0517, ctc_loss=0.1122, cr_loss=0.3842, over 6733807.62 frames. ], batch size: 78, lr: 2.57e-03, grad_scale: 16.0 2024-09-20 05:24:05,350 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=829901.3333333334, ans=0.0 2024-09-20 05:24:16,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=829901.3333333334, ans=0.0 2024-09-20 05:24:27,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=829948.0, ans=0.125 2024-09-20 05:24:27,826 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=829948.0, ans=0.2 2024-09-20 05:24:31,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=829948.0, ans=0.0 2024-09-20 05:24:38,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=829994.6666666666, ans=0.125 2024-09-20 05:24:39,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=829994.6666666666, ans=0.025 2024-09-20 05:25:20,522 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.55 vs. limit=15.0 2024-09-20 05:25:24,685 INFO [train.py:1198] (1/2) Epoch 46, batch 3450, loss[loss=0.2023, simple_loss=0.2686, pruned_loss=0.04984, ctc_loss=0.1078, cr_loss=0.3683, over 33094.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.2596, pruned_loss=0.05177, ctc_loss=0.1123, cr_loss=0.385, over 6746480.23 frames. ], batch size: 130, lr: 2.57e-03, grad_scale: 16.0 2024-09-20 05:25:25,099 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=830134.6666666666, ans=0.1 2024-09-20 05:25:59,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.07 vs. 
limit=15.0 2024-09-20 05:26:16,352 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.30 vs. limit=15.0 2024-09-20 05:26:16,889 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.175e+02 2.555e+02 2.916e+02 3.664e+02 7.042e+02, threshold=5.832e+02, percent-clipped=5.0 2024-09-20 05:26:17,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=830274.6666666666, ans=0.125 2024-09-20 05:26:45,496 INFO [train.py:1198] (1/2) Epoch 46, batch 3500, loss[loss=0.1724, simple_loss=0.2345, pruned_loss=0.03995, ctc_loss=0.08876, cr_loss=0.3158, over 34465.00 frames. ], tot_loss[loss=0.2003, simple_loss=0.2591, pruned_loss=0.05179, ctc_loss=0.1123, cr_loss=0.3849, over 6747868.71 frames. ], batch size: 85, lr: 2.57e-03, grad_scale: 16.0 2024-09-20 05:26:52,357 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:26:53,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=830368.0, ans=0.0 2024-09-20 05:27:05,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=830414.6666666666, ans=0.0 2024-09-20 05:27:06,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=830414.6666666666, ans=0.0 2024-09-20 05:27:13,259 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:27:14,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=830414.6666666666, ans=0.0 2024-09-20 05:27:29,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=830461.3333333334, ans=0.2 2024-09-20 05:27:53,404 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=830554.6666666666, ans=0.125 2024-09-20 05:28:02,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=830554.6666666666, ans=0.125 2024-09-20 05:28:05,873 INFO [train.py:1198] (1/2) Epoch 46, batch 3550, loss[loss=0.2005, simple_loss=0.2638, pruned_loss=0.0495, ctc_loss=0.1125, cr_loss=0.3934, over 34344.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2586, pruned_loss=0.0515, ctc_loss=0.1117, cr_loss=0.384, over 6757035.39 frames. 
], batch size: 103, lr: 2.57e-03, grad_scale: 16.0 2024-09-20 05:28:06,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=830601.3333333334, ans=0.2 2024-09-20 05:28:09,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=830601.3333333334, ans=0.07 2024-09-20 05:28:09,531 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=830601.3333333334, ans=0.2 2024-09-20 05:28:17,476 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=830601.3333333334, ans=0.0 2024-09-20 05:28:22,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.35 vs. limit=6.0 2024-09-20 05:28:57,371 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.160e+02 2.550e+02 2.941e+02 3.633e+02 6.355e+02, threshold=5.881e+02, percent-clipped=2.0 2024-09-20 05:29:02,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=22.5 2024-09-20 05:29:10,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=830788.0, ans=0.125 2024-09-20 05:29:16,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=830788.0, ans=0.0 2024-09-20 05:29:19,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=830788.0, ans=0.2 2024-09-20 05:29:27,098 INFO [train.py:1198] (1/2) Epoch 46, batch 3600, loss[loss=0.1989, simple_loss=0.2558, pruned_loss=0.05117, ctc_loss=0.1148, cr_loss=0.4171, over 34477.00 frames. ], tot_loss[loss=0.2001, simple_loss=0.259, pruned_loss=0.05171, ctc_loss=0.1122, cr_loss=0.385, over 6766523.90 frames. ], batch size: 90, lr: 2.57e-03, grad_scale: 32.0 2024-09-20 05:29:32,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=830834.6666666666, ans=0.0 2024-09-20 05:29:36,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=830834.6666666666, ans=10.0 2024-09-20 05:29:40,230 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=830834.6666666666, ans=0.125 2024-09-20 05:29:52,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.92 vs. limit=15.0 2024-09-20 05:30:22,939 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=10.12 vs. limit=15.0 2024-09-20 05:30:38,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=831021.3333333334, ans=0.035 2024-09-20 05:30:41,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=831021.3333333334, ans=0.0 2024-09-20 05:30:47,575 INFO [train.py:1198] (1/2) Epoch 46, batch 3650, loss[loss=0.2211, simple_loss=0.2778, pruned_loss=0.06055, ctc_loss=0.1293, cr_loss=0.4352, over 34436.00 frames. 
], tot_loss[loss=0.1996, simple_loss=0.2584, pruned_loss=0.0515, ctc_loss=0.1119, cr_loss=0.3838, over 6769120.45 frames. ], batch size: 110, lr: 2.57e-03, grad_scale: 32.0 2024-09-20 05:30:54,198 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=831068.0, ans=0.125 2024-09-20 05:31:05,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=831114.6666666666, ans=0.1 2024-09-20 05:31:05,725 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:31:08,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=831114.6666666666, ans=0.0 2024-09-20 05:31:11,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=831114.6666666666, ans=0.0 2024-09-20 05:31:27,864 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=831161.3333333334, ans=0.025 2024-09-20 05:31:36,538 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.22 vs. limit=22.5 2024-09-20 05:31:38,651 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.126e+02 2.533e+02 2.955e+02 3.962e+02 6.818e+02, threshold=5.910e+02, percent-clipped=5.0 2024-09-20 05:31:40,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=831208.0, ans=0.125 2024-09-20 05:32:07,413 INFO [train.py:1198] (1/2) Epoch 46, batch 3700, loss[loss=0.1976, simple_loss=0.2589, pruned_loss=0.04946, ctc_loss=0.1086, cr_loss=0.3888, over 34639.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2584, pruned_loss=0.05118, ctc_loss=0.1115, cr_loss=0.3829, over 6784769.74 frames. ], batch size: 102, lr: 2.57e-03, grad_scale: 32.0 2024-09-20 05:32:18,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=831301.3333333334, ans=0.125 2024-09-20 05:32:26,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=831348.0, ans=0.125 2024-09-20 05:32:50,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2024-09-20 05:33:05,258 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=831441.3333333334, ans=0.125 2024-09-20 05:33:13,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=831488.0, ans=0.125 2024-09-20 05:33:18,549 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=11.89 vs. limit=22.5 2024-09-20 05:33:22,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=831488.0, ans=0.125 2024-09-20 05:33:29,001 INFO [train.py:1198] (1/2) Epoch 46, batch 3750, loss[loss=0.2081, simple_loss=0.2712, pruned_loss=0.05277, ctc_loss=0.1175, cr_loss=0.3972, over 34310.00 frames. 
], tot_loss[loss=0.202, simple_loss=0.2613, pruned_loss=0.05222, ctc_loss=0.1135, cr_loss=0.3881, over 6786232.55 frames. ], batch size: 113, lr: 2.57e-03, grad_scale: 32.0 2024-09-20 05:33:50,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=831581.3333333334, ans=0.0 2024-09-20 05:33:58,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=831581.3333333334, ans=0.125 2024-09-20 05:34:17,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=831674.6666666666, ans=0.07 2024-09-20 05:34:20,680 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.005e+02 2.471e+02 2.641e+02 2.910e+02 4.724e+02, threshold=5.281e+02, percent-clipped=0.0 2024-09-20 05:34:35,701 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=831721.3333333334, ans=0.125 2024-09-20 05:34:43,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=831721.3333333334, ans=0.025 2024-09-20 05:34:48,490 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.const_attention_rate, batch_count=831768.0, ans=0.025 2024-09-20 05:34:49,736 INFO [train.py:1198] (1/2) Epoch 46, batch 3800, loss[loss=0.2192, simple_loss=0.2733, pruned_loss=0.06072, ctc_loss=0.1306, cr_loss=0.4354, over 29871.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.2637, pruned_loss=0.05355, ctc_loss=0.116, cr_loss=0.3941, over 6673959.02 frames. ], batch size: 175, lr: 2.57e-03, grad_scale: 32.0 2024-09-20 05:35:12,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=831814.6666666666, ans=0.0 2024-09-20 05:35:16,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=831814.6666666666, ans=0.0 2024-09-20 05:35:27,836 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=831861.3333333334, ans=0.2 2024-09-20 05:35:47,489 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=831908.0, ans=0.2 2024-09-20 05:36:03,056 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.91 vs. limit=12.0 2024-09-20 05:36:13,386 INFO [train.py:1198] (1/2) Epoch 46, batch 3850, loss[loss=0.2149, simple_loss=0.2676, pruned_loss=0.05937, ctc_loss=0.1303, cr_loss=0.4333, over 23162.00 frames. ], tot_loss[loss=0.2078, simple_loss=0.2656, pruned_loss=0.05509, ctc_loss=0.1193, cr_loss=0.3987, over 6249148.11 frames. 
], batch size: 244, lr: 2.57e-03, grad_scale: 32.0 2024-09-20 05:36:23,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=832001.3333333334, ans=6.0 2024-09-20 05:36:37,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=832048.0, ans=0.125 2024-09-20 05:36:37,399 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=832048.0, ans=0.0 2024-09-20 05:36:42,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=832048.0, ans=0.0 2024-09-20 05:37:39,197 INFO [train.py:1198] (1/2) Epoch 47, batch 0, loss[loss=0.1898, simple_loss=0.2488, pruned_loss=0.04733, ctc_loss=0.1076, cr_loss=0.3652, over 34442.00 frames. ], tot_loss[loss=0.1898, simple_loss=0.2488, pruned_loss=0.04733, ctc_loss=0.1076, cr_loss=0.3652, over 34442.00 frames. ], batch size: 85, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:37:39,198 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 05:37:55,977 INFO [train.py:1230] (1/2) Epoch 47, validation: loss=0.1488, simple_loss=0.2425, pruned_loss=0.02373, ctc_loss=0.03881, cr_loss=2.294e-14, over 944034.00 frames. 2024-09-20 05:37:55,978 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 05:38:04,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=832122.6666666666, ans=0.2 2024-09-20 05:38:05,720 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.207e+02 2.720e+02 2.858e+02 3.135e+02 7.452e+02, threshold=5.715e+02, percent-clipped=2.0 2024-09-20 05:38:23,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.80 vs. limit=15.0 2024-09-20 05:38:34,822 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2024-09-20 05:38:58,668 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=832262.6666666666, ans=0.0 2024-09-20 05:39:08,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=832309.3333333334, ans=0.95 2024-09-20 05:39:16,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=832356.0, ans=0.125 2024-09-20 05:39:18,131 INFO [train.py:1198] (1/2) Epoch 47, batch 50, loss[loss=0.1777, simple_loss=0.2367, pruned_loss=0.04308, ctc_loss=0.09679, cr_loss=0.331, over 34461.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2594, pruned_loss=0.05166, ctc_loss=0.112, cr_loss=0.3845, over 1480919.44 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:39:23,271 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=832356.0, ans=0.025 2024-09-20 05:39:48,665 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.13 vs. 
limit=15.0 2024-09-20 05:40:00,000 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.97 vs. limit=15.0 2024-09-20 05:40:04,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=832449.3333333334, ans=0.125 2024-09-20 05:40:04,201 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=832449.3333333334, ans=0.2 2024-09-20 05:40:17,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=832496.0, ans=0.1 2024-09-20 05:40:28,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=832542.6666666666, ans=0.125 2024-09-20 05:40:28,614 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=832542.6666666666, ans=0.09899494936611666 2024-09-20 05:40:41,367 INFO [train.py:1198] (1/2) Epoch 47, batch 100, loss[loss=0.1806, simple_loss=0.2394, pruned_loss=0.04418, ctc_loss=0.09701, cr_loss=0.3522, over 34589.00 frames. ], tot_loss[loss=0.202, simple_loss=0.2611, pruned_loss=0.05236, ctc_loss=0.1135, cr_loss=0.3879, over 2629422.57 frames. ], batch size: 89, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:40:51,033 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.067e+02 2.512e+02 2.851e+02 3.295e+02 6.063e+02, threshold=5.703e+02, percent-clipped=1.0 2024-09-20 05:41:34,260 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=832729.3333333334, ans=0.1 2024-09-20 05:41:45,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=832729.3333333334, ans=0.04949747468305833 2024-09-20 05:42:04,584 INFO [train.py:1198] (1/2) Epoch 47, batch 150, loss[loss=0.1742, simple_loss=0.2362, pruned_loss=0.04043, ctc_loss=0.08873, cr_loss=0.3434, over 34456.00 frames. ], tot_loss[loss=0.2003, simple_loss=0.2597, pruned_loss=0.05157, ctc_loss=0.112, cr_loss=0.385, over 3557487.11 frames. ], batch size: 82, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:42:17,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=832822.6666666666, ans=0.125 2024-09-20 05:42:21,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=832869.3333333334, ans=0.125 2024-09-20 05:42:53,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=832962.6666666666, ans=0.125 2024-09-20 05:43:08,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=833009.3333333334, ans=0.1 2024-09-20 05:43:08,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=833009.3333333334, ans=0.025 2024-09-20 05:43:21,985 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=10.38 vs. 
limit=12.0 2024-09-20 05:43:24,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=833056.0, ans=0.125 2024-09-20 05:43:26,252 INFO [train.py:1198] (1/2) Epoch 47, batch 200, loss[loss=0.2138, simple_loss=0.2739, pruned_loss=0.05652, ctc_loss=0.1226, cr_loss=0.4025, over 32156.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2591, pruned_loss=0.05131, ctc_loss=0.1117, cr_loss=0.3842, over 4273019.22 frames. ], batch size: 146, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:43:34,865 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=833056.0, ans=0.125 2024-09-20 05:43:36,145 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.210e+02 2.561e+02 3.219e+02 4.776e+02 6.765e+02, threshold=6.438e+02, percent-clipped=9.0 2024-09-20 05:43:49,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=833102.6666666666, ans=0.1 2024-09-20 05:44:04,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=833149.3333333334, ans=0.125 2024-09-20 05:44:16,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=833196.0, ans=0.125 2024-09-20 05:44:36,715 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=22.5 2024-09-20 05:44:52,630 INFO [train.py:1198] (1/2) Epoch 47, batch 250, loss[loss=0.2149, simple_loss=0.2729, pruned_loss=0.05768, ctc_loss=0.1243, cr_loss=0.4159, over 34227.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.259, pruned_loss=0.05131, ctc_loss=0.1115, cr_loss=0.384, over 4836180.02 frames. ], batch size: 117, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:44:56,096 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=833289.3333333334, ans=0.0 2024-09-20 05:45:30,643 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=833382.6666666666, ans=0.125 2024-09-20 05:45:37,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=833382.6666666666, ans=0.125 2024-09-20 05:45:48,679 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=833429.3333333334, ans=0.1 2024-09-20 05:45:56,353 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.51 vs. limit=12.0 2024-09-20 05:45:57,479 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=833476.0, ans=0.125 2024-09-20 05:45:57,721 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2024-09-20 05:46:02,603 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.63 vs. limit=22.5 2024-09-20 05:46:03,156 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=6.15 vs. 
limit=15.0 2024-09-20 05:46:05,914 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.67 vs. limit=15.0 2024-09-20 05:46:08,774 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=833476.0, ans=0.0 2024-09-20 05:46:15,077 INFO [train.py:1198] (1/2) Epoch 47, batch 300, loss[loss=0.214, simple_loss=0.2712, pruned_loss=0.05771, ctc_loss=0.123, cr_loss=0.4172, over 34365.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2588, pruned_loss=0.05135, ctc_loss=0.1115, cr_loss=0.384, over 5263318.75 frames. ], batch size: 107, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:46:18,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=833522.6666666666, ans=0.0 2024-09-20 05:46:23,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=833522.6666666666, ans=0.125 2024-09-20 05:46:25,014 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.151e+02 2.575e+02 3.057e+02 3.581e+02 8.148e+02, threshold=6.113e+02, percent-clipped=3.0 2024-09-20 05:46:33,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=833569.3333333334, ans=0.2 2024-09-20 05:46:34,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=833569.3333333334, ans=0.2 2024-09-20 05:47:01,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=833616.0, ans=0.2 2024-09-20 05:47:03,292 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.10 vs. limit=22.5 2024-09-20 05:47:10,873 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=833662.6666666666, ans=0.05 2024-09-20 05:47:22,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=833709.3333333334, ans=0.125 2024-09-20 05:47:25,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=833709.3333333334, ans=0.125 2024-09-20 05:47:36,784 INFO [train.py:1198] (1/2) Epoch 47, batch 350, loss[loss=0.1898, simple_loss=0.2491, pruned_loss=0.0477, ctc_loss=0.1041, cr_loss=0.3556, over 34296.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2589, pruned_loss=0.05144, ctc_loss=0.1117, cr_loss=0.3841, over 5599323.38 frames. 
], batch size: 83, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:47:37,093 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=833756.0, ans=0.1 2024-09-20 05:47:49,220 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=833756.0, ans=0.0 2024-09-20 05:48:08,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=833802.6666666666, ans=0.0 2024-09-20 05:48:29,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=833896.0, ans=0.09899494936611666 2024-09-20 05:48:42,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.43 vs. limit=15.0 2024-09-20 05:48:47,345 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.46 vs. limit=15.0 2024-09-20 05:49:02,492 INFO [train.py:1198] (1/2) Epoch 47, batch 400, loss[loss=0.2047, simple_loss=0.2685, pruned_loss=0.05158, ctc_loss=0.1095, cr_loss=0.3952, over 34439.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.2587, pruned_loss=0.05122, ctc_loss=0.1112, cr_loss=0.3825, over 5866353.10 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:49:04,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=833989.3333333334, ans=0.025 2024-09-20 05:49:12,351 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.231e+02 2.434e+02 2.725e+02 3.186e+02 5.905e+02, threshold=5.449e+02, percent-clipped=0.0 2024-09-20 05:49:18,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.45 vs. limit=8.0 2024-09-20 05:49:19,457 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:49:19,569 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834036.0, ans=0.1 2024-09-20 05:49:26,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.min_positive, batch_count=834036.0, ans=0.05 2024-09-20 05:49:45,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=834082.6666666666, ans=0.0 2024-09-20 05:49:49,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=834082.6666666666, ans=0.04949747468305833 2024-09-20 05:50:02,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=834129.3333333334, ans=0.0 2024-09-20 05:50:17,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=834176.0, ans=0.1 2024-09-20 05:50:25,261 INFO [train.py:1198] (1/2) Epoch 47, batch 450, loss[loss=0.2122, simple_loss=0.2696, pruned_loss=0.05691, ctc_loss=0.1216, cr_loss=0.4155, over 34713.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2588, pruned_loss=0.05137, ctc_loss=0.1115, cr_loss=0.3833, over 6056017.95 frames. 
], batch size: 97, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:50:28,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=834222.6666666666, ans=0.125 2024-09-20 05:50:33,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=834222.6666666666, ans=0.0 2024-09-20 05:50:37,195 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=834222.6666666666, ans=0.1 2024-09-20 05:50:42,218 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=5.10 vs. limit=15.0 2024-09-20 05:50:44,465 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.90 vs. limit=10.0 2024-09-20 05:50:54,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=834269.3333333334, ans=0.0 2024-09-20 05:51:33,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=834409.3333333334, ans=0.125 2024-09-20 05:51:49,392 INFO [train.py:1198] (1/2) Epoch 47, batch 500, loss[loss=0.207, simple_loss=0.2673, pruned_loss=0.05377, ctc_loss=0.1157, cr_loss=0.4031, over 34444.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2577, pruned_loss=0.05092, ctc_loss=0.1107, cr_loss=0.3811, over 6222353.53 frames. ], batch size: 110, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:51:59,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.161e+02 2.455e+02 2.780e+02 3.463e+02 6.015e+02, threshold=5.559e+02, percent-clipped=3.0 2024-09-20 05:52:00,344 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.32 vs. limit=22.5 2024-09-20 05:52:47,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=834596.0, ans=0.125 2024-09-20 05:52:58,022 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2024-09-20 05:53:13,837 INFO [train.py:1198] (1/2) Epoch 47, batch 550, loss[loss=0.2106, simple_loss=0.2756, pruned_loss=0.05336, ctc_loss=0.1175, cr_loss=0.3826, over 33893.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2579, pruned_loss=0.05095, ctc_loss=0.1109, cr_loss=0.3814, over 6331332.54 frames. ], batch size: 122, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:53:58,670 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.31 vs. limit=22.5 2024-09-20 05:54:06,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=834829.3333333334, ans=0.0 2024-09-20 05:54:21,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.20 vs. 
limit=22.5 2024-09-20 05:54:23,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=834876.0, ans=0.125 2024-09-20 05:54:26,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=834876.0, ans=0.2 2024-09-20 05:54:31,024 INFO [scaling.py:801] (1/2) Caught exception in Balancer backward: CUDA out of memory. Tried to allocate 3.78 GiB. GPU 1 has a total capacity of 79.17 GiB of which 3.65 GiB is free. Process 39810 has 75.52 GiB memory in use. Of the allocated memory 29.36 GiB is allocated by PyTorch, and 43.77 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables), size=[234, 384, 594, 19], will continue. 2024-09-20 05:54:36,077 INFO [train.py:1198] (1/2) Epoch 47, batch 600, loss[loss=0.2123, simple_loss=0.2737, pruned_loss=0.05608, ctc_loss=0.1172, cr_loss=0.3842, over 34312.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2583, pruned_loss=0.05115, ctc_loss=0.1111, cr_loss=0.3822, over 6433163.45 frames. ], batch size: 117, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:54:45,888 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.091e+02 2.638e+02 2.997e+02 3.807e+02 1.115e+03, threshold=5.993e+02, percent-clipped=3.0 2024-09-20 05:54:46,171 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=834922.6666666666, ans=0.0 2024-09-20 05:55:38,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=835062.6666666666, ans=0.2 2024-09-20 05:55:59,523 INFO [train.py:1198] (1/2) Epoch 47, batch 650, loss[loss=0.2022, simple_loss=0.262, pruned_loss=0.05189, ctc_loss=0.1143, cr_loss=0.3944, over 34508.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2576, pruned_loss=0.05078, ctc_loss=0.1106, cr_loss=0.3811, over 6524177.59 frames. ], batch size: 94, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:56:03,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=835156.0, ans=0.1 2024-09-20 05:56:23,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=835202.6666666666, ans=0.05 2024-09-20 05:56:28,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=835202.6666666666, ans=0.125 2024-09-20 05:56:59,979 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.50 vs. limit=15.0 2024-09-20 05:57:01,225 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=835296.0, ans=0.125 2024-09-20 05:57:24,162 INFO [train.py:1198] (1/2) Epoch 47, batch 700, loss[loss=0.1909, simple_loss=0.2504, pruned_loss=0.04793, ctc_loss=0.1053, cr_loss=0.3641, over 34586.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2582, pruned_loss=0.051, ctc_loss=0.1109, cr_loss=0.3824, over 6580617.98 frames. 
], batch size: 89, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:57:27,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=835389.3333333334, ans=0.125 2024-09-20 05:57:34,036 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.539e+02 3.015e+02 4.007e+02 9.175e+02, threshold=6.031e+02, percent-clipped=7.0 2024-09-20 05:57:45,824 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=835436.0, ans=0.2 2024-09-20 05:57:51,561 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.40 vs. limit=15.0 2024-09-20 05:57:55,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=835482.6666666666, ans=0.125 2024-09-20 05:58:04,559 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.61 vs. limit=12.0 2024-09-20 05:58:14,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=835529.3333333334, ans=0.0 2024-09-20 05:58:17,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=835529.3333333334, ans=0.125 2024-09-20 05:58:43,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=835576.0, ans=0.0 2024-09-20 05:58:46,517 INFO [train.py:1198] (1/2) Epoch 47, batch 750, loss[loss=0.2084, simple_loss=0.2676, pruned_loss=0.05482, ctc_loss=0.117, cr_loss=0.4051, over 34379.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.258, pruned_loss=0.05098, ctc_loss=0.1109, cr_loss=0.3817, over 6622234.15 frames. ], batch size: 95, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 05:58:48,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=835622.6666666666, ans=0.0 2024-09-20 05:59:06,430 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=835669.3333333334, ans=0.125 2024-09-20 05:59:10,585 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.33 vs. limit=22.5 2024-09-20 05:59:21,121 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=835716.0, ans=0.125 2024-09-20 05:59:36,285 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 05:59:39,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=835762.6666666666, ans=0.125 2024-09-20 05:59:54,894 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. 
limit=15.0 2024-09-20 06:00:01,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=835809.3333333334, ans=0.0 2024-09-20 06:00:05,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=835809.3333333334, ans=0.125 2024-09-20 06:00:10,426 INFO [train.py:1198] (1/2) Epoch 47, batch 800, loss[loss=0.1708, simple_loss=0.2315, pruned_loss=0.03945, ctc_loss=0.09192, cr_loss=0.3188, over 34479.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2577, pruned_loss=0.05079, ctc_loss=0.1104, cr_loss=0.3806, over 6658218.07 frames. ], batch size: 85, lr: 2.54e-03, grad_scale: 32.0 2024-09-20 06:00:12,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=835856.0, ans=0.125 2024-09-20 06:00:18,107 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=835856.0, ans=0.025 2024-09-20 06:00:22,495 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.230e+02 2.551e+02 2.947e+02 3.444e+02 5.971e+02, threshold=5.894e+02, percent-clipped=0.0 2024-09-20 06:00:50,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=835949.3333333334, ans=0.0 2024-09-20 06:01:05,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=835996.0, ans=0.1 2024-09-20 06:01:33,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=836089.3333333334, ans=0.125 2024-09-20 06:01:33,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.38 vs. limit=15.0 2024-09-20 06:01:34,287 INFO [train.py:1198] (1/2) Epoch 47, batch 850, loss[loss=0.2163, simple_loss=0.2779, pruned_loss=0.05692, ctc_loss=0.1228, cr_loss=0.4087, over 34377.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.2577, pruned_loss=0.05086, ctc_loss=0.1105, cr_loss=0.3809, over 6690473.94 frames. ], batch size: 103, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:01:45,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=836089.3333333334, ans=0.2 2024-09-20 06:02:02,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=836136.0, ans=0.0 2024-09-20 06:02:11,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=836182.6666666666, ans=0.125 2024-09-20 06:02:35,075 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=836229.3333333334, ans=0.0 2024-09-20 06:02:58,574 INFO [train.py:1198] (1/2) Epoch 47, batch 900, loss[loss=0.1732, simple_loss=0.2357, pruned_loss=0.03971, ctc_loss=0.09083, cr_loss=0.326, over 34449.00 frames. ], tot_loss[loss=0.1989, simple_loss=0.2582, pruned_loss=0.05106, ctc_loss=0.1109, cr_loss=0.3818, over 6696268.05 frames. 
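], batch size: 85, lr: 2.53e-03, grad_scale: 32.0

The [optim.py:487] warnings report five grad-norm statistics over recent batches (min, 25%, median, 75%, max) plus a clip threshold, and throughout this section the threshold matches Clipping_scale times the logged median, up to display rounding: here 2.0 x 2.947e+02 = 5.894e+02, and in the 05:57:34 warning earlier 2.0 x 3.015e+02 gives the logged 6.031e+02. percent-clipped is then the share of recent gradients whose norm exceeded that threshold. A sketch of this relation, assuming the threshold is derived from a running median of gradient norms (illustrative code, not icefall's optim.py):

```python
# Sketch: reproduce the threshold printed in the warnings above from the
# logged quartiles, assuming threshold = clipping_scale * median grad-norm.
import torch

def clip_threshold(recent_grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Five summary points as printed in the log: min, 25%, median, 75%, max.
    quartiles = torch.quantile(
        recent_grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
    )
    return quartiles, clipping_scale * quartiles[2]

# The quartiles from the 06:00:22 warning above, used as a stand-in sample:
norms = torch.tensor([223.0, 255.1, 294.7, 344.4, 597.1])
_, threshold = clip_threshold(norms)
print(threshold)  # ~589.4 -- matches threshold=5.894e+02 in the warning above
```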
2024-09-20 06:03:05,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=836322.6666666666, ans=0.125 2024-09-20 06:03:08,221 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.654e+02 3.181e+02 3.768e+02 6.740e+02, threshold=6.363e+02, percent-clipped=2.0 2024-09-20 06:03:29,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=836416.0, ans=10.0 2024-09-20 06:03:30,242 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.70 vs. limit=15.0 2024-09-20 06:03:30,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.35 vs. limit=15.0 2024-09-20 06:03:41,741 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=7.763e-03 2024-09-20 06:03:42,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=17.75 vs. limit=22.5 2024-09-20 06:03:57,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=836462.6666666666, ans=15.0 2024-09-20 06:04:12,636 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.23 vs. limit=15.0 2024-09-20 06:04:15,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.95 vs. limit=6.0 2024-09-20 06:04:22,852 INFO [train.py:1198] (1/2) Epoch 47, batch 950, loss[loss=0.1942, simple_loss=0.2524, pruned_loss=0.04941, ctc_loss=0.1091, cr_loss=0.3823, over 34712.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2585, pruned_loss=0.05113, ctc_loss=0.1111, cr_loss=0.382, over 6699854.80 frames. ], batch size: 87, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:04:41,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=836602.6666666666, ans=0.1 2024-09-20 06:05:15,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=836696.0, ans=0.025 2024-09-20 06:05:16,081 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:05:33,972 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:05:44,915 INFO [train.py:1198] (1/2) Epoch 47, batch 1000, loss[loss=0.1941, simple_loss=0.2545, pruned_loss=0.04868, ctc_loss=0.1072, cr_loss=0.3734, over 34483.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2592, pruned_loss=0.05154, ctc_loss=0.1119, cr_loss=0.3834, over 6690236.62 frames. ], batch size: 90, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:05:48,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.23 vs.
limit=15.0 2024-09-20 06:05:54,875 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.142e+02 2.608e+02 2.943e+02 3.771e+02 5.602e+02, threshold=5.887e+02, percent-clipped=0.0 2024-09-20 06:05:56,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=836789.3333333334, ans=0.125 2024-09-20 06:06:08,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=836836.0, ans=0.0 2024-09-20 06:06:49,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=836929.3333333334, ans=0.125 2024-09-20 06:07:01,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=836976.0, ans=0.125 2024-09-20 06:07:02,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=836976.0, ans=0.1 2024-09-20 06:07:09,127 INFO [train.py:1198] (1/2) Epoch 47, batch 1050, loss[loss=0.2069, simple_loss=0.2714, pruned_loss=0.05265, ctc_loss=0.1116, cr_loss=0.3683, over 34572.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2584, pruned_loss=0.0512, ctc_loss=0.1112, cr_loss=0.3815, over 6701468.59 frames. ], batch size: 99, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:07:14,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=837022.6666666666, ans=0.0 2024-09-20 06:07:19,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=837022.6666666666, ans=0.2 2024-09-20 06:07:19,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=837022.6666666666, ans=0.2 2024-09-20 06:07:37,987 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=5.66 vs. limit=10.0 2024-09-20 06:07:54,482 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.60 vs. limit=6.0 2024-09-20 06:07:55,838 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=837116.0, ans=0.0 2024-09-20 06:07:55,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=837116.0, ans=0.125 2024-09-20 06:08:05,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=837162.6666666666, ans=0.125 2024-09-20 06:08:10,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=837162.6666666666, ans=0.1 2024-09-20 06:08:33,493 INFO [train.py:1198] (1/2) Epoch 47, batch 1100, loss[loss=0.2149, simple_loss=0.272, pruned_loss=0.05832, ctc_loss=0.1242, cr_loss=0.4061, over 34376.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2583, pruned_loss=0.05117, ctc_loss=0.1112, cr_loss=0.382, over 6714443.87 frames. 
], batch size: 91, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:08:41,919 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:08:41,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=837256.0, ans=0.125 2024-09-20 06:08:43,214 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.106e+02 2.577e+02 2.945e+02 4.023e+02 5.955e+02, threshold=5.890e+02, percent-clipped=1.0 2024-09-20 06:08:56,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=837302.6666666666, ans=0.5 2024-09-20 06:09:47,592 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=837442.6666666666, ans=0.1 2024-09-20 06:09:49,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=837442.6666666666, ans=0.035 2024-09-20 06:09:55,592 INFO [train.py:1198] (1/2) Epoch 47, batch 1150, loss[loss=0.1979, simple_loss=0.2589, pruned_loss=0.04996, ctc_loss=0.1088, cr_loss=0.3816, over 34349.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2582, pruned_loss=0.05119, ctc_loss=0.1112, cr_loss=0.3822, over 6713717.15 frames. ], batch size: 91, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:10:00,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=837489.3333333334, ans=0.0 2024-09-20 06:10:13,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=837536.0, ans=0.2 2024-09-20 06:10:27,313 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=837536.0, ans=0.0 2024-09-20 06:10:33,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=837582.6666666666, ans=0.025 2024-09-20 06:10:38,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=837582.6666666666, ans=0.0 2024-09-20 06:11:19,395 INFO [train.py:1198] (1/2) Epoch 47, batch 1200, loss[loss=0.2001, simple_loss=0.262, pruned_loss=0.05033, ctc_loss=0.1101, cr_loss=0.3855, over 34557.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2591, pruned_loss=0.0515, ctc_loss=0.1117, cr_loss=0.3834, over 6706206.66 frames. 
], batch size: 99, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:11:22,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=837722.6666666666, ans=0.125 2024-09-20 06:11:29,156 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.157e+02 2.465e+02 2.801e+02 3.194e+02 4.730e+02, threshold=5.602e+02, percent-clipped=0.0 2024-09-20 06:11:59,751 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=9.178e-02 2024-09-20 06:12:01,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=837816.0, ans=0.07 2024-09-20 06:12:14,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=837862.6666666666, ans=0.0 2024-09-20 06:12:17,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=837862.6666666666, ans=0.0 2024-09-20 06:12:24,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=837862.6666666666, ans=0.04949747468305833 2024-09-20 06:12:24,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=837862.6666666666, ans=0.0 2024-09-20 06:12:29,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0 2024-09-20 06:12:43,322 INFO [train.py:1198] (1/2) Epoch 47, batch 1250, loss[loss=0.2146, simple_loss=0.2735, pruned_loss=0.05741, ctc_loss=0.1221, cr_loss=0.4097, over 34368.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.2597, pruned_loss=0.05168, ctc_loss=0.1121, cr_loss=0.3848, over 6740564.34 frames. ], batch size: 107, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:13:21,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=838049.3333333334, ans=0.125 2024-09-20 06:13:21,774 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2024-09-20 06:13:42,863 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=838096.0, ans=0.125 2024-09-20 06:13:45,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.65 vs. limit=15.0 2024-09-20 06:14:06,186 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=838189.3333333334, ans=0.125 2024-09-20 06:14:07,494 INFO [train.py:1198] (1/2) Epoch 47, batch 1300, loss[loss=0.2041, simple_loss=0.2657, pruned_loss=0.05194, ctc_loss=0.1144, cr_loss=0.3916, over 33077.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2589, pruned_loss=0.05144, ctc_loss=0.1115, cr_loss=0.3837, over 6743533.17 frames. ], batch size: 130, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:14:08,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. 
limit=6.0 2024-09-20 06:14:09,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=838189.3333333334, ans=0.0 2024-09-20 06:14:17,232 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.255e+02 2.662e+02 3.197e+02 3.748e+02 6.281e+02, threshold=6.394e+02, percent-clipped=1.0 2024-09-20 06:14:23,121 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.39 vs. limit=22.5 2024-09-20 06:15:31,491 INFO [train.py:1198] (1/2) Epoch 47, batch 1350, loss[loss=0.2017, simple_loss=0.2605, pruned_loss=0.05265, ctc_loss=0.1116, cr_loss=0.3823, over 34518.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2585, pruned_loss=0.05117, ctc_loss=0.1111, cr_loss=0.3827, over 6763680.69 frames. ], batch size: 94, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:15:35,767 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-20 06:15:41,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=838422.6666666666, ans=0.025 2024-09-20 06:16:00,630 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=838469.3333333334, ans=0.0 2024-09-20 06:16:05,682 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=838516.0, ans=0.125 2024-09-20 06:16:45,616 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=22.5 2024-09-20 06:16:48,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=838609.3333333334, ans=0.125 2024-09-20 06:16:53,093 INFO [train.py:1198] (1/2) Epoch 47, batch 1400, loss[loss=0.1828, simple_loss=0.2391, pruned_loss=0.04561, ctc_loss=0.1037, cr_loss=0.3645, over 34298.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.258, pruned_loss=0.0509, ctc_loss=0.1107, cr_loss=0.3819, over 6776134.97 frames. ], batch size: 80, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:17:04,677 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.308e+02 2.696e+02 3.163e+02 4.310e+02 7.041e+02, threshold=6.326e+02, percent-clipped=3.0 2024-09-20 06:18:02,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=838842.6666666666, ans=0.125 2024-09-20 06:18:16,967 INFO [train.py:1198] (1/2) Epoch 47, batch 1450, loss[loss=0.2218, simple_loss=0.2766, pruned_loss=0.06203, ctc_loss=0.1274, cr_loss=0.4351, over 34481.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2587, pruned_loss=0.05105, ctc_loss=0.1109, cr_loss=0.3825, over 6772814.22 frames. 
], batch size: 110, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:18:25,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=838889.3333333334, ans=0.05 2024-09-20 06:18:30,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=838889.3333333334, ans=0.0 2024-09-20 06:19:19,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839029.3333333334, ans=0.1 2024-09-20 06:19:26,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=839076.0, ans=0.0 2024-09-20 06:19:31,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=839076.0, ans=0.125 2024-09-20 06:19:41,290 INFO [train.py:1198] (1/2) Epoch 47, batch 1500, loss[loss=0.2092, simple_loss=0.2703, pruned_loss=0.05431, ctc_loss=0.1164, cr_loss=0.4026, over 34463.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2591, pruned_loss=0.05136, ctc_loss=0.1116, cr_loss=0.384, over 6773879.26 frames. ], batch size: 100, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:19:49,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=839122.6666666666, ans=0.125 2024-09-20 06:19:52,730 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.197e+02 2.531e+02 2.817e+02 3.263e+02 4.518e+02, threshold=5.634e+02, percent-clipped=0.0 2024-09-20 06:19:56,536 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=839169.3333333334, ans=0.1 2024-09-20 06:20:00,600 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.25 vs. limit=12.0 2024-09-20 06:20:19,510 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=839216.0, ans=0.125 2024-09-20 06:20:23,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0 2024-09-20 06:20:37,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=839262.6666666666, ans=0.0 2024-09-20 06:21:03,517 INFO [train.py:1198] (1/2) Epoch 47, batch 1550, loss[loss=0.224, simple_loss=0.2792, pruned_loss=0.06297, ctc_loss=0.1325, cr_loss=0.4113, over 34421.00 frames. ], tot_loss[loss=0.2003, simple_loss=0.2594, pruned_loss=0.0517, ctc_loss=0.1122, cr_loss=0.3849, over 6745105.13 frames. 
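], batch size: 105, lr: 2.53e-03, grad_scale: 32.0

Each ScheduledFloat line reports the current value (ans=...) of a named hyperparameter as a function of batch_count, so quantities like skip rates, balancer probabilities and bypass scale_min follow schedules over training rather than staying fixed; by batch_count around 8.4e5 most of them have settled at their final values. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name and breakpoints are illustrative, not icefall's actual ScheduledFloat:

```python
# Illustrative piecewise-linear schedule over batch_count, clamped at the
# endpoints -- the kind of curve behind the ans=... values logged above.
import bisect

class PiecewiseLinearSchedule:
    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical skip-rate schedule: 0.2 at the start, 0.0 from 20k batches on.
skip_rate = PiecewiseLinearSchedule((0.0, 0.2), (20000.0, 0.0))
print(skip_rate(5000.0))     # 0.15, still decaying early in training
print(skip_rate(839542.67))  # 0.0, long past the last breakpoint, as logged here
```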
2024-09-20 06:21:41,572 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=839449.3333333334, ans=0.125 2024-09-20 06:22:05,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=839496.0, ans=0.0 2024-09-20 06:22:21,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=839542.6666666666, ans=0.0 2024-09-20 06:22:27,811 INFO [train.py:1198] (1/2) Epoch 47, batch 1600, loss[loss=0.2027, simple_loss=0.2678, pruned_loss=0.05002, ctc_loss=0.1101, cr_loss=0.3914, over 34567.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2591, pruned_loss=0.0516, ctc_loss=0.112, cr_loss=0.3841, over 6725933.20 frames. ], batch size: 99, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:22:39,391 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.143e+02 2.667e+02 3.138e+02 3.843e+02 6.701e+02, threshold=6.275e+02, percent-clipped=3.0 2024-09-20 06:22:47,862 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=839636.0, ans=0.125 2024-09-20 06:22:58,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=839636.0, ans=0.2 2024-09-20 06:23:33,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=839729.3333333334, ans=0.125 2024-09-20 06:23:33,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=839729.3333333334, ans=0.025 2024-09-20 06:23:34,690 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=839776.0, ans=0.1 2024-09-20 06:23:50,839 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=839822.6666666666, ans=0.015 2024-09-20 06:23:52,300 INFO [train.py:1198] (1/2) Epoch 47, batch 1650, loss[loss=0.205, simple_loss=0.2721, pruned_loss=0.0502, ctc_loss=0.1107, cr_loss=0.3863, over 34400.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2587, pruned_loss=0.05133, ctc_loss=0.1114, cr_loss=0.3827, over 6718164.78 frames. ], batch size: 103, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:23:52,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=839822.6666666666, ans=0.1 2024-09-20 06:23:52,601 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=839822.6666666666, ans=0.025 2024-09-20 06:23:54,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=839822.6666666666, ans=0.125 2024-09-20 06:23:54,509 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=839822.6666666666, ans=0.125 2024-09-20 06:24:13,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=839869.3333333334, ans=0.0 2024-09-20 06:24:19,458 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.76 vs.
limit=12.0 2024-09-20 06:25:13,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=840009.3333333334, ans=0.125 2024-09-20 06:25:22,539 INFO [train.py:1198] (1/2) Epoch 47, batch 1700, loss[loss=0.1711, simple_loss=0.2323, pruned_loss=0.03945, ctc_loss=0.0894, cr_loss=0.3266, over 34297.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2586, pruned_loss=0.05111, ctc_loss=0.1112, cr_loss=0.3829, over 6743794.33 frames. ], batch size: 80, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:25:34,189 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.284e+02 2.591e+02 3.096e+02 4.061e+02 7.745e+02, threshold=6.192e+02, percent-clipped=1.0 2024-09-20 06:25:34,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=840056.0, ans=0.2 2024-09-20 06:25:50,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=840102.6666666666, ans=0.125 2024-09-20 06:26:25,058 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=840196.0, ans=0.125 2024-09-20 06:26:44,544 INFO [train.py:1198] (1/2) Epoch 47, batch 1750, loss[loss=0.1689, simple_loss=0.2268, pruned_loss=0.03985, ctc_loss=0.09028, cr_loss=0.3315, over 34129.00 frames. ], tot_loss[loss=0.1989, simple_loss=0.2582, pruned_loss=0.05104, ctc_loss=0.111, cr_loss=0.3823, over 6752329.90 frames. ], batch size: 78, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:26:51,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=840289.3333333334, ans=0.125 2024-09-20 06:27:05,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=840336.0, ans=0.125 2024-09-20 06:27:16,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=840336.0, ans=0.2 2024-09-20 06:27:39,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=840429.3333333334, ans=0.025 2024-09-20 06:27:49,562 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:28:08,516 INFO [train.py:1198] (1/2) Epoch 47, batch 1800, loss[loss=0.2066, simple_loss=0.2675, pruned_loss=0.05356, ctc_loss=0.1143, cr_loss=0.3919, over 34709.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2588, pruned_loss=0.05127, ctc_loss=0.1115, cr_loss=0.3836, over 6755700.21 frames. ], batch size: 97, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:28:12,709 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.00 vs. limit=10.0 2024-09-20 06:28:20,048 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.073e+02 2.527e+02 3.070e+02 3.679e+02 5.855e+02, threshold=6.139e+02, percent-clipped=0.0 2024-09-20 06:29:21,647 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=840709.3333333334, ans=0.0 2024-09-20 06:29:32,923 INFO [train.py:1198] (1/2) Epoch 47, batch 1850, loss[loss=0.2147, simple_loss=0.2742, pruned_loss=0.05729, ctc_loss=0.1228, cr_loss=0.4015, over 34465.00 frames. 
], tot_loss[loss=0.1992, simple_loss=0.2584, pruned_loss=0.05125, ctc_loss=0.1114, cr_loss=0.3837, over 6762105.85 frames. ], batch size: 100, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:29:40,653 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0 2024-09-20 06:29:42,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=840756.0, ans=0.125 2024-09-20 06:30:12,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=840849.3333333334, ans=0.0 2024-09-20 06:30:37,748 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=840942.6666666666, ans=0.0 2024-09-20 06:30:53,783 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=12.0 2024-09-20 06:30:57,438 INFO [train.py:1198] (1/2) Epoch 47, batch 1900, loss[loss=0.1959, simple_loss=0.264, pruned_loss=0.04649, ctc_loss=0.1033, cr_loss=0.3542, over 34375.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.259, pruned_loss=0.05134, ctc_loss=0.1116, cr_loss=0.3841, over 6771958.65 frames. ], batch size: 103, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:31:08,812 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.251e+02 2.772e+02 3.342e+02 4.026e+02 7.257e+02, threshold=6.683e+02, percent-clipped=6.0 2024-09-20 06:31:12,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=841036.0, ans=0.09899494936611666 2024-09-20 06:31:14,085 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=841036.0, ans=0.125 2024-09-20 06:31:38,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=841082.6666666666, ans=0.125 2024-09-20 06:31:43,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=841082.6666666666, ans=0.125 2024-09-20 06:31:59,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=841129.3333333334, ans=0.035 2024-09-20 06:31:59,890 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=841129.3333333334, ans=0.1 2024-09-20 06:32:06,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=841176.0, ans=0.2 2024-09-20 06:32:18,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.01 vs. limit=15.0 2024-09-20 06:32:19,033 INFO [train.py:1198] (1/2) Epoch 47, batch 1950, loss[loss=0.2, simple_loss=0.2542, pruned_loss=0.05309, ctc_loss=0.1173, cr_loss=0.4057, over 34381.00 frames. ], tot_loss[loss=0.2006, simple_loss=0.26, pruned_loss=0.05163, ctc_loss=0.1123, cr_loss=0.3858, over 6788965.99 frames. 
], batch size: 91, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:32:40,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=841269.3333333334, ans=0.09899494936611666 2024-09-20 06:33:13,007 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.44 vs. limit=8.0 2024-09-20 06:33:18,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=841362.6666666666, ans=0.125 2024-09-20 06:33:42,964 INFO [train.py:1198] (1/2) Epoch 47, batch 2000, loss[loss=0.1818, simple_loss=0.2374, pruned_loss=0.04576, ctc_loss=0.1002, cr_loss=0.3681, over 34176.00 frames. ], tot_loss[loss=0.2008, simple_loss=0.2602, pruned_loss=0.05167, ctc_loss=0.1125, cr_loss=0.3862, over 6764284.81 frames. ], batch size: 78, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:33:54,615 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 2.562e+02 2.849e+02 3.334e+02 5.303e+02, threshold=5.698e+02, percent-clipped=0.0 2024-09-20 06:34:08,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=841502.6666666666, ans=0.0 2024-09-20 06:34:13,359 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=841502.6666666666, ans=0.125 2024-09-20 06:34:22,575 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=5.64 vs. limit=15.0 2024-09-20 06:34:43,328 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=841596.0, ans=0.07 2024-09-20 06:35:07,758 INFO [train.py:1198] (1/2) Epoch 47, batch 2050, loss[loss=0.1746, simple_loss=0.2335, pruned_loss=0.04133, ctc_loss=0.09565, cr_loss=0.3457, over 34480.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2591, pruned_loss=0.05132, ctc_loss=0.1118, cr_loss=0.3845, over 6754807.11 frames. ], batch size: 82, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:35:19,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=9.24 vs. limit=15.0 2024-09-20 06:35:26,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=841736.0, ans=0.0 2024-09-20 06:35:47,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=841782.6666666666, ans=0.05 2024-09-20 06:35:55,686 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:35:57,483 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=841829.3333333334, ans=0.125 2024-09-20 06:36:17,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=841876.0, ans=0.1 2024-09-20 06:36:25,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=841876.0, ans=0.125 2024-09-20 06:36:32,013 INFO [train.py:1198] (1/2) Epoch 47, batch 2100, loss[loss=0.2011, simple_loss=0.2609, pruned_loss=0.05205, ctc_loss=0.1118, cr_loss=0.3683, over 34531.00 frames. 
], tot_loss[loss=0.1992, simple_loss=0.2587, pruned_loss=0.0511, ctc_loss=0.1114, cr_loss=0.3835, over 6768821.44 frames. ], batch size: 94, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:36:32,281 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=841922.6666666666, ans=0.2 2024-09-20 06:36:43,343 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.542e+02 2.940e+02 3.801e+02 7.709e+02, threshold=5.879e+02, percent-clipped=6.0 2024-09-20 06:37:02,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=15.0 2024-09-20 06:37:05,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.29 vs. limit=6.0 2024-09-20 06:37:09,887 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=842016.0, ans=0.125 2024-09-20 06:37:19,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=842062.6666666666, ans=0.0 2024-09-20 06:37:35,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=842109.3333333334, ans=0.025 2024-09-20 06:37:47,292 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=842109.3333333334, ans=0.025 2024-09-20 06:37:53,746 INFO [train.py:1198] (1/2) Epoch 47, batch 2150, loss[loss=0.1994, simple_loss=0.2576, pruned_loss=0.05159, ctc_loss=0.1132, cr_loss=0.3836, over 34362.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2585, pruned_loss=0.05098, ctc_loss=0.111, cr_loss=0.3825, over 6787814.30 frames. ], batch size: 91, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:38:11,267 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.87 vs. limit=12.0 2024-09-20 06:38:27,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=842249.3333333334, ans=0.1 2024-09-20 06:38:27,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=842249.3333333334, ans=0.125 2024-09-20 06:38:38,784 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.30 vs. limit=8.0 2024-09-20 06:38:39,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=842249.3333333334, ans=0.0 2024-09-20 06:39:17,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=842389.3333333334, ans=0.1 2024-09-20 06:39:18,343 INFO [train.py:1198] (1/2) Epoch 47, batch 2200, loss[loss=0.1994, simple_loss=0.2625, pruned_loss=0.05004, ctc_loss=0.1078, cr_loss=0.3657, over 34425.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2586, pruned_loss=0.05106, ctc_loss=0.111, cr_loss=0.3822, over 6783661.39 frames. 
], batch size: 100, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:39:22,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=842389.3333333334, ans=0.125 2024-09-20 06:39:29,722 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.209e+02 2.661e+02 3.196e+02 4.057e+02 5.893e+02, threshold=6.391e+02, percent-clipped=0.0 2024-09-20 06:39:39,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=842436.0, ans=0.125 2024-09-20 06:40:42,243 INFO [train.py:1198] (1/2) Epoch 47, batch 2250, loss[loss=0.2125, simple_loss=0.2716, pruned_loss=0.0565, ctc_loss=0.12, cr_loss=0.4121, over 34407.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2586, pruned_loss=0.05113, ctc_loss=0.1113, cr_loss=0.3829, over 6780325.15 frames. ], batch size: 95, lr: 2.53e-03, grad_scale: 32.0 2024-09-20 06:40:42,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=842622.6666666666, ans=0.125 2024-09-20 06:40:44,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.62 vs. limit=15.0 2024-09-20 06:41:25,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=842716.0, ans=0.125 2024-09-20 06:41:35,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=842762.6666666666, ans=0.0 2024-09-20 06:41:35,139 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=842762.6666666666, ans=0.025 2024-09-20 06:41:46,112 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2024-09-20 06:42:04,787 INFO [train.py:1198] (1/2) Epoch 47, batch 2300, loss[loss=0.1746, simple_loss=0.2378, pruned_loss=0.03982, ctc_loss=0.09238, cr_loss=0.3336, over 34255.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2576, pruned_loss=0.05083, ctc_loss=0.1107, cr_loss=0.3819, over 6765073.51 frames. ], batch size: 83, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:42:18,304 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.125e+02 2.602e+02 3.008e+02 3.690e+02 5.122e+02, threshold=6.016e+02, percent-clipped=0.0 2024-09-20 06:42:43,366 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=22.5 2024-09-20 06:42:52,818 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 06:42:57,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=842996.0, ans=0.125 2024-09-20 06:43:09,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=842996.0, ans=0.125 2024-09-20 06:43:12,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.34 vs. 
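limit=22.5

The Whitening lines compare a per-module statistic against a limit (6.0, 12.0, 15.0 or 22.5 depending on the module): the metric is at its minimum when the channel covariance of the activations is isotropic, i.e. already "white", and grows as the variance concentrates in a few directions, as in the metric=15.34 vs. limit=22.5 entry just above. One standard measure with that behavior, shown as an illustration rather than scaling.py's exact formula, is d·tr(C²)/tr(C)² for the d×d channel covariance C, which equals 1.0 exactly when C is a multiple of the identity:

```python
# Illustrative whitening metric: metric = d * tr(C @ C) / tr(C)**2 for the
# channel covariance C. By Cauchy-Schwarz over C's eigenvalues it is >= 1,
# with equality iff all eigenvalues are equal (isotropic / fully whitened).
# An assumed stand-in for the logged metric, not icefall's exact code.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one module
    x = x - x.mean(dim=0, keepdim=True)
    c = (x.T @ x) / x.shape[0]          # channel covariance, (d, d)
    d = c.shape[0]
    return d * torch.trace(c @ c) / torch.trace(c) ** 2

white = torch.randn(20000, 512)
print(whitening_metric(white))                                  # close to 1: isotropic
print(whitening_metric(white * torch.linspace(0.1, 3.0, 512)))  # well above 1
```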
2024-09-20 06:43:22,501 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=843042.6666666666, ans=0.125 2024-09-20 06:43:28,687 INFO [train.py:1198] (1/2) Epoch 47, batch 2350, loss[loss=0.2012, simple_loss=0.2624, pruned_loss=0.05116, ctc_loss=0.112, cr_loss=0.3836, over 34683.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2578, pruned_loss=0.05098, ctc_loss=0.1109, cr_loss=0.3827, over 6772532.19 frames. ], batch size: 97, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:44:06,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=843182.6666666666, ans=0.125 2024-09-20 06:44:13,454 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=843182.6666666666, ans=0.1 2024-09-20 06:44:51,869 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=12.0 2024-09-20 06:44:52,486 INFO [train.py:1198] (1/2) Epoch 47, batch 2400, loss[loss=0.1877, simple_loss=0.2442, pruned_loss=0.04782, ctc_loss=0.1053, cr_loss=0.3614, over 34570.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2582, pruned_loss=0.05127, ctc_loss=0.1115, cr_loss=0.3841, over 6777069.30 frames. ], batch size: 89, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:45:03,747 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.662e+02 2.952e+02 3.894e+02 5.523e+02, threshold=5.904e+02, percent-clipped=0.0 2024-09-20 06:45:05,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=843322.6666666666, ans=0.125 2024-09-20 06:45:07,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=843369.3333333334, ans=0.125 2024-09-20 06:45:29,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=843416.0, ans=0.07 2024-09-20 06:45:30,698 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=843416.0, ans=0.0 2024-09-20 06:45:42,394 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=843462.6666666666, ans=0.0 2024-09-20 06:45:44,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.10 vs. limit=15.0 2024-09-20 06:45:48,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=843462.6666666666, ans=0.2 2024-09-20 06:45:50,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=843462.6666666666, ans=0.1 2024-09-20 06:46:07,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=843509.3333333334, ans=0.0 2024-09-20 06:46:17,162 INFO [train.py:1198] (1/2) Epoch 47, batch 2450, loss[loss=0.2005, simple_loss=0.2609, pruned_loss=0.05149, ctc_loss=0.1115, cr_loss=0.3698, over 34429.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2588, pruned_loss=0.05138, ctc_loss=0.1118, cr_loss=0.3847, over 6752130.87 frames.
], batch size: 95, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:46:20,696 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=843556.0, ans=0.125 2024-09-20 06:46:35,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=843602.6666666666, ans=0.125 2024-09-20 06:46:38,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=843602.6666666666, ans=0.2 2024-09-20 06:46:40,181 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=843602.6666666666, ans=0.125 2024-09-20 06:46:40,319 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=843602.6666666666, ans=0.125 2024-09-20 06:47:19,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=843696.0, ans=0.0 2024-09-20 06:47:19,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=843696.0, ans=0.0 2024-09-20 06:47:32,158 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.83 vs. limit=15.0 2024-09-20 06:47:33,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=843742.6666666666, ans=0.125 2024-09-20 06:47:40,794 INFO [train.py:1198] (1/2) Epoch 47, batch 2500, loss[loss=0.1997, simple_loss=0.2691, pruned_loss=0.04705, ctc_loss=0.1057, cr_loss=0.3742, over 34450.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2589, pruned_loss=0.05143, ctc_loss=0.1119, cr_loss=0.3846, over 6763463.77 frames. ], batch size: 100, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:47:52,222 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.169e+02 2.529e+02 2.812e+02 3.317e+02 5.462e+02, threshold=5.624e+02, percent-clipped=0.0 2024-09-20 06:48:17,326 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=843882.6666666666, ans=0.125 2024-09-20 06:48:22,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=843882.6666666666, ans=0.1 2024-09-20 06:48:35,391 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=843929.3333333334, ans=0.1 2024-09-20 06:48:50,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=843976.0, ans=0.1 2024-09-20 06:49:02,969 INFO [train.py:1198] (1/2) Epoch 47, batch 2550, loss[loss=0.1719, simple_loss=0.2285, pruned_loss=0.04189, ctc_loss=0.09219, cr_loss=0.3255, over 34180.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2589, pruned_loss=0.05147, ctc_loss=0.112, cr_loss=0.3852, over 6767520.23 frames. 
], batch size: 78, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:49:51,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=844116.0, ans=0.2 2024-09-20 06:49:54,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.28 vs. limit=15.0 2024-09-20 06:50:27,024 INFO [train.py:1198] (1/2) Epoch 47, batch 2600, loss[loss=0.1946, simple_loss=0.2532, pruned_loss=0.04969, ctc_loss=0.1076, cr_loss=0.3789, over 34346.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2592, pruned_loss=0.05152, ctc_loss=0.1121, cr_loss=0.3855, over 6763310.06 frames. ], batch size: 91, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:50:29,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.33 vs. limit=15.0 2024-09-20 06:50:38,330 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.178e+02 2.527e+02 2.879e+02 3.746e+02 5.405e+02, threshold=5.759e+02, percent-clipped=0.0 2024-09-20 06:50:38,789 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=844256.0, ans=0.0 2024-09-20 06:51:01,644 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=844349.3333333334, ans=0.0 2024-09-20 06:51:15,113 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2024-09-20 06:51:18,184 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=844396.0, ans=0.1 2024-09-20 06:51:42,591 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=844442.6666666666, ans=0.125 2024-09-20 06:51:47,364 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=844442.6666666666, ans=0.125 2024-09-20 06:51:50,265 INFO [train.py:1198] (1/2) Epoch 47, batch 2650, loss[loss=0.2091, simple_loss=0.271, pruned_loss=0.0538, ctc_loss=0.1175, cr_loss=0.401, over 34261.00 frames. ], tot_loss[loss=0.2002, simple_loss=0.2594, pruned_loss=0.05157, ctc_loss=0.1122, cr_loss=0.3864, over 6770441.27 frames. 
], batch size: 117, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:52:07,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=844536.0, ans=0.2 2024-09-20 06:52:10,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=844536.0, ans=0.025 2024-09-20 06:52:19,990 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=844536.0, ans=0.125 2024-09-20 06:52:20,164 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer2.prob, batch_count=844536.0, ans=0.125 2024-09-20 06:52:26,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=844582.6666666666, ans=0.0 2024-09-20 06:52:28,362 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=844582.6666666666, ans=0.125 2024-09-20 06:52:29,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=844582.6666666666, ans=0.125 2024-09-20 06:52:53,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=844629.3333333334, ans=0.0 2024-09-20 06:52:58,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=844676.0, ans=0.09899494936611666 2024-09-20 06:53:00,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=844676.0, ans=0.1 2024-09-20 06:53:12,559 INFO [train.py:1198] (1/2) Epoch 47, batch 2700, loss[loss=0.1982, simple_loss=0.2606, pruned_loss=0.04918, ctc_loss=0.1101, cr_loss=0.3861, over 34637.00 frames. ], tot_loss[loss=0.2004, simple_loss=0.2596, pruned_loss=0.05161, ctc_loss=0.1122, cr_loss=0.3863, over 6764462.29 frames. ], batch size: 102, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:53:23,959 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.530e+02 2.851e+02 3.447e+02 5.939e+02, threshold=5.702e+02, percent-clipped=1.0 2024-09-20 06:53:24,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=844722.6666666666, ans=0.125 2024-09-20 06:53:42,665 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=844769.3333333334, ans=0.0 2024-09-20 06:53:47,554 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.prob, batch_count=844816.0, ans=0.125 2024-09-20 06:53:51,814 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.18 vs. limit=10.0 2024-09-20 06:54:14,969 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.75 vs. limit=10.0 2024-09-20 06:54:19,277 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=844909.3333333334, ans=0.1 2024-09-20 06:54:37,041 INFO [train.py:1198] (1/2) Epoch 47, batch 2750, loss[loss=0.1917, simple_loss=0.2502, pruned_loss=0.04868, ctc_loss=0.1055, cr_loss=0.3703, over 34633.00 frames. 
], tot_loss[loss=0.1993, simple_loss=0.2585, pruned_loss=0.05122, ctc_loss=0.1113, cr_loss=0.3839, over 6761760.45 frames. ], batch size: 88, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:54:56,105 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=12.99 vs. limit=15.0 2024-09-20 06:55:02,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=845002.6666666666, ans=0.1 2024-09-20 06:55:10,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=22.5 2024-09-20 06:55:15,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=845049.3333333334, ans=0.2 2024-09-20 06:55:20,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=845049.3333333334, ans=0.95 2024-09-20 06:55:20,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=845049.3333333334, ans=0.07 2024-09-20 06:55:33,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=845096.0, ans=0.1 2024-09-20 06:55:53,134 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=845142.6666666666, ans=0.125 2024-09-20 06:56:01,226 INFO [train.py:1198] (1/2) Epoch 47, batch 2800, loss[loss=0.2094, simple_loss=0.2649, pruned_loss=0.05638, ctc_loss=0.1264, cr_loss=0.3948, over 22910.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2588, pruned_loss=0.05147, ctc_loss=0.1118, cr_loss=0.3846, over 6739259.94 frames. ], batch size: 245, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:56:01,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=845189.3333333334, ans=0.025 2024-09-20 06:56:04,924 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=845189.3333333334, ans=0.125 2024-09-20 06:56:12,531 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.103e+02 2.596e+02 2.977e+02 3.750e+02 6.759e+02, threshold=5.953e+02, percent-clipped=2.0 2024-09-20 06:56:42,299 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=845282.6666666666, ans=0.0 2024-09-20 06:56:42,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=845282.6666666666, ans=0.0 2024-09-20 06:56:44,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=845282.6666666666, ans=22.5 2024-09-20 06:56:45,564 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=845282.6666666666, ans=0.025 2024-09-20 06:56:46,339 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.18 vs. limit=10.0 2024-09-20 06:56:46,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.70 vs. 
limit=15.0 2024-09-20 06:57:25,338 INFO [train.py:1198] (1/2) Epoch 47, batch 2850, loss[loss=0.1956, simple_loss=0.2531, pruned_loss=0.05043, ctc_loss=0.1109, cr_loss=0.3768, over 34479.00 frames. ], tot_loss[loss=0.2001, simple_loss=0.2591, pruned_loss=0.0516, ctc_loss=0.1121, cr_loss=0.3851, over 6725013.52 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:57:26,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.63 vs. limit=6.0 2024-09-20 06:57:42,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=845469.3333333334, ans=0.125 2024-09-20 06:57:53,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=845469.3333333334, ans=0.125 2024-09-20 06:57:55,968 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.13 vs. limit=22.5 2024-09-20 06:57:57,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=845516.0, ans=0.125 2024-09-20 06:57:58,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=845516.0, ans=0.1 2024-09-20 06:58:03,681 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=845516.0, ans=0.125 2024-09-20 06:58:13,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=845516.0, ans=0.125 2024-09-20 06:58:21,612 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=845562.6666666666, ans=0.125 2024-09-20 06:58:45,705 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.95 vs. limit=22.5 2024-09-20 06:58:49,575 INFO [train.py:1198] (1/2) Epoch 47, batch 2900, loss[loss=0.197, simple_loss=0.2536, pruned_loss=0.05134, ctc_loss=0.1124, cr_loss=0.3828, over 34537.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2602, pruned_loss=0.0519, ctc_loss=0.1126, cr_loss=0.3868, over 6755599.02 frames. ], batch size: 94, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 06:59:01,176 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.091e+02 2.568e+02 2.954e+02 3.733e+02 8.281e+02, threshold=5.907e+02, percent-clipped=3.0 2024-09-20 06:59:10,739 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.56 vs. limit=10.0 2024-09-20 06:59:29,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.63 vs. 
limit=15.0 2024-09-20 06:59:52,453 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=845796.0, ans=0.2 2024-09-20 07:00:03,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=845842.6666666666, ans=0.0 2024-09-20 07:00:05,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=845842.6666666666, ans=0.0 2024-09-20 07:00:10,208 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=845889.3333333334, ans=0.0 2024-09-20 07:00:11,630 INFO [train.py:1198] (1/2) Epoch 47, batch 2950, loss[loss=0.1913, simple_loss=0.2485, pruned_loss=0.04863, ctc_loss=0.1096, cr_loss=0.3742, over 34622.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2588, pruned_loss=0.05141, ctc_loss=0.1118, cr_loss=0.3845, over 6750878.79 frames. ], batch size: 88, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:00:11,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=845889.3333333334, ans=0.0 2024-09-20 07:00:21,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=845889.3333333334, ans=0.125 2024-09-20 07:00:51,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.77 vs. limit=15.0 2024-09-20 07:00:56,276 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=845982.6666666666, ans=0.125 2024-09-20 07:01:03,473 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:01:03,532 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=846029.3333333334, ans=0.025 2024-09-20 07:01:04,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=846029.3333333334, ans=0.125 2024-09-20 07:01:35,828 INFO [train.py:1198] (1/2) Epoch 47, batch 3000, loss[loss=0.2079, simple_loss=0.2679, pruned_loss=0.054, ctc_loss=0.1179, cr_loss=0.4065, over 34543.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2583, pruned_loss=0.05115, ctc_loss=0.1113, cr_loss=0.3837, over 6751622.54 frames. ], batch size: 94, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:01:35,828 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 07:01:39,564 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.7468, 3.4542, 3.0291, 3.4486], device='cuda:1') 2024-09-20 07:01:54,011 INFO [train.py:1230] (1/2) Epoch 47, validation: loss=0.1482, simple_loss=0.2416, pruned_loss=0.0236, ctc_loss=0.0384, cr_loss=2.299e-14, over 944034.00 frames. 2024-09-20 07:01:54,011 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 07:01:56,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.11 vs. 
limit=22.5 2024-09-20 07:02:05,459 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.256e+02 2.635e+02 2.977e+02 3.863e+02 9.263e+02, threshold=5.953e+02, percent-clipped=2.0 2024-09-20 07:02:09,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=846169.3333333334, ans=0.125 2024-09-20 07:02:35,092 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:02:35,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=846216.0, ans=0.125 2024-09-20 07:02:43,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=846262.6666666666, ans=0.125 2024-09-20 07:03:07,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=846309.3333333334, ans=0.0 2024-09-20 07:03:15,469 INFO [train.py:1198] (1/2) Epoch 47, batch 3050, loss[loss=0.1871, simple_loss=0.2462, pruned_loss=0.04616, ctc_loss=0.1046, cr_loss=0.3683, over 34580.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.259, pruned_loss=0.0515, ctc_loss=0.1119, cr_loss=0.3846, over 6744245.34 frames. ], batch size: 89, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:03:34,168 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.58 vs. limit=12.0 2024-09-20 07:03:35,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=846402.6666666666, ans=0.0 2024-09-20 07:03:43,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=846402.6666666666, ans=0.125 2024-09-20 07:04:03,123 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.67 vs. limit=10.0 2024-09-20 07:04:15,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=846496.0, ans=0.125 2024-09-20 07:04:36,353 INFO [train.py:1198] (1/2) Epoch 47, batch 3100, loss[loss=0.207, simple_loss=0.2693, pruned_loss=0.05268, ctc_loss=0.1164, cr_loss=0.4, over 34220.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2588, pruned_loss=0.05143, ctc_loss=0.1118, cr_loss=0.3848, over 6744873.29 frames. ], batch size: 117, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:04:47,654 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.158e+02 2.573e+02 2.865e+02 3.423e+02 5.688e+02, threshold=5.730e+02, percent-clipped=0.0 2024-09-20 07:05:07,525 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.55 vs. limit=22.5 2024-09-20 07:05:10,151 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=846682.6666666666, ans=0.125 2024-09-20 07:05:23,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. 
limit=6.0 2024-09-20 07:05:36,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=846729.3333333334, ans=0.1 2024-09-20 07:05:56,992 INFO [train.py:1198] (1/2) Epoch 47, batch 3150, loss[loss=0.2067, simple_loss=0.2714, pruned_loss=0.05174, ctc_loss=0.1152, cr_loss=0.3877, over 33970.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2585, pruned_loss=0.05132, ctc_loss=0.1116, cr_loss=0.3845, over 6750496.28 frames. ], batch size: 122, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:06:02,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=846822.6666666666, ans=0.0 2024-09-20 07:06:05,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=846822.6666666666, ans=0.1 2024-09-20 07:06:08,492 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=846822.6666666666, ans=0.0 2024-09-20 07:06:44,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.74 vs. limit=10.0 2024-09-20 07:07:19,384 INFO [train.py:1198] (1/2) Epoch 47, batch 3200, loss[loss=0.1922, simple_loss=0.2485, pruned_loss=0.04976, ctc_loss=0.106, cr_loss=0.3796, over 34554.00 frames. ], tot_loss[loss=0.1989, simple_loss=0.2581, pruned_loss=0.0511, ctc_loss=0.1111, cr_loss=0.3837, over 6762645.57 frames. ], batch size: 94, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:07:30,567 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.100e+02 2.609e+02 3.055e+02 3.656e+02 6.675e+02, threshold=6.109e+02, percent-clipped=3.0 2024-09-20 07:07:49,510 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.57 vs. limit=22.5 2024-09-20 07:07:50,369 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=847102.6666666666, ans=0.0 2024-09-20 07:08:20,986 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=847196.0, ans=0.125 2024-09-20 07:08:30,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=847242.6666666666, ans=0.0 2024-09-20 07:08:37,791 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.51 vs. limit=22.5 2024-09-20 07:08:41,756 INFO [train.py:1198] (1/2) Epoch 47, batch 3250, loss[loss=0.2119, simple_loss=0.274, pruned_loss=0.05477, ctc_loss=0.1186, cr_loss=0.4152, over 34651.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2586, pruned_loss=0.05127, ctc_loss=0.1115, cr_loss=0.3843, over 6772047.39 frames. ], batch size: 98, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:08:46,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=847289.3333333334, ans=0.07 2024-09-20 07:08:55,529 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.60 vs. limit=15.0 2024-09-20 07:08:58,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.58 vs. 
limit=22.5 2024-09-20 07:09:13,268 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.28 vs. limit=15.0 2024-09-20 07:09:14,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=847382.6666666666, ans=0.125 2024-09-20 07:09:25,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=847382.6666666666, ans=0.025 2024-09-20 07:09:26,198 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.65 vs. limit=22.5 2024-09-20 07:10:02,952 INFO [train.py:1198] (1/2) Epoch 47, batch 3300, loss[loss=0.2173, simple_loss=0.2765, pruned_loss=0.0582, ctc_loss=0.1263, cr_loss=0.4122, over 33067.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2577, pruned_loss=0.05096, ctc_loss=0.1109, cr_loss=0.3828, over 6770005.55 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:10:06,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=847522.6666666666, ans=0.2 2024-09-20 07:10:14,198 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 2.566e+02 2.882e+02 3.631e+02 6.678e+02, threshold=5.763e+02, percent-clipped=2.0 2024-09-20 07:10:14,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=847522.6666666666, ans=0.025 2024-09-20 07:10:15,387 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.29 vs. limit=15.0 2024-09-20 07:10:24,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=847569.3333333334, ans=0.5 2024-09-20 07:10:33,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=847616.0, ans=0.125 2024-09-20 07:11:03,165 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.17 vs. limit=15.0 2024-09-20 07:11:23,102 INFO [train.py:1198] (1/2) Epoch 47, batch 3350, loss[loss=0.2143, simple_loss=0.2741, pruned_loss=0.05695, ctc_loss=0.1217, cr_loss=0.4067, over 33888.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2582, pruned_loss=0.05112, ctc_loss=0.1112, cr_loss=0.383, over 6743150.64 frames. ], batch size: 122, lr: 2.52e-03, grad_scale: 64.0 2024-09-20 07:11:29,063 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.68 vs. 
limit=22.5 2024-09-20 07:11:31,396 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=847756.0, ans=0.0 2024-09-20 07:11:37,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=847802.6666666666, ans=0.015 2024-09-20 07:11:39,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=847802.6666666666, ans=0.125 2024-09-20 07:12:17,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff3.min_abs, batch_count=847896.0, ans=0.2 2024-09-20 07:12:19,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=847896.0, ans=0.1 2024-09-20 07:12:35,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=847942.6666666666, ans=0.125 2024-09-20 07:12:36,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=847942.6666666666, ans=0.1 2024-09-20 07:12:44,573 INFO [train.py:1198] (1/2) Epoch 47, batch 3400, loss[loss=0.1795, simple_loss=0.236, pruned_loss=0.04446, ctc_loss=0.1011, cr_loss=0.3473, over 34150.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2585, pruned_loss=0.05137, ctc_loss=0.1117, cr_loss=0.3849, over 6733632.56 frames. ], batch size: 78, lr: 2.52e-03, grad_scale: 64.0 2024-09-20 07:12:50,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=847989.3333333334, ans=0.125 2024-09-20 07:12:56,825 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.077e+02 2.523e+02 2.803e+02 3.252e+02 6.914e+02, threshold=5.607e+02, percent-clipped=4.0 2024-09-20 07:13:06,854 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=848036.0, ans=0.0 2024-09-20 07:13:37,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=848129.3333333334, ans=0.125 2024-09-20 07:14:05,800 INFO [train.py:1198] (1/2) Epoch 47, batch 3450, loss[loss=0.2201, simple_loss=0.2804, pruned_loss=0.05883, ctc_loss=0.1274, cr_loss=0.4146, over 32988.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2589, pruned_loss=0.05146, ctc_loss=0.1118, cr_loss=0.3852, over 6746009.55 frames. ], batch size: 130, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:14:10,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.68 vs. limit=15.0 2024-09-20 07:14:10,832 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=848222.6666666666, ans=0.125 2024-09-20 07:14:10,876 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=848222.6666666666, ans=0.0 2024-09-20 07:14:14,469 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.76 vs. 
limit=12.0 2024-09-20 07:14:34,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=848269.3333333334, ans=0.125 2024-09-20 07:14:36,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=848316.0, ans=0.125 2024-09-20 07:14:36,498 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:14:50,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=848316.0, ans=0.95 2024-09-20 07:14:55,543 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=848362.6666666666, ans=0.2 2024-09-20 07:15:16,894 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=848409.3333333334, ans=0.125 2024-09-20 07:15:18,753 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=15.0 2024-09-20 07:15:23,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=848409.3333333334, ans=0.05 2024-09-20 07:15:26,082 INFO [train.py:1198] (1/2) Epoch 47, batch 3500, loss[loss=0.1711, simple_loss=0.2338, pruned_loss=0.03921, ctc_loss=0.08778, cr_loss=0.313, over 34469.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2583, pruned_loss=0.05113, ctc_loss=0.1113, cr_loss=0.3838, over 6747060.77 frames. ], batch size: 85, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:15:38,772 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.046e+02 2.558e+02 2.834e+02 3.546e+02 6.642e+02, threshold=5.669e+02, percent-clipped=3.0 2024-09-20 07:15:54,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.58 vs. limit=15.0 2024-09-20 07:16:09,365 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=848549.3333333334, ans=0.125 2024-09-20 07:16:20,706 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=848596.0, ans=0.2 2024-09-20 07:16:28,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=848596.0, ans=0.125 2024-09-20 07:16:32,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=848642.6666666666, ans=0.125 2024-09-20 07:16:34,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=848642.6666666666, ans=0.0 2024-09-20 07:16:46,688 INFO [train.py:1198] (1/2) Epoch 47, batch 3550, loss[loss=0.2088, simple_loss=0.2689, pruned_loss=0.0543, ctc_loss=0.119, cr_loss=0.4044, over 34391.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.2586, pruned_loss=0.0512, ctc_loss=0.1114, cr_loss=0.3842, over 6757094.07 frames. 
], batch size: 103, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:16:48,589 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=848689.3333333334, ans=0.2 2024-09-20 07:16:48,718 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:16:53,772 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.88 vs. limit=6.0 2024-09-20 07:17:10,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=848736.0, ans=0.125 2024-09-20 07:17:11,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=848736.0, ans=0.125 2024-09-20 07:17:13,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=848736.0, ans=0.0 2024-09-20 07:17:21,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=848782.6666666666, ans=0.125 2024-09-20 07:17:26,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.84 vs. limit=22.5 2024-09-20 07:17:54,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=848876.0, ans=0.1 2024-09-20 07:18:06,820 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.03 vs. limit=15.0 2024-09-20 07:18:07,180 INFO [train.py:1198] (1/2) Epoch 47, batch 3600, loss[loss=0.1816, simple_loss=0.2433, pruned_loss=0.04301, ctc_loss=0.09838, cr_loss=0.3542, over 34472.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2589, pruned_loss=0.05124, ctc_loss=0.1115, cr_loss=0.3848, over 6767605.92 frames. ], batch size: 90, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:18:15,517 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=848922.6666666666, ans=0.0 2024-09-20 07:18:16,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=848922.6666666666, ans=0.025 2024-09-20 07:18:21,516 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 2.584e+02 2.978e+02 3.862e+02 6.367e+02, threshold=5.955e+02, percent-clipped=7.0 2024-09-20 07:18:34,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=848969.3333333334, ans=0.125 2024-09-20 07:18:45,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=849016.0, ans=0.1 2024-09-20 07:18:48,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=849016.0, ans=0.0 2024-09-20 07:18:57,794 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.80 vs. 
limit=22.5 2024-09-20 07:18:58,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=849062.6666666666, ans=0.125 2024-09-20 07:19:05,671 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.94 vs. limit=15.0 2024-09-20 07:19:11,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=849109.3333333334, ans=0.125 2024-09-20 07:19:12,159 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.94 vs. limit=10.0 2024-09-20 07:19:27,433 INFO [train.py:1198] (1/2) Epoch 47, batch 3650, loss[loss=0.2122, simple_loss=0.2744, pruned_loss=0.05498, ctc_loss=0.1198, cr_loss=0.4022, over 34441.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2584, pruned_loss=0.05106, ctc_loss=0.1112, cr_loss=0.384, over 6770829.38 frames. ], batch size: 110, lr: 2.52e-03, grad_scale: 32.0 2024-09-20 07:19:55,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=849202.6666666666, ans=0.1 2024-09-20 07:20:14,753 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=849296.0, ans=0.0 2024-09-20 07:20:14,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=849296.0, ans=0.0 2024-09-20 07:20:22,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=849296.0, ans=0.2 2024-09-20 07:20:33,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=849342.6666666666, ans=0.0 2024-09-20 07:20:33,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.const_attention_rate, batch_count=849342.6666666666, ans=0.025 2024-09-20 07:20:34,023 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.97 vs. limit=6.0 2024-09-20 07:20:41,084 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=849342.6666666666, ans=0.0 2024-09-20 07:20:48,730 INFO [train.py:1198] (1/2) Epoch 47, batch 3700, loss[loss=0.2061, simple_loss=0.271, pruned_loss=0.05142, ctc_loss=0.1152, cr_loss=0.3801, over 34655.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.258, pruned_loss=0.05076, ctc_loss=0.1107, cr_loss=0.3824, over 6785320.76 frames. 
], batch size: 102, lr: 2.51e-03, grad_scale: 32.0 2024-09-20 07:21:03,169 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.214e+02 2.538e+02 2.864e+02 3.730e+02 6.402e+02, threshold=5.727e+02, percent-clipped=3.0 2024-09-20 07:21:08,515 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:21:29,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=849482.6666666666, ans=0.125 2024-09-20 07:21:29,194 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=849482.6666666666, ans=0.0 2024-09-20 07:21:35,787 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=849529.3333333334, ans=0.0 2024-09-20 07:21:42,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=849529.3333333334, ans=0.0 2024-09-20 07:21:45,525 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=849529.3333333334, ans=0.1 2024-09-20 07:21:45,527 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=849529.3333333334, ans=0.0 2024-09-20 07:22:09,139 INFO [train.py:1198] (1/2) Epoch 47, batch 3750, loss[loss=0.2208, simple_loss=0.2821, pruned_loss=0.05858, ctc_loss=0.1265, cr_loss=0.427, over 34345.00 frames. ], tot_loss[loss=0.2016, simple_loss=0.2612, pruned_loss=0.05193, ctc_loss=0.113, cr_loss=0.3887, over 6785541.14 frames. ], batch size: 113, lr: 2.51e-03, grad_scale: 32.0 2024-09-20 07:22:14,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=849622.6666666666, ans=0.2 2024-09-20 07:22:15,829 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=849622.6666666666, ans=0.125 2024-09-20 07:22:44,599 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=849716.0, ans=0.0 2024-09-20 07:22:49,468 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=849716.0, ans=0.0 2024-09-20 07:22:50,038 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.23 vs. limit=10.0 2024-09-20 07:22:50,232 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2024-09-20 07:22:51,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=849716.0, ans=0.125 2024-09-20 07:23:00,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.32 vs. 
limit=10.0 2024-09-20 07:23:01,811 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=849762.6666666666, ans=0.125 2024-09-20 07:23:08,387 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=849762.6666666666, ans=0.0 2024-09-20 07:23:09,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=849762.6666666666, ans=0.125 2024-09-20 07:23:14,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=849809.3333333334, ans=0.1 2024-09-20 07:23:30,540 INFO [train.py:1198] (1/2) Epoch 47, batch 3800, loss[loss=0.224, simple_loss=0.2758, pruned_loss=0.06443, ctc_loss=0.1339, cr_loss=0.4146, over 30004.00 frames. ], tot_loss[loss=0.2046, simple_loss=0.2639, pruned_loss=0.05324, ctc_loss=0.1154, cr_loss=0.3939, over 6676169.87 frames. ], batch size: 176, lr: 2.51e-03, grad_scale: 32.0 2024-09-20 07:23:39,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=849856.0, ans=0.0 2024-09-20 07:23:45,516 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.219e+02 2.440e+02 2.575e+02 2.769e+02 3.711e+02, threshold=5.150e+02, percent-clipped=0.0 2024-09-20 07:23:47,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=849902.6666666666, ans=0.0 2024-09-20 07:23:47,942 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.49 vs. limit=22.5 2024-09-20 07:23:55,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=849902.6666666666, ans=0.015 2024-09-20 07:23:57,850 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.45 vs. limit=15.0 2024-09-20 07:24:04,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.const_attention_rate, batch_count=849949.3333333334, ans=0.025 2024-09-20 07:24:32,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=849996.0, ans=0.1 2024-09-20 07:24:49,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=850042.6666666666, ans=10.0 2024-09-20 07:24:53,804 INFO [train.py:1198] (1/2) Epoch 47, batch 3850, loss[loss=0.2225, simple_loss=0.2727, pruned_loss=0.06459, ctc_loss=0.1332, cr_loss=0.4095, over 24205.00 frames. ], tot_loss[loss=0.2073, simple_loss=0.2655, pruned_loss=0.05468, ctc_loss=0.1185, cr_loss=0.3981, over 6252229.93 frames. ], batch size: 244, lr: 2.51e-03, grad_scale: 32.0 2024-09-20 07:25:11,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=850136.0, ans=0.125 2024-09-20 07:26:07,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=850215.3333333334, ans=0.0 2024-09-20 07:26:25,101 INFO [train.py:1198] (1/2) Epoch 48, batch 0, loss[loss=0.1799, simple_loss=0.2395, pruned_loss=0.04398, ctc_loss=0.09614, cr_loss=0.3286, over 34455.00 frames. 
], tot_loss[loss=0.1799, simple_loss=0.2395, pruned_loss=0.04398, ctc_loss=0.09614, cr_loss=0.3286, over 34455.00 frames. ], batch size: 85, lr: 2.49e-03, grad_scale: 32.0 2024-09-20 07:26:25,102 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 07:26:41,865 INFO [train.py:1230] (1/2) Epoch 48, validation: loss=0.1486, simple_loss=0.2424, pruned_loss=0.02361, ctc_loss=0.03857, cr_loss=2.307e-14, over 944034.00 frames. 2024-09-20 07:26:41,865 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 07:26:51,892 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=850215.3333333334, ans=0.0 2024-09-20 07:26:55,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.84 vs. limit=6.0 2024-09-20 07:27:30,855 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=15.0 2024-09-20 07:27:36,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.206e+02 2.803e+02 2.959e+02 3.284e+02 1.045e+03, threshold=5.918e+02, percent-clipped=5.0 2024-09-20 07:28:05,881 INFO [train.py:1198] (1/2) Epoch 48, batch 50, loss[loss=0.1707, simple_loss=0.2302, pruned_loss=0.04017, ctc_loss=0.09063, cr_loss=0.3198, over 34485.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2592, pruned_loss=0.05148, ctc_loss=0.1119, cr_loss=0.3862, over 1480000.65 frames. ], batch size: 82, lr: 2.49e-03, grad_scale: 32.0 2024-09-20 07:28:06,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=850448.6666666666, ans=0.0 2024-09-20 07:28:11,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=850448.6666666666, ans=0.0 2024-09-20 07:28:21,890 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2024-09-20 07:28:38,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.02 vs. limit=15.0 2024-09-20 07:28:47,813 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=850542.0, ans=0.125 2024-09-20 07:29:04,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=850588.6666666666, ans=0.125 2024-09-20 07:29:23,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=850635.3333333334, ans=0.0 2024-09-20 07:29:26,176 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.87 vs. limit=22.5 2024-09-20 07:29:29,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.89 vs. limit=15.0 2024-09-20 07:29:30,206 INFO [train.py:1198] (1/2) Epoch 48, batch 100, loss[loss=0.1895, simple_loss=0.2469, pruned_loss=0.04816, ctc_loss=0.1046, cr_loss=0.3694, over 34581.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2611, pruned_loss=0.05185, ctc_loss=0.1127, cr_loss=0.3881, over 2628861.89 frames. 
], batch size: 89, lr: 2.49e-03, grad_scale: 32.0 2024-09-20 07:29:34,155 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=4.07 vs. limit=12.0 2024-09-20 07:29:46,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer1.prob, batch_count=850728.6666666666, ans=0.125 2024-09-20 07:29:54,987 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=850728.6666666666, ans=0.0 2024-09-20 07:29:58,930 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.29 vs. limit=6.0 2024-09-20 07:30:09,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=850775.3333333334, ans=0.025 2024-09-20 07:30:22,199 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.194e+02 2.552e+02 2.943e+02 3.487e+02 6.402e+02, threshold=5.886e+02, percent-clipped=2.0 2024-09-20 07:30:41,030 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2024-09-20 07:30:46,716 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=850868.6666666666, ans=0.125 2024-09-20 07:30:51,216 INFO [train.py:1198] (1/2) Epoch 48, batch 150, loss[loss=0.1802, simple_loss=0.2394, pruned_loss=0.04391, ctc_loss=0.09489, cr_loss=0.3548, over 34492.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2587, pruned_loss=0.05073, ctc_loss=0.1106, cr_loss=0.3832, over 3557567.90 frames. ], batch size: 82, lr: 2.49e-03, grad_scale: 32.0 2024-09-20 07:31:14,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=850962.0, ans=0.2 2024-09-20 07:31:15,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=850962.0, ans=0.125 2024-09-20 07:31:15,032 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=850962.0, ans=0.04949747468305833 2024-09-20 07:31:53,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851055.3333333334, ans=0.1 2024-09-20 07:32:06,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=851102.0, ans=0.0 2024-09-20 07:32:11,941 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2024-09-20 07:32:17,395 INFO [train.py:1198] (1/2) Epoch 48, batch 200, loss[loss=0.213, simple_loss=0.2722, pruned_loss=0.0563, ctc_loss=0.1245, cr_loss=0.4095, over 31845.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2578, pruned_loss=0.05069, ctc_loss=0.1103, cr_loss=0.3818, over 4272285.89 frames. 
], batch size: 145, lr: 2.49e-03, grad_scale: 32.0 2024-09-20 07:32:25,944 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=851148.6666666666, ans=0.125 2024-09-20 07:32:30,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=851148.6666666666, ans=0.0 2024-09-20 07:32:59,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=851242.0, ans=0.125 2024-09-20 07:33:10,494 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.791e+02 3.471e+02 4.938e+02 7.040e+02, threshold=6.942e+02, percent-clipped=10.0 2024-09-20 07:33:11,451 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.00 vs. limit=22.5 2024-09-20 07:33:13,492 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.38 vs. limit=15.0 2024-09-20 07:33:19,066 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:33:40,214 INFO [train.py:1198] (1/2) Epoch 48, batch 250, loss[loss=0.2223, simple_loss=0.2858, pruned_loss=0.05848, ctc_loss=0.1233, cr_loss=0.4306, over 34221.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.258, pruned_loss=0.05075, ctc_loss=0.1104, cr_loss=0.3817, over 4834240.98 frames. ], batch size: 117, lr: 2.49e-03, grad_scale: 32.0 2024-09-20 07:33:56,966 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=851428.6666666666, ans=0.0 2024-09-20 07:34:20,080 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=851475.3333333334, ans=0.2 2024-09-20 07:34:25,070 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=851475.3333333334, ans=0.125 2024-09-20 07:34:28,845 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.52 vs. limit=6.0 2024-09-20 07:34:52,320 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.14 vs. limit=15.0 2024-09-20 07:35:01,511 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=851568.6666666666, ans=0.025 2024-09-20 07:35:04,453 INFO [train.py:1198] (1/2) Epoch 48, batch 300, loss[loss=0.2199, simple_loss=0.2792, pruned_loss=0.05953, ctc_loss=0.1256, cr_loss=0.4115, over 34325.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2576, pruned_loss=0.05041, ctc_loss=0.1099, cr_loss=0.3809, over 5262764.57 frames. 
], batch size: 107, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:35:19,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=851662.0, ans=0.0 2024-09-20 07:35:21,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=851662.0, ans=12.0 2024-09-20 07:35:44,030 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=851708.6666666666, ans=0.09899494936611666 2024-09-20 07:35:49,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=851708.6666666666, ans=10.0 2024-09-20 07:35:59,026 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.135e+02 2.500e+02 2.851e+02 3.533e+02 5.151e+02, threshold=5.702e+02, percent-clipped=0.0 2024-09-20 07:36:12,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=851802.0, ans=0.2 2024-09-20 07:36:22,655 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851802.0, ans=0.1 2024-09-20 07:36:28,805 INFO [train.py:1198] (1/2) Epoch 48, batch 350, loss[loss=0.1754, simple_loss=0.2336, pruned_loss=0.04193, ctc_loss=0.09631, cr_loss=0.3517, over 34262.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2581, pruned_loss=0.05073, ctc_loss=0.1106, cr_loss=0.3825, over 5597101.26 frames. ], batch size: 83, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:36:45,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=851895.3333333334, ans=0.1 2024-09-20 07:36:53,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=851895.3333333334, ans=0.2 2024-09-20 07:36:56,786 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=851895.3333333334, ans=0.1 2024-09-20 07:36:59,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.19 vs. limit=10.0 2024-09-20 07:37:32,205 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=15.0 2024-09-20 07:37:38,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=852035.3333333334, ans=0.0 2024-09-20 07:37:47,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=852035.3333333334, ans=0.0 2024-09-20 07:37:50,772 INFO [train.py:1198] (1/2) Epoch 48, batch 400, loss[loss=0.1971, simple_loss=0.258, pruned_loss=0.04987, ctc_loss=0.1082, cr_loss=0.3736, over 34447.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.2578, pruned_loss=0.05062, ctc_loss=0.1104, cr_loss=0.3821, over 5864615.02 frames. ], batch size: 95, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:38:12,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.57 vs. 
limit=15.0 2024-09-20 07:38:45,702 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.478e+02 2.860e+02 3.482e+02 5.917e+02, threshold=5.719e+02, percent-clipped=1.0 2024-09-20 07:39:15,595 INFO [train.py:1198] (1/2) Epoch 48, batch 450, loss[loss=0.2112, simple_loss=0.2745, pruned_loss=0.05431, ctc_loss=0.1178, cr_loss=0.3951, over 34695.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2583, pruned_loss=0.05101, ctc_loss=0.1111, cr_loss=0.3834, over 6051999.71 frames. ], batch size: 97, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:39:51,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.min_positive, batch_count=852408.6666666666, ans=0.025 2024-09-20 07:39:59,372 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=852408.6666666666, ans=0.2 2024-09-20 07:40:02,719 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=852408.6666666666, ans=0.09899494936611666 2024-09-20 07:40:11,567 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.16 vs. limit=22.5 2024-09-20 07:40:16,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=852455.3333333334, ans=0.125 2024-09-20 07:40:19,381 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=852455.3333333334, ans=0.125 2024-09-20 07:40:22,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852502.0, ans=0.1 2024-09-20 07:40:40,199 INFO [train.py:1198] (1/2) Epoch 48, batch 500, loss[loss=0.2073, simple_loss=0.2663, pruned_loss=0.05419, ctc_loss=0.1183, cr_loss=0.4056, over 34480.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2577, pruned_loss=0.05073, ctc_loss=0.1106, cr_loss=0.3824, over 6218079.94 frames. ], batch size: 110, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:40:50,487 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=852548.6666666666, ans=0.125 2024-09-20 07:41:01,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=6.18 vs. limit=15.0 2024-09-20 07:41:02,173 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=852595.3333333334, ans=0.125 2024-09-20 07:41:25,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=852642.0, ans=0.05 2024-09-20 07:41:33,417 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.179e+02 2.544e+02 2.898e+02 3.552e+02 6.691e+02, threshold=5.797e+02, percent-clipped=2.0 2024-09-20 07:41:37,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=852688.6666666666, ans=0.0 2024-09-20 07:41:44,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=13.07 vs. 
limit=15.0 2024-09-20 07:42:03,057 INFO [train.py:1198] (1/2) Epoch 48, batch 550, loss[loss=0.2116, simple_loss=0.2759, pruned_loss=0.05376, ctc_loss=0.118, cr_loss=0.4046, over 33849.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2581, pruned_loss=0.05085, ctc_loss=0.1108, cr_loss=0.3828, over 6327680.55 frames. ], batch size: 122, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:42:33,322 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=852828.6666666666, ans=0.125 2024-09-20 07:43:01,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=852922.0, ans=0.1 2024-09-20 07:43:02,888 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=852922.0, ans=0.0 2024-09-20 07:43:04,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=852922.0, ans=0.1 2024-09-20 07:43:06,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=852922.0, ans=0.125 2024-09-20 07:43:15,779 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=852968.6666666666, ans=0.125 2024-09-20 07:43:15,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=852968.6666666666, ans=0.125 2024-09-20 07:43:20,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=852968.6666666666, ans=0.125 2024-09-20 07:43:25,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=853015.3333333334, ans=0.125 2024-09-20 07:43:27,025 INFO [train.py:1198] (1/2) Epoch 48, batch 600, loss[loss=0.2207, simple_loss=0.2792, pruned_loss=0.0595, ctc_loss=0.1287, cr_loss=0.4375, over 34219.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2585, pruned_loss=0.0511, ctc_loss=0.1112, cr_loss=0.3839, over 6430737.38 frames. ], batch size: 117, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:43:42,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=853015.3333333334, ans=0.0 2024-09-20 07:43:55,925 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=853062.0, ans=0.125 2024-09-20 07:44:09,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=853108.6666666666, ans=0.0 2024-09-20 07:44:21,751 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.095e+02 2.592e+02 3.096e+02 3.720e+02 6.433e+02, threshold=6.192e+02, percent-clipped=5.0 2024-09-20 07:44:24,604 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.58 vs. limit=15.0 2024-09-20 07:44:30,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=853155.3333333334, ans=0.125 2024-09-20 07:44:38,801 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. 
limit=15.0 2024-09-20 07:44:43,293 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=853202.0, ans=0.07 2024-09-20 07:44:44,976 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=853202.0, ans=0.125 2024-09-20 07:44:51,215 INFO [train.py:1198] (1/2) Epoch 48, batch 650, loss[loss=0.1958, simple_loss=0.2571, pruned_loss=0.04906, ctc_loss=0.107, cr_loss=0.3737, over 34529.00 frames. ], tot_loss[loss=0.198, simple_loss=0.2575, pruned_loss=0.05063, ctc_loss=0.1103, cr_loss=0.3815, over 6522103.63 frames. ], batch size: 94, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:45:04,906 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:45:08,330 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=15.0 2024-09-20 07:45:15,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.06 vs. limit=15.0 2024-09-20 07:45:55,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=853435.3333333334, ans=0.5 2024-09-20 07:46:07,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=853435.3333333334, ans=0.025 2024-09-20 07:46:15,751 INFO [train.py:1198] (1/2) Epoch 48, batch 700, loss[loss=0.1943, simple_loss=0.2459, pruned_loss=0.05256, ctc_loss=0.1114, cr_loss=0.3811, over 34561.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2579, pruned_loss=0.05098, ctc_loss=0.1109, cr_loss=0.3831, over 6578815.65 frames. ], batch size: 89, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:46:16,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.scale_min, batch_count=853482.0, ans=0.2 2024-09-20 07:46:17,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=853482.0, ans=0.0 2024-09-20 07:46:54,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=853575.3333333334, ans=0.1 2024-09-20 07:47:00,823 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=853575.3333333334, ans=0.125 2024-09-20 07:47:08,570 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.573e+02 2.804e+02 3.563e+02 5.886e+02, threshold=5.608e+02, percent-clipped=0.0 2024-09-20 07:47:32,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=853668.6666666666, ans=0.0 2024-09-20 07:47:40,111 INFO [train.py:1198] (1/2) Epoch 48, batch 750, loss[loss=0.1993, simple_loss=0.2598, pruned_loss=0.051, ctc_loss=0.1088, cr_loss=0.3776, over 34428.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2576, pruned_loss=0.05076, ctc_loss=0.1105, cr_loss=0.3817, over 6622031.60 frames. 
], batch size: 95, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:47:56,722 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=853762.0, ans=0.0 2024-09-20 07:47:58,335 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=853762.0, ans=0.125 2024-09-20 07:48:01,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=853762.0, ans=0.1 2024-09-20 07:48:20,047 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.self_attn2.whiten, num_groups=1, num_channels=768, metric=12.44 vs. limit=22.5 2024-09-20 07:49:01,711 INFO [train.py:1198] (1/2) Epoch 48, batch 800, loss[loss=0.177, simple_loss=0.2339, pruned_loss=0.04346, ctc_loss=0.09658, cr_loss=0.3488, over 34422.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.2575, pruned_loss=0.05073, ctc_loss=0.1104, cr_loss=0.3816, over 6658466.66 frames. ], batch size: 85, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:49:23,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=853995.3333333334, ans=0.2 2024-09-20 07:49:31,850 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=853995.3333333334, ans=0.125 2024-09-20 07:49:56,495 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.129e+02 2.507e+02 3.038e+02 3.733e+02 6.442e+02, threshold=6.076e+02, percent-clipped=2.0 2024-09-20 07:50:08,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=854135.3333333334, ans=0.0 2024-09-20 07:50:09,732 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=854135.3333333334, ans=0.1 2024-09-20 07:50:11,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=854135.3333333334, ans=0.0 2024-09-20 07:50:17,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=854135.3333333334, ans=0.0 2024-09-20 07:50:20,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=854135.3333333334, ans=0.025 2024-09-20 07:50:24,308 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=854182.0, ans=0.2 2024-09-20 07:50:25,522 INFO [train.py:1198] (1/2) Epoch 48, batch 850, loss[loss=0.1992, simple_loss=0.2625, pruned_loss=0.04961, ctc_loss=0.1084, cr_loss=0.3727, over 34372.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.2576, pruned_loss=0.05074, ctc_loss=0.1104, cr_loss=0.3816, over 6691914.58 frames. 
], batch size: 103, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:50:29,133 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=854182.0, ans=0.05 2024-09-20 07:50:35,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=854182.0, ans=0.0 2024-09-20 07:50:50,132 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=854228.6666666666, ans=0.0 2024-09-20 07:50:53,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=854228.6666666666, ans=0.0 2024-09-20 07:51:38,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=854368.6666666666, ans=0.2 2024-09-20 07:51:44,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=854368.6666666666, ans=0.125 2024-09-20 07:51:49,296 INFO [train.py:1198] (1/2) Epoch 48, batch 900, loss[loss=0.1649, simple_loss=0.2268, pruned_loss=0.03681, ctc_loss=0.08501, cr_loss=0.3084, over 34444.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2578, pruned_loss=0.05086, ctc_loss=0.1107, cr_loss=0.3821, over 6698514.64 frames. ], batch size: 85, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:51:52,768 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=854415.3333333334, ans=0.0 2024-09-20 07:52:17,242 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=854462.0, ans=0.125 2024-09-20 07:52:23,961 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=854508.6666666666, ans=0.125 2024-09-20 07:52:32,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=854508.6666666666, ans=0.04949747468305833 2024-09-20 07:52:35,524 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=854508.6666666666, ans=0.0 2024-09-20 07:52:35,728 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=854508.6666666666, ans=0.0 2024-09-20 07:52:41,857 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.310e+02 2.754e+02 3.294e+02 4.067e+02 7.814e+02, threshold=6.587e+02, percent-clipped=1.0 2024-09-20 07:52:42,536 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.66 vs. limit=15.0 2024-09-20 07:52:48,840 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 07:52:52,993 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.20 vs. 
limit=15.0 2024-09-20 07:52:55,438 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=854602.0, ans=0.125 2024-09-20 07:53:01,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854602.0, ans=0.1 2024-09-20 07:53:11,291 INFO [train.py:1198] (1/2) Epoch 48, batch 950, loss[loss=0.1864, simple_loss=0.2449, pruned_loss=0.04633, ctc_loss=0.1026, cr_loss=0.368, over 34714.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2579, pruned_loss=0.05082, ctc_loss=0.1107, cr_loss=0.382, over 6700568.28 frames. ], batch size: 87, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:53:14,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=854648.6666666666, ans=0.125 2024-09-20 07:53:16,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=854648.6666666666, ans=0.0 2024-09-20 07:53:51,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=854742.0, ans=0.125 2024-09-20 07:54:08,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.46 vs. limit=15.0 2024-09-20 07:54:32,821 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0 2024-09-20 07:54:34,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=854882.0, ans=0.1 2024-09-20 07:54:35,468 INFO [train.py:1198] (1/2) Epoch 48, batch 1000, loss[loss=0.2012, simple_loss=0.2586, pruned_loss=0.05345, ctc_loss=0.1101, cr_loss=0.3734, over 34496.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2586, pruned_loss=0.05115, ctc_loss=0.1113, cr_loss=0.3831, over 6693249.91 frames. ], batch size: 90, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:54:42,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=854882.0, ans=0.125 2024-09-20 07:54:50,881 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=854928.6666666666, ans=0.1 2024-09-20 07:54:56,997 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=22.5 2024-09-20 07:55:05,419 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.20 vs. 
limit=22.5 2024-09-20 07:55:11,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=854975.3333333334, ans=0.125 2024-09-20 07:55:30,767 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.213e+02 2.563e+02 2.825e+02 3.272e+02 5.059e+02, threshold=5.650e+02, percent-clipped=0.0 2024-09-20 07:55:37,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=855022.0, ans=0.2 2024-09-20 07:55:39,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=855022.0, ans=0.125 2024-09-20 07:55:57,336 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=855068.6666666666, ans=0.2 2024-09-20 07:55:57,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=855068.6666666666, ans=0.1 2024-09-20 07:56:00,284 INFO [train.py:1198] (1/2) Epoch 48, batch 1050, loss[loss=0.2003, simple_loss=0.2627, pruned_loss=0.05038, ctc_loss=0.1094, cr_loss=0.382, over 34592.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2583, pruned_loss=0.05119, ctc_loss=0.1113, cr_loss=0.3828, over 6702858.14 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:56:15,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=855162.0, ans=0.1 2024-09-20 07:56:18,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=855162.0, ans=0.125 2024-09-20 07:56:42,590 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.42 vs. limit=15.0 2024-09-20 07:56:56,619 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=855255.3333333334, ans=0.025 2024-09-20 07:57:18,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=855302.0, ans=15.0 2024-09-20 07:57:24,556 INFO [train.py:1198] (1/2) Epoch 48, batch 1100, loss[loss=0.191, simple_loss=0.2479, pruned_loss=0.04917, ctc_loss=0.1044, cr_loss=0.3691, over 34741.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.258, pruned_loss=0.05102, ctc_loss=0.1108, cr_loss=0.3823, over 6714692.99 frames. 
], batch size: 92, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:58:01,409 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=855442.0, ans=0.1 2024-09-20 07:58:07,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=855442.0, ans=0.2 2024-09-20 07:58:12,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=855488.6666666666, ans=0.0 2024-09-20 07:58:17,441 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.170e+02 2.555e+02 2.966e+02 3.616e+02 5.770e+02, threshold=5.933e+02, percent-clipped=1.0 2024-09-20 07:58:26,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=855488.6666666666, ans=0.5 2024-09-20 07:58:32,656 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=855535.3333333334, ans=0.125 2024-09-20 07:58:47,055 INFO [train.py:1198] (1/2) Epoch 48, batch 1150, loss[loss=0.193, simple_loss=0.2533, pruned_loss=0.04824, ctc_loss=0.1084, cr_loss=0.363, over 34350.00 frames. ], tot_loss[loss=0.1989, simple_loss=0.2581, pruned_loss=0.05113, ctc_loss=0.1111, cr_loss=0.3831, over 6712509.49 frames. ], batch size: 91, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 07:58:52,798 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855582.0, ans=0.1 2024-09-20 07:58:56,158 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=855582.0, ans=0.0 2024-09-20 07:59:07,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=855628.6666666666, ans=0.0 2024-09-20 07:59:39,224 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=855722.0, ans=0.125 2024-09-20 07:59:45,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=855722.0, ans=0.1 2024-09-20 07:59:59,302 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.94 vs. limit=22.5 2024-09-20 08:00:07,907 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.01 vs. limit=15.0 2024-09-20 08:00:11,635 INFO [train.py:1198] (1/2) Epoch 48, batch 1200, loss[loss=0.1964, simple_loss=0.2586, pruned_loss=0.04853, ctc_loss=0.1084, cr_loss=0.3854, over 34571.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.2586, pruned_loss=0.05123, ctc_loss=0.1114, cr_loss=0.3834, over 6706217.36 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 08:00:47,627 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.81 vs. 
limit=10.0 2024-09-20 08:00:50,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=855908.6666666666, ans=0.0 2024-09-20 08:00:52,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=855908.6666666666, ans=0.125 2024-09-20 08:01:06,493 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.148e+02 2.601e+02 2.871e+02 3.524e+02 6.109e+02, threshold=5.743e+02, percent-clipped=1.0 2024-09-20 08:01:17,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.31 vs. limit=10.0 2024-09-20 08:01:35,956 INFO [train.py:1198] (1/2) Epoch 48, batch 1250, loss[loss=0.2159, simple_loss=0.2722, pruned_loss=0.05912, ctc_loss=0.1237, cr_loss=0.4192, over 34320.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2591, pruned_loss=0.05155, ctc_loss=0.112, cr_loss=0.3853, over 6740500.03 frames. ], batch size: 107, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 08:02:31,600 INFO [scaling.py:801] (1/2) Caught exception in Balancer backward: CUDA out of memory. Tried to allocate 3.77 GiB. GPU 1 has a total capacity of 79.17 GiB of which 3.62 GiB is free. Process 39810 has 75.54 GiB memory in use. Of the allocated memory 29.28 GiB is allocated by PyTorch, and 43.87 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables), size=[198, 384, 700, 19], will continue. 2024-09-20 08:02:45,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=856235.3333333334, ans=0.05 2024-09-20 08:03:00,043 INFO [train.py:1198] (1/2) Epoch 48, batch 1300, loss[loss=0.2015, simple_loss=0.2696, pruned_loss=0.04847, ctc_loss=0.1081, cr_loss=0.3702, over 33117.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2582, pruned_loss=0.05114, ctc_loss=0.1112, cr_loss=0.3833, over 6745096.20 frames. ], batch size: 130, lr: 2.48e-03, grad_scale: 16.0 2024-09-20 08:03:04,051 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.60 vs. limit=6.0 2024-09-20 08:03:25,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.26 vs. limit=22.5 2024-09-20 08:03:41,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=856375.3333333334, ans=0.1 2024-09-20 08:03:44,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=856375.3333333334, ans=0.0 2024-09-20 08:03:46,908 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.06 vs. 
limit=10.0 2024-09-20 08:03:54,252 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.149e+02 2.495e+02 2.817e+02 3.400e+02 7.266e+02, threshold=5.635e+02, percent-clipped=4.0 2024-09-20 08:03:56,215 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=856422.0, ans=0.5 2024-09-20 08:03:59,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=856422.0, ans=0.05 2024-09-20 08:04:19,199 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=856468.6666666666, ans=0.0 2024-09-20 08:04:22,082 INFO [train.py:1198] (1/2) Epoch 48, batch 1350, loss[loss=0.1991, simple_loss=0.2623, pruned_loss=0.04954, ctc_loss=0.1078, cr_loss=0.3819, over 34556.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2582, pruned_loss=0.05116, ctc_loss=0.1113, cr_loss=0.3833, over 6764445.51 frames. ], batch size: 94, lr: 2.48e-03, grad_scale: 16.0 2024-09-20 08:04:53,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=856562.0, ans=0.0 2024-09-20 08:05:01,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=856608.6666666666, ans=0.125 2024-09-20 08:05:01,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=856608.6666666666, ans=0.125 2024-09-20 08:05:06,168 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=856608.6666666666, ans=0.125 2024-09-20 08:05:45,201 INFO [train.py:1198] (1/2) Epoch 48, batch 1400, loss[loss=0.1751, simple_loss=0.231, pruned_loss=0.04301, ctc_loss=0.09539, cr_loss=0.3534, over 34271.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2579, pruned_loss=0.05106, ctc_loss=0.111, cr_loss=0.3829, over 6776325.58 frames. ], batch size: 80, lr: 2.48e-03, grad_scale: 16.0 2024-09-20 08:05:48,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=856748.6666666666, ans=0.0 2024-09-20 08:06:42,007 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.621e+02 3.155e+02 3.858e+02 6.713e+02, threshold=6.310e+02, percent-clipped=5.0 2024-09-20 08:06:42,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.94 vs. limit=22.5 2024-09-20 08:06:47,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=856888.6666666666, ans=0.125 2024-09-20 08:06:50,379 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=856888.6666666666, ans=0.1 2024-09-20 08:06:56,911 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=856935.3333333334, ans=0.0 2024-09-20 08:07:09,826 INFO [train.py:1198] (1/2) Epoch 48, batch 1450, loss[loss=0.2156, simple_loss=0.2721, pruned_loss=0.05878, ctc_loss=0.1257, cr_loss=0.4104, over 34464.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2586, pruned_loss=0.05125, ctc_loss=0.1114, cr_loss=0.3838, over 6772826.13 frames. 
], batch size: 110, lr: 2.48e-03, grad_scale: 16.0 2024-09-20 08:07:14,240 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.50 vs. limit=15.0 2024-09-20 08:07:15,016 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=856982.0, ans=0.125 2024-09-20 08:07:45,196 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.79 vs. limit=22.5 2024-09-20 08:07:55,311 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=6.0 2024-09-20 08:08:05,751 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=857122.0, ans=0.0 2024-09-20 08:08:05,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=857122.0, ans=0.0 2024-09-20 08:08:09,547 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. limit=15.0 2024-09-20 08:08:11,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=4.34 vs. limit=10.0 2024-09-20 08:08:12,981 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.20 vs. limit=22.5 2024-09-20 08:08:33,463 INFO [train.py:1198] (1/2) Epoch 48, batch 1500, loss[loss=0.2039, simple_loss=0.2671, pruned_loss=0.05167, ctc_loss=0.1114, cr_loss=0.378, over 34445.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.259, pruned_loss=0.05138, ctc_loss=0.1117, cr_loss=0.3847, over 6774490.71 frames. ], batch size: 100, lr: 2.48e-03, grad_scale: 16.0 2024-09-20 08:08:35,498 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=857215.3333333334, ans=0.1 2024-09-20 08:08:38,856 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=857215.3333333334, ans=0.125 2024-09-20 08:08:39,185 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2024-09-20 08:08:42,253 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=857215.3333333334, ans=0.0 2024-09-20 08:08:48,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=857262.0, ans=0.0 2024-09-20 08:08:57,459 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.06 vs. 
limit=15.0 2024-09-20 08:09:05,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=857308.6666666666, ans=0.125 2024-09-20 08:09:23,833 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=857355.3333333334, ans=0.2 2024-09-20 08:09:28,578 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.265e+02 2.615e+02 2.928e+02 3.663e+02 5.477e+02, threshold=5.856e+02, percent-clipped=0.0 2024-09-20 08:09:37,526 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.587e-02 2024-09-20 08:09:43,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=857402.0, ans=0.0 2024-09-20 08:09:48,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=857402.0, ans=0.125 2024-09-20 08:09:56,547 INFO [train.py:1198] (1/2) Epoch 48, batch 1550, loss[loss=0.2099, simple_loss=0.2692, pruned_loss=0.0552, ctc_loss=0.1198, cr_loss=0.4076, over 34429.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2591, pruned_loss=0.05151, ctc_loss=0.1119, cr_loss=0.3846, over 6746950.29 frames. ], batch size: 105, lr: 2.48e-03, grad_scale: 16.0 2024-09-20 08:10:34,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857542.0, ans=0.1 2024-09-20 08:10:54,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=857588.6666666666, ans=0.05 2024-09-20 08:11:20,816 INFO [train.py:1198] (1/2) Epoch 48, batch 1600, loss[loss=0.2025, simple_loss=0.2681, pruned_loss=0.04976, ctc_loss=0.1107, cr_loss=0.3793, over 34552.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2587, pruned_loss=0.0515, ctc_loss=0.112, cr_loss=0.3848, over 6726492.97 frames. ], batch size: 99, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 08:11:56,270 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:11:57,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=857775.3333333334, ans=0.0 2024-09-20 08:12:17,312 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.572e+02 2.959e+02 3.569e+02 7.340e+02, threshold=5.918e+02, percent-clipped=2.0 2024-09-20 08:12:29,141 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=857868.6666666666, ans=0.0 2024-09-20 08:12:45,005 INFO [train.py:1198] (1/2) Epoch 48, batch 1650, loss[loss=0.2024, simple_loss=0.2674, pruned_loss=0.04998, ctc_loss=0.1102, cr_loss=0.3844, over 34383.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2586, pruned_loss=0.05135, ctc_loss=0.1116, cr_loss=0.3835, over 6720469.53 frames. ], batch size: 103, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 08:12:52,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=6.62 vs. 
limit=15.0 2024-09-20 08:13:12,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=857962.0, ans=0.1 2024-09-20 08:13:22,793 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=858008.6666666666, ans=0.0 2024-09-20 08:13:44,111 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=858055.3333333334, ans=0.125 2024-09-20 08:13:44,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=858055.3333333334, ans=0.2 2024-09-20 08:13:55,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=858102.0, ans=0.125 2024-09-20 08:14:08,552 INFO [train.py:1198] (1/2) Epoch 48, batch 1700, loss[loss=0.1644, simple_loss=0.2247, pruned_loss=0.03706, ctc_loss=0.08546, cr_loss=0.3243, over 34287.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.2585, pruned_loss=0.05126, ctc_loss=0.1114, cr_loss=0.3836, over 6743696.58 frames. ], batch size: 80, lr: 2.48e-03, grad_scale: 32.0 2024-09-20 08:14:19,442 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.50 vs. limit=12.0 2024-09-20 08:14:20,516 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=858148.6666666666, ans=0.2 2024-09-20 08:14:43,634 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.43 vs. limit=22.5 2024-09-20 08:14:48,092 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=858242.0, ans=0.0 2024-09-20 08:15:01,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=858288.6666666666, ans=0.025 2024-09-20 08:15:02,375 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.243e+02 2.643e+02 3.242e+02 4.010e+02 7.411e+02, threshold=6.484e+02, percent-clipped=3.0 2024-09-20 08:15:10,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=858288.6666666666, ans=0.1 2024-09-20 08:15:13,282 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=9.96 vs. limit=22.5 2024-09-20 08:15:30,263 INFO [train.py:1198] (1/2) Epoch 48, batch 1750, loss[loss=0.1816, simple_loss=0.2374, pruned_loss=0.0461, ctc_loss=0.1006, cr_loss=0.335, over 34197.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2581, pruned_loss=0.05115, ctc_loss=0.1112, cr_loss=0.383, over 6751876.32 frames. ], batch size: 78, lr: 2.48e-03, grad_scale: 16.0 2024-09-20 08:16:00,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=858428.6666666666, ans=0.0 2024-09-20 08:16:13,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=858475.3333333334, ans=0.1 2024-09-20 08:16:20,815 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.90 vs. 
limit=15.0 2024-09-20 08:16:36,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=858568.6666666666, ans=0.2 2024-09-20 08:16:50,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=858568.6666666666, ans=0.0 2024-09-20 08:16:53,612 INFO [train.py:1198] (1/2) Epoch 48, batch 1800, loss[loss=0.199, simple_loss=0.2625, pruned_loss=0.04947, ctc_loss=0.1088, cr_loss=0.372, over 34698.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2582, pruned_loss=0.05116, ctc_loss=0.1114, cr_loss=0.3833, over 6754189.22 frames. ], batch size: 97, lr: 2.47e-03, grad_scale: 16.0 2024-09-20 08:17:08,607 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=858662.0, ans=0.125 2024-09-20 08:17:29,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=858662.0, ans=0.125 2024-09-20 08:17:50,398 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=858755.3333333334, ans=0.125 2024-09-20 08:17:50,566 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=858755.3333333334, ans=0.125 2024-09-20 08:17:57,155 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.133e+02 2.643e+02 3.107e+02 4.079e+02 5.829e+02, threshold=6.215e+02, percent-clipped=0.0 2024-09-20 08:18:10,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=858802.0, ans=0.125 2024-09-20 08:18:23,588 INFO [train.py:1198] (1/2) Epoch 48, batch 1850, loss[loss=0.2085, simple_loss=0.2753, pruned_loss=0.05185, ctc_loss=0.1125, cr_loss=0.3886, over 34447.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2579, pruned_loss=0.05088, ctc_loss=0.1108, cr_loss=0.3824, over 6762295.33 frames. ], batch size: 100, lr: 2.47e-03, grad_scale: 16.0 2024-09-20 08:18:54,421 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=22.5 2024-09-20 08:19:37,953 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.64 vs. limit=10.0 2024-09-20 08:19:44,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=859035.3333333334, ans=0.1 2024-09-20 08:19:46,877 INFO [train.py:1198] (1/2) Epoch 48, batch 1900, loss[loss=0.2104, simple_loss=0.2751, pruned_loss=0.05321, ctc_loss=0.1175, cr_loss=0.3957, over 34369.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.2587, pruned_loss=0.05117, ctc_loss=0.1113, cr_loss=0.3835, over 6771917.71 frames. ], batch size: 103, lr: 2.47e-03, grad_scale: 16.0 2024-09-20 08:20:03,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=859128.6666666666, ans=0.0 2024-09-20 08:20:15,691 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.90 vs. 
limit=15.0 2024-09-20 08:20:42,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.225e+02 2.582e+02 2.930e+02 3.587e+02 7.476e+02, threshold=5.861e+02, percent-clipped=1.0 2024-09-20 08:20:49,724 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=859222.0, ans=0.2 2024-09-20 08:21:09,017 INFO [train.py:1198] (1/2) Epoch 48, batch 1950, loss[loss=0.1942, simple_loss=0.2491, pruned_loss=0.05091, ctc_loss=0.1124, cr_loss=0.3751, over 34343.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2593, pruned_loss=0.05123, ctc_loss=0.1115, cr_loss=0.3844, over 6789514.24 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 16.0 2024-09-20 08:21:26,011 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=859362.0, ans=0.0 2024-09-20 08:21:29,704 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=15.0 2024-09-20 08:22:28,808 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.01 vs. limit=22.5 2024-09-20 08:22:29,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=859502.0, ans=0.125 2024-09-20 08:22:31,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=859548.6666666666, ans=0.2 2024-09-20 08:22:32,351 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.71 vs. limit=10.0 2024-09-20 08:22:32,858 INFO [train.py:1198] (1/2) Epoch 48, batch 2000, loss[loss=0.1776, simple_loss=0.2333, pruned_loss=0.0441, ctc_loss=0.09772, cr_loss=0.3542, over 34217.00 frames. ], tot_loss[loss=0.2003, simple_loss=0.26, pruned_loss=0.05142, ctc_loss=0.1119, cr_loss=0.3855, over 6764272.93 frames. ], batch size: 78, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:22:40,685 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.09 vs. limit=10.0 2024-09-20 08:23:06,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=859642.0, ans=0.125 2024-09-20 08:23:31,047 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.144e+02 2.553e+02 3.092e+02 3.787e+02 6.473e+02, threshold=6.185e+02, percent-clipped=2.0 2024-09-20 08:23:41,161 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=859735.3333333334, ans=0.2 2024-09-20 08:23:46,049 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:23:57,072 INFO [train.py:1198] (1/2) Epoch 48, batch 2050, loss[loss=0.1833, simple_loss=0.2413, pruned_loss=0.04551, ctc_loss=0.09995, cr_loss=0.3594, over 34453.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.2591, pruned_loss=0.05133, ctc_loss=0.1117, cr_loss=0.3849, over 6754605.16 frames. 
], batch size: 82, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:24:05,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=859782.0, ans=0.125 2024-09-20 08:24:05,771 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=859782.0, ans=0.2 2024-09-20 08:24:15,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=859828.6666666666, ans=0.2 2024-09-20 08:24:18,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=859828.6666666666, ans=0.125 2024-09-20 08:24:31,744 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=859875.3333333334, ans=0.2 2024-09-20 08:25:19,111 INFO [train.py:1198] (1/2) Epoch 48, batch 2100, loss[loss=0.1905, simple_loss=0.254, pruned_loss=0.04623, ctc_loss=0.1004, cr_loss=0.3588, over 34532.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2588, pruned_loss=0.05122, ctc_loss=0.1115, cr_loss=0.3845, over 6769404.43 frames. ], batch size: 94, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:25:22,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=860015.3333333334, ans=0.2 2024-09-20 08:25:29,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=860015.3333333334, ans=0.0 2024-09-20 08:25:42,818 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.25 vs. limit=22.5 2024-09-20 08:25:45,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=860062.0, ans=0.125 2024-09-20 08:26:09,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=860155.3333333334, ans=0.125 2024-09-20 08:26:16,010 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.595e+02 3.150e+02 3.957e+02 7.100e+02, threshold=6.300e+02, percent-clipped=5.0 2024-09-20 08:26:35,195 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.79 vs. limit=22.5 2024-09-20 08:26:39,307 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=860202.0, ans=0.025 2024-09-20 08:26:42,233 INFO [train.py:1198] (1/2) Epoch 48, batch 2150, loss[loss=0.1989, simple_loss=0.2545, pruned_loss=0.05258, ctc_loss=0.1115, cr_loss=0.3938, over 34324.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2579, pruned_loss=0.05081, ctc_loss=0.1107, cr_loss=0.3827, over 6788641.23 frames. ], batch size: 91, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:26:55,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=860248.6666666666, ans=10.0 2024-09-20 08:27:05,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=860295.3333333334, ans=0.0 2024-09-20 08:27:52,083 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.67 vs. 
limit=22.5 2024-09-20 08:28:05,926 INFO [train.py:1198] (1/2) Epoch 48, batch 2200, loss[loss=0.2041, simple_loss=0.2689, pruned_loss=0.05076, ctc_loss=0.1122, cr_loss=0.3822, over 34453.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2581, pruned_loss=0.05087, ctc_loss=0.1108, cr_loss=0.3833, over 6782824.49 frames. ], batch size: 100, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:28:06,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=860482.0, ans=0.125 2024-09-20 08:28:07,928 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=860482.0, ans=0.0 2024-09-20 08:28:22,693 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=860528.6666666666, ans=0.125 2024-09-20 08:28:32,743 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=860528.6666666666, ans=0.125 2024-09-20 08:28:48,206 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=6.33 vs. limit=15.0 2024-09-20 08:29:01,924 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.097e+02 2.708e+02 3.184e+02 3.879e+02 6.375e+02, threshold=6.368e+02, percent-clipped=1.0 2024-09-20 08:29:07,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=860622.0, ans=0.1 2024-09-20 08:29:20,893 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=860668.6666666666, ans=0.125 2024-09-20 08:29:22,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=860668.6666666666, ans=0.125 2024-09-20 08:29:24,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=860668.6666666666, ans=0.125 2024-09-20 08:29:30,357 INFO [train.py:1198] (1/2) Epoch 48, batch 2250, loss[loss=0.207, simple_loss=0.2686, pruned_loss=0.05325, ctc_loss=0.1143, cr_loss=0.4027, over 34410.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2583, pruned_loss=0.05095, ctc_loss=0.1108, cr_loss=0.3829, over 6780851.28 frames. ], batch size: 95, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:29:53,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=860762.0, ans=0.125 2024-09-20 08:30:11,330 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=860808.6666666666, ans=0.0 2024-09-20 08:30:23,024 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:30:44,672 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.49 vs. 
limit=22.5 2024-09-20 08:30:45,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=860902.0, ans=0.0 2024-09-20 08:30:50,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=860902.0, ans=0.125 2024-09-20 08:30:53,625 INFO [train.py:1198] (1/2) Epoch 48, batch 2300, loss[loss=0.1754, simple_loss=0.2339, pruned_loss=0.04227, ctc_loss=0.09347, cr_loss=0.3407, over 34280.00 frames. ], tot_loss[loss=0.1977, simple_loss=0.2571, pruned_loss=0.05052, ctc_loss=0.11, cr_loss=0.381, over 6766651.10 frames. ], batch size: 83, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:31:03,857 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=860948.6666666666, ans=0.07 2024-09-20 08:31:08,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=860995.3333333334, ans=0.0 2024-09-20 08:31:49,396 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 2.583e+02 2.949e+02 3.812e+02 5.446e+02, threshold=5.897e+02, percent-clipped=0.0 2024-09-20 08:31:51,780 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.26 vs. limit=15.0 2024-09-20 08:32:00,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.89 vs. limit=15.0 2024-09-20 08:32:15,552 INFO [train.py:1198] (1/2) Epoch 48, batch 2350, loss[loss=0.2135, simple_loss=0.272, pruned_loss=0.05673, ctc_loss=0.1221, cr_loss=0.4286, over 34698.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2578, pruned_loss=0.05093, ctc_loss=0.1108, cr_loss=0.383, over 6773580.19 frames. ], batch size: 97, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:33:18,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=861322.0, ans=0.0 2024-09-20 08:33:20,483 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:33:23,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=861368.6666666666, ans=0.125 2024-09-20 08:33:40,070 INFO [train.py:1198] (1/2) Epoch 48, batch 2400, loss[loss=0.1927, simple_loss=0.2498, pruned_loss=0.04982, ctc_loss=0.1076, cr_loss=0.3594, over 34573.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2583, pruned_loss=0.05113, ctc_loss=0.1111, cr_loss=0.3843, over 6777618.90 frames. 
], batch size: 89, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:33:51,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=861415.3333333334, ans=0.1 2024-09-20 08:34:02,272 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer1.prob, batch_count=861462.0, ans=0.125 2024-09-20 08:34:29,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=861508.6666666666, ans=0.0 2024-09-20 08:34:38,516 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.253e+02 2.607e+02 2.843e+02 3.783e+02 5.653e+02, threshold=5.685e+02, percent-clipped=0.0 2024-09-20 08:34:45,874 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=3.72 vs. limit=10.0 2024-09-20 08:35:04,970 INFO [train.py:1198] (1/2) Epoch 48, batch 2450, loss[loss=0.2128, simple_loss=0.2735, pruned_loss=0.05567, ctc_loss=0.1209, cr_loss=0.4114, over 34382.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2591, pruned_loss=0.05134, ctc_loss=0.1117, cr_loss=0.3852, over 6752768.42 frames. ], batch size: 95, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:35:34,903 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=861695.3333333334, ans=0.0 2024-09-20 08:36:09,236 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 08:36:26,842 INFO [train.py:1198] (1/2) Epoch 48, batch 2500, loss[loss=0.2038, simple_loss=0.2665, pruned_loss=0.05139, ctc_loss=0.114, cr_loss=0.3895, over 34480.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.259, pruned_loss=0.05137, ctc_loss=0.1117, cr_loss=0.3849, over 6764643.07 frames. ], batch size: 100, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:36:37,046 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=861882.0, ans=0.125 2024-09-20 08:36:40,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=861882.0, ans=0.1 2024-09-20 08:36:41,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=768, metric=4.31 vs. limit=12.0 2024-09-20 08:36:41,601 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. limit=6.0 2024-09-20 08:36:43,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=861928.6666666666, ans=0.125 2024-09-20 08:36:47,106 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=861928.6666666666, ans=0.125 2024-09-20 08:37:02,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=861975.3333333334, ans=0.2 2024-09-20 08:37:10,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=861975.3333333334, ans=0.125 2024-09-20 08:37:17,618 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.91 vs. 
limit=6.0 2024-09-20 08:37:20,623 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=862022.0, ans=0.2 2024-09-20 08:37:24,991 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.233e+02 2.526e+02 2.800e+02 3.387e+02 4.797e+02, threshold=5.599e+02, percent-clipped=0.0 2024-09-20 08:37:26,117 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.35 vs. limit=15.0 2024-09-20 08:37:51,336 INFO [train.py:1198] (1/2) Epoch 48, batch 2550, loss[loss=0.1784, simple_loss=0.2317, pruned_loss=0.04573, ctc_loss=0.09802, cr_loss=0.3504, over 34158.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.259, pruned_loss=0.05145, ctc_loss=0.1118, cr_loss=0.3846, over 6767933.15 frames. ], batch size: 78, lr: 2.47e-03, grad_scale: 32.0 2024-09-20 08:37:57,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.97 vs. limit=15.0 2024-09-20 08:37:58,841 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.61 vs. limit=15.0 2024-09-20 08:38:11,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=862162.0, ans=0.1 2024-09-20 08:38:28,481 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=10.26 vs. limit=22.5 2024-09-20 08:38:36,514 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=15.0 2024-09-20 08:38:37,423 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=862208.6666666666, ans=0.125 2024-09-20 08:38:48,749 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=862255.3333333334, ans=0.0 2024-09-20 08:39:06,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=862302.0, ans=0.05 2024-09-20 08:39:14,818 INFO [train.py:1198] (1/2) Epoch 48, batch 2600, loss[loss=0.2032, simple_loss=0.2596, pruned_loss=0.05443, ctc_loss=0.1142, cr_loss=0.3759, over 34351.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2592, pruned_loss=0.05148, ctc_loss=0.1119, cr_loss=0.3848, over 6764163.03 frames. 
], batch size: 91, lr: 2.47e-03, grad_scale: 16.0
2024-09-20 08:39:16,738 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=862348.6666666666, ans=0.1
2024-09-20 08:40:12,530 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.198e+02 2.577e+02 2.901e+02 3.642e+02 5.932e+02, threshold=5.802e+02, percent-clipped=2.0
2024-09-20 08:40:20,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.prob, batch_count=862535.3333333334, ans=0.125
2024-09-20 08:40:34,104 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=862535.3333333334, ans=0.125
2024-09-20 08:40:39,033 INFO [train.py:1198] (1/2) Epoch 48, batch 2650, loss[loss=0.2253, simple_loss=0.2847, pruned_loss=0.06094, ctc_loss=0.1325, cr_loss=0.4366, over 34232.00 frames. ], tot_loss[loss=0.1997, simple_loss=0.2593, pruned_loss=0.05127, ctc_loss=0.1115, cr_loss=0.3845, over 6771188.14 frames. ], batch size: 117, lr: 2.47e-03, grad_scale: 16.0
2024-09-20 08:40:50,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=862582.0, ans=0.1
2024-09-20 08:41:09,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=862628.6666666666, ans=0.125
2024-09-20 08:41:12,846 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.22 vs. limit=15.0
2024-09-20 08:41:42,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=862722.0, ans=0.0
2024-09-20 08:42:01,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=862815.3333333334, ans=0.2
2024-09-20 08:42:02,926 INFO [train.py:1198] (1/2) Epoch 48, batch 2700, loss[loss=0.2108, simple_loss=0.2748, pruned_loss=0.05367, ctc_loss=0.1172, cr_loss=0.3989, over 34591.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2594, pruned_loss=0.05114, ctc_loss=0.1114, cr_loss=0.3842, over 6766275.78 frames. ], batch size: 102, lr: 2.47e-03, grad_scale: 16.0
2024-09-20 08:42:08,356 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=862815.3333333334, ans=0.125
2024-09-20 08:42:31,215 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.518e-02
2024-09-20 08:42:36,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=862908.6666666666, ans=0.125
2024-09-20 08:43:00,459 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.114e+02 2.605e+02 2.954e+02 3.842e+02 6.458e+02, threshold=5.909e+02, percent-clipped=3.0
2024-09-20 08:43:02,464 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 08:43:02,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=862955.3333333334, ans=0.2
2024-09-20 08:43:05,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=862955.3333333334, ans=0.025
2024-09-20 08:43:07,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=863002.0, ans=0.0
2024-09-20 08:43:25,415 INFO [train.py:1198] (1/2) Epoch 48, batch 2750, loss[loss=0.2095, simple_loss=0.2634, pruned_loss=0.05721, ctc_loss=0.1241, cr_loss=0.4113, over 34645.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2582, pruned_loss=0.05079, ctc_loss=0.1108, cr_loss=0.3823, over 6762934.31 frames. ], batch size: 88, lr: 2.47e-03, grad_scale: 16.0
2024-09-20 08:43:25,766 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=863048.6666666666, ans=0.0
2024-09-20 08:43:27,390 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=863048.6666666666, ans=0.125
2024-09-20 08:43:29,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=863048.6666666666, ans=0.04949747468305833
2024-09-20 08:43:45,526 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=863095.3333333334, ans=0.0
2024-09-20 08:43:53,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=863095.3333333334, ans=0.125
2024-09-20 08:44:08,385 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=863142.0, ans=0.09899494936611666
2024-09-20 08:44:11,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863142.0, ans=0.1
2024-09-20 08:44:11,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=863142.0, ans=0.1
2024-09-20 08:44:33,801 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=863235.3333333334, ans=0.04949747468305833
2024-09-20 08:44:40,562 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=863235.3333333334, ans=0.0
2024-09-20 08:44:43,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=863235.3333333334, ans=0.0
2024-09-20 08:44:45,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=863235.3333333334, ans=0.125
2024-09-20 08:44:50,016 INFO [train.py:1198] (1/2) Epoch 48, batch 2800, loss[loss=0.2287, simple_loss=0.2788, pruned_loss=0.0671, ctc_loss=0.14, cr_loss=0.412, over 23285.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2581, pruned_loss=0.05096, ctc_loss=0.1111, cr_loss=0.3827, over 6740495.30 frames. ], batch size: 244, lr: 2.47e-03, grad_scale: 32.0
2024-09-20 08:44:55,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=863282.0, ans=0.0
2024-09-20 08:45:22,172 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=863328.6666666666, ans=0.125
2024-09-20 08:45:29,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=863375.3333333334, ans=0.0
2024-09-20 08:45:49,923 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.138e+02 2.598e+02 2.914e+02 3.477e+02 1.065e+03, threshold=5.828e+02, percent-clipped=3.0
2024-09-20 08:45:52,741 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=9.02 vs. limit=15.0
2024-09-20 08:46:13,329 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=863515.3333333334, ans=0.1
2024-09-20 08:46:14,638 INFO [train.py:1198] (1/2) Epoch 48, batch 2850, loss[loss=0.197, simple_loss=0.2516, pruned_loss=0.05253, ctc_loss=0.1111, cr_loss=0.3786, over 34488.00 frames. ], tot_loss[loss=0.1995, simple_loss=0.2588, pruned_loss=0.05128, ctc_loss=0.1116, cr_loss=0.3838, over 6724636.11 frames. ], batch size: 90, lr: 2.47e-03, grad_scale: 32.0
2024-09-20 08:46:28,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=863515.3333333334, ans=0.125
2024-09-20 08:46:31,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863562.0, ans=0.1
2024-09-20 08:47:09,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=863655.3333333334, ans=0.125
2024-09-20 08:47:31,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=863702.0, ans=0.125
2024-09-20 08:47:36,533 INFO [train.py:1198] (1/2) Epoch 48, batch 2900, loss[loss=0.2053, simple_loss=0.2627, pruned_loss=0.05407, ctc_loss=0.1184, cr_loss=0.4014, over 34549.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2594, pruned_loss=0.05142, ctc_loss=0.112, cr_loss=0.3854, over 6754654.10 frames. ], batch size: 94, lr: 2.47e-03, grad_scale: 32.0
2024-09-20 08:47:45,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=12.0
2024-09-20 08:47:49,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=863748.6666666666, ans=0.1
2024-09-20 08:48:19,919 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 08:48:23,159 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=863842.0, ans=0.125
2024-09-20 08:48:35,729 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.203e+02 2.630e+02 3.207e+02 3.924e+02 6.083e+02, threshold=6.415e+02, percent-clipped=2.0
2024-09-20 08:48:42,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=863935.3333333334, ans=0.2
2024-09-20 08:49:02,179 INFO [train.py:1198] (1/2) Epoch 48, batch 2950, loss[loss=0.1956, simple_loss=0.2492, pruned_loss=0.05169, ctc_loss=0.1149, cr_loss=0.3924, over 34636.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2583, pruned_loss=0.05104, ctc_loss=0.1112, cr_loss=0.3831, over 6750353.25 frames. ], batch size: 88, lr: 2.47e-03, grad_scale: 32.0
2024-09-20 08:49:30,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=864028.6666666666, ans=0.125
2024-09-20 08:49:30,352 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=864028.6666666666, ans=0.0
2024-09-20 08:49:48,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=864075.3333333334, ans=0.1
2024-09-20 08:49:48,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=864075.3333333334, ans=0.1
2024-09-20 08:49:49,457 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.54 vs. limit=15.0
2024-09-20 08:49:56,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=864122.0, ans=0.125
2024-09-20 08:50:24,549 INFO [train.py:1198] (1/2) Epoch 48, batch 3000, loss[loss=0.2021, simple_loss=0.2554, pruned_loss=0.05485, ctc_loss=0.1166, cr_loss=0.3943, over 34535.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.258, pruned_loss=0.05089, ctc_loss=0.1111, cr_loss=0.3832, over 6750299.33 frames. ], batch size: 94, lr: 2.47e-03, grad_scale: 32.0
2024-09-20 08:50:24,549 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-20 08:50:41,331 INFO [train.py:1230] (1/2) Epoch 48, validation: loss=0.1492, simple_loss=0.242, pruned_loss=0.0243, ctc_loss=0.03919, cr_loss=2.351e-14, over 944034.00 frames.
2024-09-20 08:50:41,331 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-20 08:51:38,614 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.156e+02 2.582e+02 2.854e+02 3.530e+02 7.595e+02, threshold=5.708e+02, percent-clipped=2.0
2024-09-20 08:51:40,577 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=864355.3333333334, ans=0.2
2024-09-20 08:51:42,114 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=864355.3333333334, ans=0.1
2024-09-20 08:51:58,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=864402.0, ans=0.125
2024-09-20 08:52:02,895 INFO [train.py:1198] (1/2) Epoch 48, batch 3050, loss[loss=0.1995, simple_loss=0.2568, pruned_loss=0.05233, ctc_loss=0.1128, cr_loss=0.3758, over 34581.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2588, pruned_loss=0.05103, ctc_loss=0.1113, cr_loss=0.3838, over 6743108.40 frames. ], batch size: 89, lr: 2.47e-03, grad_scale: 32.0
2024-09-20 08:52:11,231 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=864448.6666666666, ans=0.035
2024-09-20 08:52:16,918 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.76 vs. limit=10.0
2024-09-20 08:52:25,460 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=15.0
2024-09-20 08:52:28,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=864495.3333333334, ans=0.0
2024-09-20 08:52:39,409 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 08:52:50,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=864542.0, ans=0.0
2024-09-20 08:52:53,882 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=864588.6666666666, ans=0.125
2024-09-20 08:53:05,740 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=12.0
2024-09-20 08:53:11,436 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=864635.3333333334, ans=0.125
2024-09-20 08:53:16,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=864635.3333333334, ans=0.0
2024-09-20 08:53:17,860 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module2.balancer2.prob, batch_count=864635.3333333334, ans=0.125
2024-09-20 08:53:27,165 INFO [train.py:1198] (1/2) Epoch 48, batch 3100, loss[loss=0.2196, simple_loss=0.2782, pruned_loss=0.05938, ctc_loss=0.1275, cr_loss=0.4185, over 34238.00 frames. ], tot_loss[loss=0.1993, simple_loss=0.2587, pruned_loss=0.05116, ctc_loss=0.1115, cr_loss=0.3842, over 6743691.93 frames. ], batch size: 117, lr: 2.47e-03, grad_scale: 32.0
2024-09-20 08:53:30,683 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=864682.0, ans=0.0
2024-09-20 08:53:43,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=864728.6666666666, ans=0.125
2024-09-20 08:54:13,101 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=15.0
2024-09-20 08:54:18,869 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=864822.0, ans=0.125
2024-09-20 08:54:23,503 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.225e+02 2.584e+02 2.945e+02 3.938e+02 7.738e+02, threshold=5.889e+02, percent-clipped=3.0
2024-09-20 08:54:37,566 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.88 vs. limit=15.0
2024-09-20 08:54:47,937 INFO [train.py:1198] (1/2) Epoch 48, batch 3150, loss[loss=0.2028, simple_loss=0.2682, pruned_loss=0.05, ctc_loss=0.1102, cr_loss=0.3848, over 33727.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2588, pruned_loss=0.05116, ctc_loss=0.1114, cr_loss=0.3841, over 6750210.88 frames. ], batch size: 122, lr: 2.47e-03, grad_scale: 32.0
2024-09-20 08:54:59,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=864915.3333333334, ans=0.025
2024-09-20 08:55:15,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=864962.0, ans=0.125
2024-09-20 08:55:28,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=865008.6666666666, ans=0.125
2024-09-20 08:56:05,900 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.67 vs. limit=15.0
2024-09-20 08:56:08,515 INFO [train.py:1198] (1/2) Epoch 48, batch 3200, loss[loss=0.1943, simple_loss=0.2578, pruned_loss=0.04783, ctc_loss=0.1033, cr_loss=0.361, over 34557.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2583, pruned_loss=0.05093, ctc_loss=0.1109, cr_loss=0.3821, over 6760533.01 frames. ], batch size: 94, lr: 2.47e-03, grad_scale: 32.0
2024-09-20 08:56:09,173 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.75 vs. limit=10.0
2024-09-20 08:56:15,493 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=865148.6666666666, ans=0.0
2024-09-20 08:56:20,213 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=865148.6666666666, ans=0.1
2024-09-20 08:56:34,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.41 vs. limit=15.0
2024-09-20 08:56:43,169 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=865242.0, ans=0.125
2024-09-20 08:56:52,917 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=865242.0, ans=0.0
2024-09-20 08:56:54,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=865242.0, ans=0.125
2024-09-20 08:56:58,553 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0
2024-09-20 08:57:01,631 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0
2024-09-20 08:57:05,457 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.152e+02 2.514e+02 2.958e+02 3.666e+02 5.111e+02, threshold=5.916e+02, percent-clipped=0.0
2024-09-20 08:57:12,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0
2024-09-20 08:57:15,470 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=865335.3333333334, ans=0.125
2024-09-20 08:57:29,720 INFO [train.py:1198] (1/2) Epoch 48, batch 3250, loss[loss=0.1998, simple_loss=0.2647, pruned_loss=0.04873, ctc_loss=0.1101, cr_loss=0.385, over 34660.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2586, pruned_loss=0.05103, ctc_loss=0.1111, cr_loss=0.3826, over 6770301.71 frames. ], batch size: 98, lr: 2.47e-03, grad_scale: 32.0
2024-09-20 08:57:31,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff3_skip_rate, batch_count=865382.0, ans=0.0
2024-09-20 08:57:53,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=865428.6666666666, ans=0.2
2024-09-20 08:57:57,217 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=865428.6666666666, ans=0.125
2024-09-20 08:58:03,651 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=865475.3333333334, ans=0.2
2024-09-20 08:58:18,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=865522.0, ans=0.125
2024-09-20 08:58:40,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=865568.6666666666, ans=0.2
2024-09-20 08:58:44,138 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.86 vs. limit=15.0
2024-09-20 08:58:48,649 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=865568.6666666666, ans=0.125
2024-09-20 08:58:50,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=865615.3333333334, ans=0.025
2024-09-20 08:58:50,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=865615.3333333334, ans=0.125
2024-09-20 08:58:51,622 INFO [train.py:1198] (1/2) Epoch 48, batch 3300, loss[loss=0.2019, simple_loss=0.2671, pruned_loss=0.04981, ctc_loss=0.111, cr_loss=0.3695, over 33067.00 frames. ], tot_loss[loss=0.198, simple_loss=0.2575, pruned_loss=0.0506, ctc_loss=0.1102, cr_loss=0.3802, over 6768866.56 frames. ], batch size: 130, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 08:58:55,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=865615.3333333334, ans=0.0
2024-09-20 08:59:30,312 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=865708.6666666666, ans=0.0
2024-09-20 08:59:37,389 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.45 vs. limit=15.0
2024-09-20 08:59:49,176 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.275e+02 2.550e+02 2.757e+02 3.460e+02 5.522e+02, threshold=5.515e+02, percent-clipped=0.0
2024-09-20 08:59:56,144 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=865802.0, ans=0.0
2024-09-20 08:59:59,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=865802.0, ans=0.1
2024-09-20 09:00:04,005 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=865802.0, ans=0.2
2024-09-20 09:00:07,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=865802.0, ans=0.0
2024-09-20 09:00:10,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=865802.0, ans=0.125
2024-09-20 09:00:12,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=865848.6666666666, ans=0.0
2024-09-20 09:00:13,359 INFO [train.py:1198] (1/2) Epoch 48, batch 3350, loss[loss=0.2142, simple_loss=0.2742, pruned_loss=0.05684, ctc_loss=0.1223, cr_loss=0.4015, over 33917.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2579, pruned_loss=0.05085, ctc_loss=0.1107, cr_loss=0.3811, over 6742519.02 frames. ], batch size: 122, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:00:13,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=865848.6666666666, ans=0.125
2024-09-20 09:00:24,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=865848.6666666666, ans=0.1
2024-09-20 09:00:30,632 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2024-09-20 09:00:33,955 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. limit=6.0
2024-09-20 09:00:49,015 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.035e-02
2024-09-20 09:00:49,677 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.39 vs. limit=15.0
2024-09-20 09:00:53,680 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=865942.0, ans=0.1
2024-09-20 09:00:55,319 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:01:15,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=866035.3333333334, ans=0.1
2024-09-20 09:01:33,258 INFO [train.py:1198] (1/2) Epoch 48, batch 3400, loss[loss=0.1722, simple_loss=0.2283, pruned_loss=0.04233, ctc_loss=0.09169, cr_loss=0.3276, over 34184.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2577, pruned_loss=0.05095, ctc_loss=0.1108, cr_loss=0.3817, over 6732769.97 frames. ], batch size: 78, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:01:54,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=866128.6666666666, ans=0.0
2024-09-20 09:01:57,847 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=866128.6666666666, ans=0.0
2024-09-20 09:01:58,409 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=3.91 vs. limit=15.0
2024-09-20 09:02:01,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0
2024-09-20 09:02:30,006 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.178e+02 2.498e+02 2.926e+02 3.422e+02 9.429e+02, threshold=5.852e+02, percent-clipped=4.0
2024-09-20 09:02:46,130 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=866268.6666666666, ans=0.2
2024-09-20 09:02:53,817 INFO [train.py:1198] (1/2) Epoch 48, batch 3450, loss[loss=0.193, simple_loss=0.2626, pruned_loss=0.04462, ctc_loss=0.09897, cr_loss=0.3607, over 33210.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2579, pruned_loss=0.05081, ctc_loss=0.1108, cr_loss=0.3818, over 6745909.36 frames. ], batch size: 130, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:03:04,250 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.57 vs. limit=15.0
2024-09-20 09:03:09,384 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.29 vs. limit=22.5
2024-09-20 09:03:17,775 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=866362.0, ans=0.2
2024-09-20 09:03:17,844 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=866362.0, ans=0.125
2024-09-20 09:03:22,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=866362.0, ans=0.2
2024-09-20 09:03:32,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.96 vs. limit=22.5
2024-09-20 09:03:58,858 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=866502.0, ans=0.125
2024-09-20 09:04:16,326 INFO [train.py:1198] (1/2) Epoch 48, batch 3500, loss[loss=0.1881, simple_loss=0.2454, pruned_loss=0.04808, ctc_loss=0.1048, cr_loss=0.3424, over 34494.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.2577, pruned_loss=0.05079, ctc_loss=0.1107, cr_loss=0.382, over 6748405.38 frames. ], batch size: 85, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:04:26,279 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=866548.6666666666, ans=0.025
2024-09-20 09:04:42,214 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=866595.3333333334, ans=10.0
2024-09-20 09:04:50,712 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.82 vs. limit=6.0
2024-09-20 09:04:57,131 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=12.0
2024-09-20 09:05:12,544 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.117e+02 2.535e+02 2.895e+02 3.661e+02 7.065e+02, threshold=5.789e+02, percent-clipped=1.0
2024-09-20 09:05:17,633 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=866688.6666666666, ans=0.1
2024-09-20 09:05:36,480 INFO [train.py:1198] (1/2) Epoch 48, batch 3550, loss[loss=0.205, simple_loss=0.2702, pruned_loss=0.05098, ctc_loss=0.1129, cr_loss=0.3782, over 34363.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2579, pruned_loss=0.05098, ctc_loss=0.1111, cr_loss=0.383, over 6758574.51 frames. ], batch size: 103, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:05:43,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=866782.0, ans=0.125
2024-09-20 09:06:05,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=866828.6666666666, ans=0.04949747468305833
2024-09-20 09:06:31,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=866922.0, ans=0.125
2024-09-20 09:06:56,381 INFO [train.py:1198] (1/2) Epoch 48, batch 3600, loss[loss=0.187, simple_loss=0.2454, pruned_loss=0.04675, ctc_loss=0.1037, cr_loss=0.3578, over 34520.00 frames. ], tot_loss[loss=0.1989, simple_loss=0.2581, pruned_loss=0.05104, ctc_loss=0.1112, cr_loss=0.3829, over 6768329.56 frames. ], batch size: 90, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:07:16,112 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=867062.0, ans=0.125
2024-09-20 09:07:28,427 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=867108.6666666666, ans=0.2
2024-09-20 09:07:37,377 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=867108.6666666666, ans=0.0
2024-09-20 09:07:56,073 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 2.645e+02 3.383e+02 4.404e+02 7.234e+02, threshold=6.766e+02, percent-clipped=9.0
2024-09-20 09:07:59,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867155.3333333334, ans=0.1
2024-09-20 09:08:09,388 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=867202.0, ans=0.0
2024-09-20 09:08:18,567 INFO [train.py:1198] (1/2) Epoch 48, batch 3650, loss[loss=0.2162, simple_loss=0.2752, pruned_loss=0.05813, ctc_loss=0.1235, cr_loss=0.407, over 34393.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2576, pruned_loss=0.05082, ctc_loss=0.1107, cr_loss=0.3817, over 6770133.10 frames. ], batch size: 110, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:08:23,657 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=867248.6666666666, ans=0.1
2024-09-20 09:08:31,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=867248.6666666666, ans=0.125
2024-09-20 09:08:34,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=867295.3333333334, ans=0.0
2024-09-20 09:08:46,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=867295.3333333334, ans=0.04949747468305833
2024-09-20 09:08:49,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=867342.0, ans=0.0
2024-09-20 09:08:50,949 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=867342.0, ans=0.2
2024-09-20 09:08:56,625 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=22.5
2024-09-20 09:09:15,119 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=15.0
2024-09-20 09:09:38,494 INFO [train.py:1198] (1/2) Epoch 48, batch 3700, loss[loss=0.2084, simple_loss=0.272, pruned_loss=0.05318, ctc_loss=0.1143, cr_loss=0.3905, over 34606.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2578, pruned_loss=0.05069, ctc_loss=0.1106, cr_loss=0.3814, over 6784215.02 frames. ], batch size: 102, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:09:46,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=867482.0, ans=0.125
2024-09-20 09:10:07,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=867528.6666666666, ans=0.125
2024-09-20 09:10:31,897 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=867622.0, ans=0.125
2024-09-20 09:10:36,260 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.259e+02 2.547e+02 2.854e+02 3.267e+02 6.222e+02, threshold=5.708e+02, percent-clipped=0.0
2024-09-20 09:10:36,602 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=867622.0, ans=0.125
2024-09-20 09:10:41,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=867622.0, ans=0.125
2024-09-20 09:10:42,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=867668.6666666666, ans=0.025
2024-09-20 09:10:46,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=867668.6666666666, ans=0.025
2024-09-20 09:10:56,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=4.93 vs. limit=15.0
2024-09-20 09:10:58,995 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=867715.3333333334, ans=0.0
2024-09-20 09:11:00,284 INFO [train.py:1198] (1/2) Epoch 48, batch 3750, loss[loss=0.2147, simple_loss=0.2731, pruned_loss=0.05742, ctc_loss=0.1233, cr_loss=0.4221, over 34355.00 frames. ], tot_loss[loss=0.2014, simple_loss=0.2609, pruned_loss=0.05189, ctc_loss=0.1128, cr_loss=0.3875, over 6785739.23 frames. ], batch size: 113, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:11:11,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=867715.3333333334, ans=0.125
2024-09-20 09:11:19,949 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:11:29,627 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=867762.0, ans=0.04949747468305833
2024-09-20 09:11:29,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=867762.0, ans=0.2
2024-09-20 09:11:33,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=15.0
2024-09-20 09:11:54,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=867855.3333333334, ans=0.125
2024-09-20 09:11:58,413 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.65 vs. limit=5.0
2024-09-20 09:12:08,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=867902.0, ans=0.125
2024-09-20 09:12:10,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=867902.0, ans=0.125
2024-09-20 09:12:10,237 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=867902.0, ans=0.07
2024-09-20 09:12:12,162 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=8.64 vs. limit=12.0
2024-09-20 09:12:21,157 INFO [train.py:1198] (1/2) Epoch 48, batch 3800, loss[loss=0.2087, simple_loss=0.2609, pruned_loss=0.05798, ctc_loss=0.1248, cr_loss=0.39, over 29562.00 frames. ], tot_loss[loss=0.2041, simple_loss=0.2633, pruned_loss=0.0531, ctc_loss=0.1151, cr_loss=0.3931, over 6674169.61 frames. ], batch size: 175, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:12:28,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=867948.6666666666, ans=0.1
2024-09-20 09:12:31,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=867948.6666666666, ans=0.125
2024-09-20 09:12:36,472 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=867995.3333333334, ans=0.0
2024-09-20 09:13:06,008 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=868042.0, ans=0.0
2024-09-20 09:13:17,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=868088.6666666666, ans=0.04949747468305833
2024-09-20 09:13:22,216 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.268e+02 2.544e+02 2.768e+02 3.080e+02 4.836e+02, threshold=5.537e+02, percent-clipped=0.0
2024-09-20 09:13:30,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=868135.3333333334, ans=0.125
2024-09-20 09:13:37,695 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=868135.3333333334, ans=0.2
2024-09-20 09:13:44,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=868182.0, ans=0.0
2024-09-20 09:13:45,485 INFO [train.py:1198] (1/2) Epoch 48, batch 3850, loss[loss=0.235, simple_loss=0.284, pruned_loss=0.06942, ctc_loss=0.1486, cr_loss=0.4389, over 23493.00 frames. ], tot_loss[loss=0.2069, simple_loss=0.2651, pruned_loss=0.05459, ctc_loss=0.1182, cr_loss=0.3967, over 6249233.65 frames. ], batch size: 245, lr: 2.46e-03, grad_scale: 32.0
2024-09-20 09:13:53,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=868182.0, ans=0.0
2024-09-20 09:13:59,884 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0
2024-09-20 09:14:06,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=868228.6666666666, ans=0.125
2024-09-20 09:14:09,384 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=868228.6666666666, ans=0.0
2024-09-20 09:15:17,580 INFO [train.py:1198] (1/2) Epoch 49, batch 0, loss[loss=0.1926, simple_loss=0.2496, pruned_loss=0.04923, ctc_loss=0.1086, cr_loss=0.383, over 34470.00 frames. ], tot_loss[loss=0.1926, simple_loss=0.2496, pruned_loss=0.04923, ctc_loss=0.1086, cr_loss=0.383, over 34470.00 frames. ], batch size: 85, lr: 2.44e-03, grad_scale: 32.0
2024-09-20 09:15:17,580 INFO [train.py:1221] (1/2) Computing validation loss
2024-09-20 09:15:34,371 INFO [train.py:1230] (1/2) Epoch 49, validation: loss=0.1484, simple_loss=0.2422, pruned_loss=0.02346, ctc_loss=0.03826, cr_loss=2.417e-14, over 944034.00 frames.
2024-09-20 09:15:34,371 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB
2024-09-20 09:15:46,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=868308.0, ans=0.125
2024-09-20 09:15:59,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=868354.6666666666, ans=0.5
2024-09-20 09:16:06,678 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=868354.6666666666, ans=0.5
2024-09-20 09:16:44,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=868494.6666666666, ans=0.125
2024-09-20 09:16:44,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=868494.6666666666, ans=0.1
2024-09-20 09:16:59,099 INFO [train.py:1198] (1/2) Epoch 49, batch 50, loss[loss=0.1808, simple_loss=0.2346, pruned_loss=0.04631, ctc_loss=0.102, cr_loss=0.3512, over 34516.00 frames. ], tot_loss[loss=0.2017, simple_loss=0.2605, pruned_loss=0.05231, ctc_loss=0.1135, cr_loss=0.3888, over 1479260.02 frames. ], batch size: 82, lr: 2.44e-03, grad_scale: 32.0
2024-09-20 09:17:13,766 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.145e+02 2.613e+02 2.793e+02 3.290e+02 1.117e+03, threshold=5.587e+02, percent-clipped=1.0
2024-09-20 09:17:17,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=868588.0, ans=0.125
2024-09-20 09:17:20,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=868588.0, ans=0.125
2024-09-20 09:17:25,761 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=868588.0, ans=0.125
2024-09-20 09:17:29,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=868588.0, ans=0.125
2024-09-20 09:17:42,267 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=868634.6666666666, ans=0.1
2024-09-20 09:17:57,122 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=868681.3333333334, ans=0.125
2024-09-20 09:18:13,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=868728.0, ans=0.125
2024-09-20 09:18:22,649 INFO [train.py:1198] (1/2) Epoch 49, batch 100, loss[loss=0.188, simple_loss=0.2439, pruned_loss=0.04783, ctc_loss=0.1071, cr_loss=0.3729, over 34581.00 frames. ], tot_loss[loss=0.2028, simple_loss=0.262, pruned_loss=0.05259, ctc_loss=0.1145, cr_loss=0.3918, over 2627999.88 frames. ], batch size: 89, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:18:44,831 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0
2024-09-20 09:19:03,845 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=868868.0, ans=0.125
2024-09-20 09:19:30,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=868961.3333333334, ans=0.09899494936611666
2024-09-20 09:19:44,341 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.58 vs. limit=15.0
2024-09-20 09:19:46,495 INFO [train.py:1198] (1/2) Epoch 49, batch 150, loss[loss=0.1822, simple_loss=0.2344, pruned_loss=0.04779, ctc_loss=0.1015, cr_loss=0.3529, over 34491.00 frames. ], tot_loss[loss=0.2, simple_loss=0.2595, pruned_loss=0.0513, ctc_loss=0.1119, cr_loss=0.386, over 3555682.78 frames. ], batch size: 82, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:20:01,389 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.211e+02 2.522e+02 2.931e+02 3.584e+02 6.015e+02, threshold=5.861e+02, percent-clipped=3.0
2024-09-20 09:20:21,341 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=869101.3333333334, ans=0.0
2024-09-20 09:20:34,317 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=869148.0, ans=0.0
2024-09-20 09:21:08,441 INFO [train.py:1198] (1/2) Epoch 49, batch 200, loss[loss=0.212, simple_loss=0.2677, pruned_loss=0.05769, ctc_loss=0.1233, cr_loss=0.4064, over 31934.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2586, pruned_loss=0.0511, ctc_loss=0.1112, cr_loss=0.3839, over 4268878.22 frames. ], batch size: 145, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:21:35,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=869288.0, ans=0.125
2024-09-20 09:22:02,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0
2024-09-20 09:22:26,736 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=869428.0, ans=0.0
2024-09-20 09:22:32,967 INFO [train.py:1198] (1/2) Epoch 49, batch 250, loss[loss=0.1975, simple_loss=0.2619, pruned_loss=0.04845, ctc_loss=0.1068, cr_loss=0.3718, over 34254.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2584, pruned_loss=0.0509, ctc_loss=0.1109, cr_loss=0.3832, over 4831111.73 frames. ], batch size: 117, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:22:47,850 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.187e+02 2.653e+02 3.216e+02 4.399e+02 8.309e+02, threshold=6.433e+02, percent-clipped=8.0
2024-09-20 09:23:03,338 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.84 vs. limit=15.0
2024-09-20 09:23:05,259 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0
2024-09-20 09:23:12,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=869568.0, ans=0.1
2024-09-20 09:23:32,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.15 vs. limit=10.0
2024-09-20 09:23:57,124 INFO [train.py:1198] (1/2) Epoch 49, batch 300, loss[loss=0.2185, simple_loss=0.2786, pruned_loss=0.05812, ctc_loss=0.1242, cr_loss=0.4349, over 34331.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2581, pruned_loss=0.05099, ctc_loss=0.111, cr_loss=0.3837, over 5259121.12 frames. ], batch size: 107, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:24:04,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=869708.0, ans=0.125
2024-09-20 09:24:22,645 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0
2024-09-20 09:24:36,663 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=869801.3333333334, ans=0.0
2024-09-20 09:25:01,538 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=869894.6666666666, ans=0.125
2024-09-20 09:25:12,076 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=5.29 vs. limit=15.0
2024-09-20 09:25:19,266 INFO [train.py:1198] (1/2) Epoch 49, batch 350, loss[loss=0.1783, simple_loss=0.2352, pruned_loss=0.04396, ctc_loss=0.09661, cr_loss=0.3561, over 34286.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2584, pruned_loss=0.05088, ctc_loss=0.1109, cr_loss=0.3831, over 5594926.48 frames. ], batch size: 83, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:25:23,087 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=10.76 vs. limit=22.5
2024-09-20 09:25:24,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=869941.3333333334, ans=0.125
2024-09-20 09:25:31,024 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=869941.3333333334, ans=0.0
2024-09-20 09:25:31,738 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.65 vs. limit=12.0
2024-09-20 09:25:33,893 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.100e+02 2.602e+02 2.887e+02 3.526e+02 6.347e+02, threshold=5.774e+02, percent-clipped=0.0
2024-09-20 09:25:39,960 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.10 vs. limit=12.0
2024-09-20 09:26:11,229 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=22.5
2024-09-20 09:26:11,923 INFO [scaling.py:801] (1/2) Caught exception in Balancer backward: CUDA out of memory. Tried to allocate 3.77 GiB. GPU 1 has a total capacity of 79.17 GiB of which 3.60 GiB is free. Process 39810 has 75.57 GiB memory in use. Of the allocated memory 29.31 GiB is allocated by PyTorch, and 43.86 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables), size=[226, 384, 614, 19], will continue.
2024-09-20 09:26:38,658 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:26:40,303 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=870128.0, ans=0.2
2024-09-20 09:26:43,209 INFO [train.py:1198] (1/2) Epoch 49, batch 400, loss[loss=0.2032, simple_loss=0.2626, pruned_loss=0.05267, ctc_loss=0.1155, cr_loss=0.3847, over 34422.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.2578, pruned_loss=0.05052, ctc_loss=0.1102, cr_loss=0.3819, over 5863111.51 frames. ], batch size: 95, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:26:48,458 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=870174.6666666666, ans=0.125
2024-09-20 09:27:17,578 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.61 vs. limit=22.5
2024-09-20 09:27:28,936 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=870268.0, ans=0.1
2024-09-20 09:27:34,358 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.48 vs. limit=15.0
2024-09-20 09:27:38,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=870314.6666666666, ans=0.0
2024-09-20 09:27:47,176 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=870314.6666666666, ans=0.125
2024-09-20 09:28:08,135 INFO [train.py:1198] (1/2) Epoch 49, batch 450, loss[loss=0.2052, simple_loss=0.2692, pruned_loss=0.05184, ctc_loss=0.1106, cr_loss=0.3874, over 34684.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.258, pruned_loss=0.05055, ctc_loss=0.1102, cr_loss=0.3822, over 6053038.11 frames. ], batch size: 97, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:28:21,494 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=870408.0, ans=0.05
2024-09-20 09:28:21,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=870408.0, ans=0.125
2024-09-20 09:28:22,900 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.526e+02 2.871e+02 3.621e+02 5.147e+02, threshold=5.741e+02, percent-clipped=1.0
2024-09-20 09:28:50,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=768, metric=16.89 vs. limit=22.5
2024-09-20 09:28:59,666 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=870548.0, ans=0.1
2024-09-20 09:29:32,350 INFO [train.py:1198] (1/2) Epoch 49, batch 500, loss[loss=0.2229, simple_loss=0.2806, pruned_loss=0.06072, ctc_loss=0.1313, cr_loss=0.4384, over 34428.00 frames. ], tot_loss[loss=0.1977, simple_loss=0.2573, pruned_loss=0.05043, ctc_loss=0.11, cr_loss=0.3812, over 6218125.62 frames. ], batch size: 110, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:29:57,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=870688.0, ans=0.125
2024-09-20 09:30:33,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=870781.3333333334, ans=0.125
2024-09-20 09:30:37,901 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.26 vs. limit=15.0
2024-09-20 09:30:54,983 INFO [train.py:1198] (1/2) Epoch 49, batch 550, loss[loss=0.202, simple_loss=0.2656, pruned_loss=0.05064, ctc_loss=0.1091, cr_loss=0.3807, over 33866.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2572, pruned_loss=0.05052, ctc_loss=0.1101, cr_loss=0.3808, over 6326246.51 frames. ], batch size: 122, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:31:01,462 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.86 vs. limit=22.5
2024-09-20 09:31:12,230 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.187e+02 2.541e+02 2.806e+02 3.371e+02 6.027e+02, threshold=5.612e+02, percent-clipped=1.0
2024-09-20 09:31:14,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=870921.3333333334, ans=0.125
2024-09-20 09:31:30,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=870968.0, ans=0.1
2024-09-20 09:31:59,521 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0
2024-09-20 09:32:19,686 INFO [train.py:1198] (1/2) Epoch 49, batch 600, loss[loss=0.2132, simple_loss=0.2753, pruned_loss=0.05542, ctc_loss=0.1191, cr_loss=0.4124, over 34280.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.2571, pruned_loss=0.05044, ctc_loss=0.11, cr_loss=0.3809, over 6429202.31 frames. ], batch size: 117, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:32:41,875 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.67 vs. limit=15.0
2024-09-20 09:32:47,542 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=871154.6666666666, ans=0.2
2024-09-20 09:33:05,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=871201.3333333334, ans=0.0
2024-09-20 09:33:16,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.const_attention_rate, batch_count=871248.0, ans=0.025
2024-09-20 09:33:23,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=871248.0, ans=0.125
2024-09-20 09:33:27,014 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=871294.6666666666, ans=0.125
2024-09-20 09:33:28,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=871294.6666666666, ans=0.0
2024-09-20 09:33:42,928 INFO [train.py:1198] (1/2) Epoch 49, batch 650, loss[loss=0.2026, simple_loss=0.2637, pruned_loss=0.05118, ctc_loss=0.1158, cr_loss=0.3983, over 34551.00 frames. ], tot_loss[loss=0.1969, simple_loss=0.2566, pruned_loss=0.05011, ctc_loss=0.1094, cr_loss=0.3792, over 6520947.10 frames. ], batch size: 94, lr: 2.43e-03, grad_scale: 16.0
2024-09-20 09:33:56,326 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:33:57,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=871388.0, ans=0.025
2024-09-20 09:33:58,628 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.15 vs. limit=10.0
2024-09-20 09:33:59,334 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.225e+02 2.638e+02 2.890e+02 3.912e+02 8.012e+02, threshold=5.779e+02, percent-clipped=3.0
2024-09-20 09:34:04,512 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=871388.0, ans=0.125
2024-09-20 09:34:47,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=871528.0, ans=0.125
2024-09-20 09:34:52,037 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.33 vs. limit=15.0
2024-09-20 09:34:53,375 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.14 vs. limit=15.0
2024-09-20 09:34:56,079 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=871528.0, ans=0.125
2024-09-20 09:35:02,886 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=871528.0, ans=0.125
2024-09-20 09:35:04,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=871528.0, ans=0.1
2024-09-20 09:35:07,413 INFO [train.py:1198] (1/2) Epoch 49, batch 700, loss[loss=0.1914, simple_loss=0.2468, pruned_loss=0.04982, ctc_loss=0.1069, cr_loss=0.3749, over 34586.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.2574, pruned_loss=0.05031, ctc_loss=0.1097, cr_loss=0.3801, over 6579375.39 frames. ], batch size: 89, lr: 2.43e-03, grad_scale: 16.0
2024-09-20 09:35:08,115 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0
2024-09-20 09:35:35,641 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=871621.3333333334, ans=0.2
2024-09-20 09:35:55,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=871714.6666666666, ans=0.0
2024-09-20 09:36:30,105 INFO [train.py:1198] (1/2) Epoch 49, batch 750, loss[loss=0.1995, simple_loss=0.264, pruned_loss=0.04928, ctc_loss=0.1083, cr_loss=0.3691, over 34427.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.257, pruned_loss=0.05022, ctc_loss=0.1095, cr_loss=0.3796, over 6623452.72 frames. ], batch size: 95, lr: 2.43e-03, grad_scale: 16.0
2024-09-20 09:36:46,246 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.556e+02 2.883e+02 3.823e+02 5.428e+02, threshold=5.766e+02, percent-clipped=0.0
2024-09-20 09:37:03,228 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=871901.3333333334, ans=0.025
2024-09-20 09:37:15,087 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=871901.3333333334, ans=0.125
2024-09-20 09:37:18,320 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=871901.3333333334, ans=0.0
2024-09-20 09:37:33,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=871948.0, ans=0.2
2024-09-20 09:37:43,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=871994.6666666666, ans=0.125
2024-09-20 09:37:43,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=871994.6666666666, ans=0.0
2024-09-20 09:37:54,362 INFO [train.py:1198] (1/2) Epoch 49, batch 800, loss[loss=0.1784, simple_loss=0.2367, pruned_loss=0.04327, ctc_loss=0.09747, cr_loss=0.3529, over 34499.00 frames. ], tot_loss[loss=0.1968, simple_loss=0.2566, pruned_loss=0.05004, ctc_loss=0.1092, cr_loss=0.3787, over 6659108.85 frames. ], batch size: 85, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:37:54,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=872041.3333333334, ans=0.025
2024-09-20 09:37:56,637 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.21 vs. limit=22.5
2024-09-20 09:37:57,872 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=872041.3333333334, ans=0.04949747468305833
2024-09-20 09:38:29,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=872134.6666666666, ans=0.2
2024-09-20 09:38:42,717 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=872134.6666666666, ans=0.05
2024-09-20 09:38:47,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=872181.3333333334, ans=0.0
2024-09-20 09:39:00,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=872228.0, ans=0.125
2024-09-20 09:39:05,561 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=872228.0, ans=0.0
2024-09-20 09:39:08,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=872228.0, ans=0.1
2024-09-20 09:39:18,131 INFO [train.py:1198] (1/2) Epoch 49, batch 850, loss[loss=0.2013, simple_loss=0.2681, pruned_loss=0.04921, ctc_loss=0.1082, cr_loss=0.3622, over 34367.00 frames. ], tot_loss[loss=0.1966, simple_loss=0.2565, pruned_loss=0.04994, ctc_loss=0.109, cr_loss=0.3785, over 6690229.54 frames. ], batch size: 103, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:39:18,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=872274.6666666666, ans=0.0
2024-09-20 09:39:24,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=872274.6666666666, ans=0.125
2024-09-20 09:39:31,473 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=872274.6666666666, ans=0.0
2024-09-20 09:39:34,244 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.186e+02 2.714e+02 3.105e+02 3.940e+02 5.793e+02, threshold=6.209e+02, percent-clipped=1.0
2024-09-20 09:39:41,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=872321.3333333334, ans=0.1
2024-09-20 09:39:41,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=872321.3333333334, ans=0.2
2024-09-20 09:39:54,661 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.68 vs. limit=22.5
2024-09-20 09:39:56,048 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=872368.0, ans=0.125
2024-09-20 09:40:09,929 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.64 vs. limit=15.0
2024-09-20 09:40:12,645 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:40:19,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=872414.6666666666, ans=0.125
2024-09-20 09:40:27,460 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=872461.3333333334, ans=0.125
2024-09-20 09:40:30,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=872461.3333333334, ans=0.125
2024-09-20 09:40:40,594 INFO [train.py:1198] (1/2) Epoch 49, batch 900, loss[loss=0.1742, simple_loss=0.2331, pruned_loss=0.04164, ctc_loss=0.09238, cr_loss=0.3399, over 34515.00 frames. ], tot_loss[loss=0.197, simple_loss=0.2567, pruned_loss=0.05014, ctc_loss=0.1094, cr_loss=0.3791, over 6695490.95 frames. ], batch size: 85, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:41:00,622 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=872554.6666666666, ans=0.5
2024-09-20 09:41:06,957 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=872554.6666666666, ans=0.125
2024-09-20 09:41:11,992 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=872554.6666666666, ans=0.0
2024-09-20 09:41:21,956 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=872601.3333333334, ans=0.125
2024-09-20 09:41:33,555 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=872648.0, ans=0.125
2024-09-20 09:41:37,244 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.42 vs. limit=15.0
2024-09-20 09:42:04,270 INFO [train.py:1198] (1/2) Epoch 49, batch 950, loss[loss=0.184, simple_loss=0.2434, pruned_loss=0.04491, ctc_loss=0.09976, cr_loss=0.3738, over 34687.00 frames. ], tot_loss[loss=0.1975, simple_loss=0.2572, pruned_loss=0.05028, ctc_loss=0.1098, cr_loss=0.3805, over 6700364.16 frames. ], batch size: 87, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:42:08,496 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=768, metric=19.39 vs. limit=22.5
2024-09-20 09:42:15,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=872741.3333333334, ans=0.0
2024-09-20 09:42:20,360 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.093e+02 2.683e+02 3.038e+02 3.820e+02 5.944e+02, threshold=6.075e+02, percent-clipped=0.0
2024-09-20 09:42:35,266 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=7.46 vs. limit=15.0
2024-09-20 09:42:42,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=872834.6666666666, ans=0.125
2024-09-20 09:42:56,125 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=872881.3333333334, ans=0.125
2024-09-20 09:43:28,136 INFO [train.py:1198] (1/2) Epoch 49, batch 1000, loss[loss=0.1917, simple_loss=0.2491, pruned_loss=0.04927, ctc_loss=0.1063, cr_loss=0.3634, over 34469.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.258, pruned_loss=0.05061, ctc_loss=0.1103, cr_loss=0.3818, over 6694845.52 frames. ], batch size: 90, lr: 2.43e-03, grad_scale: 16.0
2024-09-20 09:43:53,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=873021.3333333334, ans=0.125
2024-09-20 09:43:53,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.75 vs.
2024-09-20 09:44:01,563 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=873068.0, ans=0.0
2024-09-20 09:44:52,178 INFO [train.py:1198] (1/2) Epoch 49, batch 1050, loss[loss=0.1927, simple_loss=0.2596, pruned_loss=0.04545, ctc_loss=0.1018, cr_loss=0.3665, over 34563.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.2572, pruned_loss=0.05037, ctc_loss=0.1099, cr_loss=0.3809, over 6704704.55 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 16.0
2024-09-20 09:44:59,128 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=873208.0, ans=0.125
2024-09-20 09:45:03,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=873208.0, ans=0.125
2024-09-20 09:45:10,349 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.292e+02 2.634e+02 2.955e+02 3.512e+02 6.483e+02, threshold=5.909e+02, percent-clipped=1.0
2024-09-20 09:45:17,428 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:45:19,015 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873254.6666666666, ans=0.1
2024-09-20 09:45:27,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=873301.3333333334, ans=0.0
2024-09-20 09:45:27,687 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=12.0
2024-09-20 09:45:38,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=873301.3333333334, ans=0.125
2024-09-20 09:45:56,058 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.26 vs. limit=8.0
2024-09-20 09:46:10,067 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=873394.6666666666, ans=0.04949747468305833
2024-09-20 09:46:16,610 INFO [train.py:1198] (1/2) Epoch 49, batch 1100, loss[loss=0.1956, simple_loss=0.258, pruned_loss=0.04835, ctc_loss=0.1084, cr_loss=0.3719, over 34366.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2573, pruned_loss=0.05054, ctc_loss=0.1102, cr_loss=0.381, over 6718278.00 frames. ], batch size: 91, lr: 2.43e-03, grad_scale: 16.0
2024-09-20 09:46:25,037 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=873441.3333333334, ans=0.0
2024-09-20 09:46:45,542 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.79 vs. limit=15.0
2024-09-20 09:47:39,052 INFO [train.py:1198] (1/2) Epoch 49, batch 1150, loss[loss=0.1967, simple_loss=0.2565, pruned_loss=0.0499, ctc_loss=0.1089, cr_loss=0.3846, over 34361.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2572, pruned_loss=0.05056, ctc_loss=0.1102, cr_loss=0.3814, over 6716019.53 frames. ], batch size: 91, lr: 2.43e-03, grad_scale: 16.0
2024-09-20 09:47:49,759 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=12.0
2024-09-20 09:47:57,031 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.163e+02 2.585e+02 2.878e+02 3.528e+02 5.294e+02, threshold=5.757e+02, percent-clipped=0.0
2024-09-20 09:48:10,947 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=873721.3333333334, ans=0.0
2024-09-20 09:48:16,830 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.09 vs. limit=15.0
2024-09-20 09:48:31,775 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=4.58 vs. limit=15.0
2024-09-20 09:48:40,558 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=873814.6666666666, ans=0.125
2024-09-20 09:48:55,938 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.39 vs. limit=15.0
2024-09-20 09:49:03,264 INFO [train.py:1198] (1/2) Epoch 49, batch 1200, loss[loss=0.2101, simple_loss=0.2659, pruned_loss=0.05672, ctc_loss=0.1202, cr_loss=0.4212, over 34545.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.2579, pruned_loss=0.05069, ctc_loss=0.1106, cr_loss=0.3826, over 6708843.66 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:49:03,723 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=873908.0, ans=0.0
2024-09-20 09:49:20,254 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=873954.6666666666, ans=0.1
2024-09-20 09:49:21,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=873954.6666666666, ans=0.125
2024-09-20 09:49:28,813 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.55 vs. limit=15.0
2024-09-20 09:50:06,741 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=874048.0, ans=0.125
2024-09-20 09:50:24,954 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=874094.6666666666, ans=0.1
2024-09-20 09:50:27,790 INFO [train.py:1198] (1/2) Epoch 49, batch 1250, loss[loss=0.2082, simple_loss=0.2728, pruned_loss=0.05246, ctc_loss=0.115, cr_loss=0.3917, over 34338.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2587, pruned_loss=0.05088, ctc_loss=0.111, cr_loss=0.3837, over 6742428.08 frames. ], batch size: 107, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:50:33,094 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=874141.3333333334, ans=0.125
2024-09-20 09:50:46,087 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.206e+02 2.621e+02 2.847e+02 3.457e+02 5.477e+02, threshold=5.694e+02, percent-clipped=0.0
2024-09-20 09:51:07,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=874234.6666666666, ans=0.125
2024-09-20 09:51:07,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=874234.6666666666, ans=0.125
2024-09-20 09:51:12,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=874234.6666666666, ans=0.1
2024-09-20 09:51:27,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=874281.3333333334, ans=0.025
2024-09-20 09:51:27,534 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=874281.3333333334, ans=0.1
2024-09-20 09:51:30,238 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.61 vs. limit=10.0
2024-09-20 09:51:51,956 INFO [train.py:1198] (1/2) Epoch 49, batch 1300, loss[loss=0.1999, simple_loss=0.2677, pruned_loss=0.04811, ctc_loss=0.1058, cr_loss=0.3694, over 32928.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.258, pruned_loss=0.05064, ctc_loss=0.1105, cr_loss=0.3823, over 6746138.50 frames. ], batch size: 130, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:52:06,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=874421.3333333334, ans=0.125
2024-09-20 09:52:14,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=874421.3333333334, ans=0.0
2024-09-20 09:52:31,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=874468.0, ans=0.1
2024-09-20 09:52:31,820 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=874468.0, ans=0.0
2024-09-20 09:52:43,177 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=874514.6666666666, ans=0.125
2024-09-20 09:53:02,246 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=22.5
2024-09-20 09:53:04,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=874561.3333333334, ans=0.2
2024-09-20 09:53:14,432 INFO [train.py:1198] (1/2) Epoch 49, batch 1350, loss[loss=0.2075, simple_loss=0.2641, pruned_loss=0.05538, ctc_loss=0.1203, cr_loss=0.4035, over 34517.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.2578, pruned_loss=0.05057, ctc_loss=0.1103, cr_loss=0.3817, over 6764869.92 frames. ], batch size: 94, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:53:31,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=874654.6666666666, ans=0.2
2024-09-20 09:53:32,175 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.347e+02 2.687e+02 3.216e+02 3.953e+02 6.565e+02, threshold=6.432e+02, percent-clipped=1.0
2024-09-20 09:53:50,339 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=874701.3333333334, ans=0.125
2024-09-20 09:53:52,098 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:54:31,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=874794.6666666666, ans=0.0
2024-09-20 09:54:35,251 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=874794.6666666666, ans=0.07
2024-09-20 09:54:35,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=874794.6666666666, ans=0.0
2024-09-20 09:54:38,244 INFO [train.py:1198] (1/2) Epoch 49, batch 1400, loss[loss=0.1838, simple_loss=0.2381, pruned_loss=0.04714, ctc_loss=0.1024, cr_loss=0.3698, over 34305.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.2577, pruned_loss=0.05065, ctc_loss=0.1105, cr_loss=0.3823, over 6776870.03 frames. ], batch size: 80, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:54:45,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=874841.3333333334, ans=0.05
2024-09-20 09:55:21,450 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:55:36,635 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.93 vs. limit=12.0
2024-09-20 09:55:52,727 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=875028.0, ans=0.125
2024-09-20 09:55:54,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=875028.0, ans=0.0
2024-09-20 09:56:02,112 INFO [train.py:1198] (1/2) Epoch 49, batch 1450, loss[loss=0.2107, simple_loss=0.2727, pruned_loss=0.05463, ctc_loss=0.1167, cr_loss=0.4011, over 34411.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2581, pruned_loss=0.05072, ctc_loss=0.1105, cr_loss=0.383, over 6775236.52 frames. ], batch size: 110, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:56:04,221 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass_mid.scale_min, batch_count=875074.6666666666, ans=0.2
2024-09-20 09:56:07,351 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875074.6666666666, ans=0.1
2024-09-20 09:56:07,442 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875074.6666666666, ans=0.1
2024-09-20 09:56:07,885 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.53 vs. limit=10.0
2024-09-20 09:56:09,002 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:56:14,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=875074.6666666666, ans=0.125
2024-09-20 09:56:20,136 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.227e+02 2.537e+02 2.890e+02 3.750e+02 6.908e+02, threshold=5.780e+02, percent-clipped=1.0
2024-09-20 09:56:28,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=875121.3333333334, ans=0.1
2024-09-20 09:56:38,300 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=875168.0, ans=0.1
2024-09-20 09:56:43,259 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=875168.0, ans=0.0
2024-09-20 09:56:52,376 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.49 vs. limit=22.5
2024-09-20 09:57:12,628 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=875261.3333333334, ans=0.125
2024-09-20 09:57:24,014 INFO [train.py:1198] (1/2) Epoch 49, batch 1500, loss[loss=0.2064, simple_loss=0.271, pruned_loss=0.05239, ctc_loss=0.1108, cr_loss=0.3725, over 34452.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2585, pruned_loss=0.05071, ctc_loss=0.1106, cr_loss=0.3832, over 6774032.64 frames. ], batch size: 100, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:58:03,263 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.69 vs. limit=10.0
2024-09-20 09:58:04,305 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=875401.3333333334, ans=0.125
2024-09-20 09:58:22,978 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=875448.0, ans=0.125
2024-09-20 09:58:31,331 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=875494.6666666666, ans=0.0
2024-09-20 09:58:32,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=875494.6666666666, ans=0.0
2024-09-20 09:58:48,688 INFO [train.py:1198] (1/2) Epoch 49, batch 1550, loss[loss=0.2128, simple_loss=0.2725, pruned_loss=0.0563, ctc_loss=0.1211, cr_loss=0.4064, over 34417.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2585, pruned_loss=0.05094, ctc_loss=0.1111, cr_loss=0.3838, over 6746960.99 frames. ], batch size: 105, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 09:59:06,732 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.218e+02 2.499e+02 2.851e+02 3.603e+02 7.077e+02, threshold=5.703e+02, percent-clipped=5.0
2024-09-20 09:59:13,598 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 09:59:18,506 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=875588.0, ans=0.07
2024-09-20 09:59:21,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=875634.6666666666, ans=0.125
2024-09-20 09:59:38,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.88 vs. limit=15.0
2024-09-20 09:59:48,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.71 vs. limit=22.5
2024-09-20 10:00:12,024 INFO [train.py:1198] (1/2) Epoch 49, batch 1600, loss[loss=0.2079, simple_loss=0.2714, pruned_loss=0.05269, ctc_loss=0.1148, cr_loss=0.4013, over 34596.00 frames. ], tot_loss[loss=0.1989, simple_loss=0.2583, pruned_loss=0.05098, ctc_loss=0.1112, cr_loss=0.3837, over 6726380.08 frames. ], batch size: 99, lr: 2.43e-03, grad_scale: 32.0
2024-09-20 10:00:44,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=875868.0, ans=0.09899494936611666
2024-09-20 10:01:16,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=875914.6666666666, ans=0.0
2024-09-20 10:01:21,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=875961.3333333334, ans=0.125
2024-09-20 10:01:25,264 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.43 vs. limit=6.0
2024-09-20 10:01:26,449 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.651e-02
2024-09-20 10:01:27,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=875961.3333333334, ans=0.1
2024-09-20 10:01:29,485 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=875961.3333333334, ans=0.125
2024-09-20 10:01:31,059 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=875961.3333333334, ans=0.1
2024-09-20 10:01:35,683 INFO [train.py:1198] (1/2) Epoch 49, batch 1650, loss[loss=0.2079, simple_loss=0.2723, pruned_loss=0.05155, ctc_loss=0.1181, cr_loss=0.4192, over 34387.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2581, pruned_loss=0.05079, ctc_loss=0.111, cr_loss=0.3832, over 6718991.02 frames. ], batch size: 103, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:01:50,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=876054.6666666666, ans=0.2
2024-09-20 10:01:53,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.248e+02 2.617e+02 3.003e+02 3.882e+02 5.717e+02, threshold=6.007e+02, percent-clipped=1.0
2024-09-20 10:02:00,632 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=876054.6666666666, ans=0.0
2024-09-20 10:02:03,843 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876054.6666666666, ans=0.1
2024-09-20 10:02:07,229 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=876101.3333333334, ans=0.125
2024-09-20 10:02:12,796 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.59 vs. limit=15.0
2024-09-20 10:02:16,008 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=4.83 vs. limit=12.0
2024-09-20 10:02:22,219 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876101.3333333334, ans=0.1
2024-09-20 10:02:27,183 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=876148.0, ans=0.0
2024-09-20 10:02:41,932 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:02:58,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=876241.3333333334, ans=0.0
2024-09-20 10:02:59,471 INFO [train.py:1198] (1/2) Epoch 49, batch 1700, loss[loss=0.162, simple_loss=0.2237, pruned_loss=0.03566, ctc_loss=0.08215, cr_loss=0.3166, over 34292.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.258, pruned_loss=0.05066, ctc_loss=0.1106, cr_loss=0.3822, over 6744815.10 frames. ], batch size: 80, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:03:03,135 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=876241.3333333334, ans=0.0
2024-09-20 10:03:22,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=876288.0, ans=0.125
2024-09-20 10:03:42,469 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=876334.6666666666, ans=0.2
2024-09-20 10:03:50,571 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=876381.3333333334, ans=0.2
2024-09-20 10:04:21,940 INFO [train.py:1198] (1/2) Epoch 49, batch 1750, loss[loss=0.1733, simple_loss=0.2301, pruned_loss=0.0419, ctc_loss=0.09447, cr_loss=0.3421, over 34131.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2575, pruned_loss=0.05042, ctc_loss=0.1102, cr_loss=0.3816, over 6752528.57 frames. ], batch size: 78, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:04:40,049 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.282e+02 2.626e+02 3.078e+02 3.983e+02 8.054e+02, threshold=6.156e+02, percent-clipped=2.0
2024-09-20 10:05:03,581 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=876568.0, ans=0.125
2024-09-20 10:05:15,597 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:05:17,091 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=876614.6666666666, ans=0.125
2024-09-20 10:05:41,556 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=876661.3333333334, ans=0.0
2024-09-20 10:05:46,112 INFO [train.py:1198] (1/2) Epoch 49, batch 1800, loss[loss=0.2079, simple_loss=0.2705, pruned_loss=0.05301, ctc_loss=0.1146, cr_loss=0.409, over 34711.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.2577, pruned_loss=0.05056, ctc_loss=0.1104, cr_loss=0.3815, over 6756252.60 frames. ], batch size: 97, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:06:06,187 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=876754.6666666666, ans=0.025
2024-09-20 10:06:23,982 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=876801.3333333334, ans=0.025
2024-09-20 10:06:37,421 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:06:43,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.43 vs. limit=15.0
2024-09-20 10:06:43,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=876848.0, ans=0.1
2024-09-20 10:06:43,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=876848.0, ans=0.125
2024-09-20 10:06:48,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=876848.0, ans=0.0
2024-09-20 10:06:55,463 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:06:58,733 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=876894.6666666666, ans=0.1
2024-09-20 10:07:04,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=15.0
2024-09-20 10:07:10,096 INFO [train.py:1198] (1/2) Epoch 49, batch 1850, loss[loss=0.2017, simple_loss=0.2658, pruned_loss=0.05007, ctc_loss=0.1105, cr_loss=0.3838, over 34466.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2575, pruned_loss=0.05049, ctc_loss=0.1102, cr_loss=0.3815, over 6761689.02 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:07:24,979 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=876988.0, ans=0.125
2024-09-20 10:07:28,058 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.182e+02 2.741e+02 3.125e+02 4.181e+02 6.657e+02, threshold=6.250e+02, percent-clipped=3.0
2024-09-20 10:07:33,463 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:08:19,206 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=877128.0, ans=0.0
2024-09-20 10:08:24,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.45 vs. limit=12.0
2024-09-20 10:08:25,898 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=877128.0, ans=0.125
2024-09-20 10:08:31,901 INFO [train.py:1198] (1/2) Epoch 49, batch 1900, loss[loss=0.2023, simple_loss=0.2657, pruned_loss=0.05078, ctc_loss=0.1114, cr_loss=0.3747, over 34367.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2583, pruned_loss=0.05072, ctc_loss=0.1106, cr_loss=0.3828, over 6771575.64 frames. ], batch size: 103, lr: 2.42e-03, grad_scale: 16.0
2024-09-20 10:08:32,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=877174.6666666666, ans=0.125
2024-09-20 10:08:38,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=877174.6666666666, ans=0.125
2024-09-20 10:08:43,608 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=877174.6666666666, ans=0.1
2024-09-20 10:09:47,648 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=877361.3333333334, ans=0.125
2024-09-20 10:09:49,248 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=877361.3333333334, ans=0.0
2024-09-20 10:09:52,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=877361.3333333334, ans=0.125
2024-09-20 10:10:02,156 INFO [train.py:1198] (1/2) Epoch 49, batch 1950, loss[loss=0.1954, simple_loss=0.2574, pruned_loss=0.04865, ctc_loss=0.1063, cr_loss=0.3714, over 34361.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2593, pruned_loss=0.05083, ctc_loss=0.1108, cr_loss=0.3835, over 6788741.17 frames. ], batch size: 91, lr: 2.42e-03, grad_scale: 16.0
2024-09-20 10:10:14,621 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=6.73 vs. limit=15.0
2024-09-20 10:10:23,765 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.197e+02 2.691e+02 3.009e+02 3.447e+02 5.754e+02, threshold=6.019e+02, percent-clipped=0.0
2024-09-20 10:10:25,718 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=877454.6666666666, ans=0.125
2024-09-20 10:10:34,468 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.48 vs. limit=15.0
2024-09-20 10:11:00,643 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0
2024-09-20 10:11:13,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=877594.6666666666, ans=0.2
2024-09-20 10:11:18,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=877594.6666666666, ans=0.125
2024-09-20 10:11:26,221 INFO [train.py:1198] (1/2) Epoch 49, batch 2000, loss[loss=0.1801, simple_loss=0.2368, pruned_loss=0.04496, ctc_loss=0.09808, cr_loss=0.3465, over 34179.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.2595, pruned_loss=0.0509, ctc_loss=0.111, cr_loss=0.3837, over 6763349.62 frames. ], batch size: 78, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:11:36,621 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=877641.3333333334, ans=0.2
2024-09-20 10:11:39,870 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=877641.3333333334, ans=0.0
2024-09-20 10:11:43,310 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=877688.0, ans=0.125
2024-09-20 10:11:52,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=877688.0, ans=0.125
2024-09-20 10:12:03,654 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0
2024-09-20 10:12:16,031 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=877781.3333333334, ans=0.0
2024-09-20 10:12:22,667 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=877781.3333333334, ans=0.125
2024-09-20 10:12:50,617 INFO [train.py:1198] (1/2) Epoch 49, batch 2050, loss[loss=0.173, simple_loss=0.2297, pruned_loss=0.04161, ctc_loss=0.09474, cr_loss=0.3511, over 34496.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.2582, pruned_loss=0.05062, ctc_loss=0.1105, cr_loss=0.3824, over 6755429.65 frames. ], batch size: 82, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:12:52,742 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=877874.6666666666, ans=0.125
2024-09-20 10:12:54,196 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=877874.6666666666, ans=0.0
2024-09-20 10:13:02,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=877874.6666666666, ans=0.1
2024-09-20 10:13:10,247 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.107e+02 2.693e+02 3.097e+02 3.600e+02 6.370e+02, threshold=6.194e+02, percent-clipped=2.0
2024-09-20 10:13:17,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=877921.3333333334, ans=0.0
2024-09-20 10:13:33,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer1.prob, batch_count=877968.0, ans=0.125
2024-09-20 10:13:46,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878014.6666666666, ans=0.1
2024-09-20 10:13:50,224 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.44 vs. limit=15.0
2024-09-20 10:13:53,154 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=878014.6666666666, ans=10.0
2024-09-20 10:14:11,392 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.51 vs. limit=22.5
2024-09-20 10:14:14,047 INFO [train.py:1198] (1/2) Epoch 49, batch 2100, loss[loss=0.1958, simple_loss=0.2589, pruned_loss=0.04795, ctc_loss=0.1063, cr_loss=0.3875, over 34541.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2576, pruned_loss=0.0504, ctc_loss=0.11, cr_loss=0.3815, over 6769997.28 frames. ], batch size: 94, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:14:20,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=878108.0, ans=0.125
2024-09-20 10:14:45,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=878201.3333333334, ans=0.1
2024-09-20 10:15:11,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=878248.0, ans=0.125
2024-09-20 10:15:13,521 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.balancer.prob, batch_count=878248.0, ans=0.125
2024-09-20 10:15:22,495 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0
2024-09-20 10:15:26,611 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=878294.6666666666, ans=0.125
2024-09-20 10:15:29,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=878294.6666666666, ans=0.0
2024-09-20 10:15:36,352 INFO [train.py:1198] (1/2) Epoch 49, batch 2150, loss[loss=0.1942, simple_loss=0.2478, pruned_loss=0.05158, ctc_loss=0.1108, cr_loss=0.3837, over 34370.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.2571, pruned_loss=0.05024, ctc_loss=0.1095, cr_loss=0.3804, over 6788153.58 frames. ], batch size: 91, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:15:40,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=878341.3333333334, ans=0.2
2024-09-20 10:15:52,279 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.24 vs. limit=15.0
2024-09-20 10:15:56,211 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.167e+02 2.529e+02 2.831e+02 3.715e+02 7.373e+02, threshold=5.663e+02, percent-clipped=1.0
2024-09-20 10:16:31,288 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=878481.3333333334, ans=0.0
2024-09-20 10:16:38,479 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=4.81 vs. limit=15.0
2024-09-20 10:16:43,622 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.32 vs. limit=10.0
2024-09-20 10:17:00,771 INFO [train.py:1198] (1/2) Epoch 49, batch 2200, loss[loss=0.2053, simple_loss=0.2665, pruned_loss=0.05275, ctc_loss=0.1123, cr_loss=0.4011, over 34428.00 frames. ], tot_loss[loss=0.1977, simple_loss=0.2573, pruned_loss=0.05041, ctc_loss=0.1098, cr_loss=0.3812, over 6783589.47 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:17:06,025 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=878574.6666666666, ans=0.125
2024-09-20 10:17:06,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=878574.6666666666, ans=0.125
2024-09-20 10:17:08,359 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0
2024-09-20 10:17:15,912 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=878621.3333333334, ans=0.1
2024-09-20 10:17:20,688 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=878621.3333333334, ans=0.05
2024-09-20 10:17:28,071 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.75 vs. limit=22.5
2024-09-20 10:17:32,298 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=878668.0, ans=0.125
2024-09-20 10:17:37,129 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=878668.0, ans=0.025
2024-09-20 10:17:48,709 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=878714.6666666666, ans=0.125
2024-09-20 10:18:02,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=878714.6666666666, ans=0.125
2024-09-20 10:18:02,817 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=768, metric=4.42 vs. limit=15.0
2024-09-20 10:18:23,585 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=878808.0, ans=0.0
2024-09-20 10:18:24,857 INFO [train.py:1198] (1/2) Epoch 49, batch 2250, loss[loss=0.2034, simple_loss=0.2631, pruned_loss=0.0535, ctc_loss=0.1105, cr_loss=0.367, over 34433.00 frames. ], tot_loss[loss=0.1974, simple_loss=0.2571, pruned_loss=0.05023, ctc_loss=0.1095, cr_loss=0.3807, over 6780965.93 frames. ], batch size: 95, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:18:44,477 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.199e+02 2.721e+02 3.201e+02 3.648e+02 5.805e+02, threshold=6.402e+02, percent-clipped=1.0
2024-09-20 10:18:50,078 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=15.0
2024-09-20 10:19:28,613 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=878994.6666666666, ans=0.0
2024-09-20 10:19:45,413 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=879041.3333333334, ans=10.0
2024-09-20 10:19:46,604 INFO [train.py:1198] (1/2) Epoch 49, batch 2300, loss[loss=0.1757, simple_loss=0.2355, pruned_loss=0.04203, ctc_loss=0.09059, cr_loss=0.3412, over 34287.00 frames. ], tot_loss[loss=0.1964, simple_loss=0.2562, pruned_loss=0.04987, ctc_loss=0.1089, cr_loss=0.3788, over 6766039.45 frames. ], batch size: 83, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:19:53,464 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=879041.3333333334, ans=0.125
2024-09-20 10:20:33,245 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=879134.6666666666, ans=0.0
2024-09-20 10:21:10,953 INFO [train.py:1198] (1/2) Epoch 49, batch 2350, loss[loss=0.1972, simple_loss=0.2593, pruned_loss=0.04953, ctc_loss=0.1041, cr_loss=0.3793, over 34715.00 frames. ], tot_loss[loss=0.1969, simple_loss=0.2566, pruned_loss=0.05009, ctc_loss=0.1094, cr_loss=0.3798, over 6773625.13 frames. ], batch size: 97, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:21:30,432 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.191e+02 2.516e+02 2.799e+02 3.536e+02 5.213e+02, threshold=5.597e+02, percent-clipped=0.0
2024-09-20 10:21:30,999 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.536e-02
2024-09-20 10:21:57,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=879368.0, ans=0.07
2024-09-20 10:22:11,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=879414.6666666666, ans=0.125
2024-09-20 10:22:25,287 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:22:34,285 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.82 vs. limit=15.0
2024-09-20 10:22:34,790 INFO [train.py:1198] (1/2) Epoch 49, batch 2400, loss[loss=0.1842, simple_loss=0.2437, pruned_loss=0.04516, ctc_loss=0.1006, cr_loss=0.3535, over 34579.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.2573, pruned_loss=0.05029, ctc_loss=0.1099, cr_loss=0.3816, over 6777883.17 frames. ], batch size: 89, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:22:38,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=879508.0, ans=0.025
2024-09-20 10:22:43,211 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=879508.0, ans=0.125
2024-09-20 10:22:44,770 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=879508.0, ans=0.0
2024-09-20 10:23:04,948 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=879554.6666666666, ans=0.0
2024-09-20 10:23:07,072 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=8.26 vs. limit=15.0
2024-09-20 10:23:46,115 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=879694.6666666666, ans=0.0
2024-09-20 10:23:51,109 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=879694.6666666666, ans=0.0
2024-09-20 10:23:57,317 INFO [train.py:1198] (1/2) Epoch 49, batch 2450, loss[loss=0.1917, simple_loss=0.2559, pruned_loss=0.04653, ctc_loss=0.1017, cr_loss=0.3513, over 34421.00 frames. ], tot_loss[loss=0.1986, simple_loss=0.2583, pruned_loss=0.05069, ctc_loss=0.1107, cr_loss=0.3826, over 6750195.57 frames. ], batch size: 95, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:23:57,636 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=879741.3333333334, ans=0.125
2024-09-20 10:24:18,492 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.235e+02 2.707e+02 3.092e+02 3.809e+02 5.148e+02, threshold=6.184e+02, percent-clipped=0.0
2024-09-20 10:24:34,252 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.69 vs. limit=15.0
2024-09-20 10:24:48,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=879881.3333333334, ans=0.125
2024-09-20 10:25:17,581 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.95 vs. limit=15.0
2024-09-20 10:25:22,961 INFO [train.py:1198] (1/2) Epoch 49, batch 2500, loss[loss=0.2005, simple_loss=0.2649, pruned_loss=0.04933, ctc_loss=0.1112, cr_loss=0.3806, over 34446.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2582, pruned_loss=0.05088, ctc_loss=0.1109, cr_loss=0.3831, over 6762369.37 frames. ], batch size: 100, lr: 2.42e-03, grad_scale: 32.0
2024-09-20 10:25:42,073 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.09 vs. limit=6.0
2024-09-20 10:25:56,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=880068.0, ans=0.0
2024-09-20 10:26:21,178 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=880114.6666666666, ans=0.0
2024-09-20 10:26:27,725 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=880161.3333333334, ans=0.125
2024-09-20 10:26:45,444 INFO [train.py:1198] (1/2) Epoch 49, batch 2550, loss[loss=0.1808, simple_loss=0.2335, pruned_loss=0.04688, ctc_loss=0.1021, cr_loss=0.3473, over 34152.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2578, pruned_loss=0.05068, ctc_loss=0.1105, cr_loss=0.3827, over 6766910.76 frames. ], batch size: 78, lr: 2.42e-03, grad_scale: 16.0
2024-09-20 10:27:03,906 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=880254.6666666666, ans=0.1
2024-09-20 10:27:06,664 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.274e+02 2.635e+02 3.125e+02 3.697e+02 6.327e+02, threshold=6.250e+02, percent-clipped=1.0
2024-09-20 10:27:31,660 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=880301.3333333334, ans=0.125
2024-09-20 10:27:35,123 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=880348.0, ans=0.0
2024-09-20 10:27:45,043 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=880348.0, ans=0.125
2024-09-20 10:27:48,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=880348.0, ans=0.125
2024-09-20 10:27:53,347 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer1.prob, batch_count=880394.6666666666, ans=0.125
2024-09-20 10:28:08,475 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=880441.3333333334, ans=0.125
2024-09-20 10:28:09,968 INFO [train.py:1198] (1/2) Epoch 49, batch 2600, loss[loss=0.1772, simple_loss=0.2385, pruned_loss=0.04202, ctc_loss=0.09283, cr_loss=0.3328, over 34728.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.2581, pruned_loss=0.05065, ctc_loss=0.1105, cr_loss=0.383, over 6761942.06 frames. ], batch size: 92, lr: 2.42e-03, grad_scale: 16.0
2024-09-20 10:28:27,300 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=8.86 vs. limit=15.0
2024-09-20 10:28:33,116 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=880488.0, ans=0.125
2024-09-20 10:28:37,934 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=880488.0, ans=0.2
2024-09-20 10:28:41,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=880534.6666666666, ans=0.125
2024-09-20 10:28:49,450 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=880534.6666666666, ans=0.125
2024-09-20 10:29:04,366 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=880581.3333333334, ans=0.0
2024-09-20 10:29:15,870 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2024-09-20 10:29:28,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=880628.0, ans=0.125
2024-09-20 10:29:30,967 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.90 vs. limit=10.0
2024-09-20 10:29:33,536 INFO [train.py:1198] (1/2) Epoch 49, batch 2650, loss[loss=0.2055, simple_loss=0.2704, pruned_loss=0.05135, ctc_loss=0.1109, cr_loss=0.3914, over 34301.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2584, pruned_loss=0.05057, ctc_loss=0.1104, cr_loss=0.3826, over 6769970.00 frames. ], batch size: 117, lr: 2.42e-03, grad_scale: 16.0
2024-09-20 10:29:37,876 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.58 vs. limit=15.0
2024-09-20 10:29:54,711 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.147e+02 2.508e+02 2.833e+02 3.228e+02 6.245e+02, threshold=5.665e+02, percent-clipped=0.0
2024-09-20 10:29:56,782 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=880721.3333333334, ans=0.025
2024-09-20 10:30:09,652 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=880768.0, ans=0.125
2024-09-20 10:30:09,671 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=880768.0, ans=0.1
2024-09-20 10:30:34,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=880814.6666666666, ans=0.0
2024-09-20 10:30:55,048 INFO [train.py:1198] (1/2) Epoch 49, batch 2700, loss[loss=0.2072, simple_loss=0.2692, pruned_loss=0.05316, ctc_loss=0.116, cr_loss=0.3895, over 34615.00 frames. ], tot_loss[loss=0.1989, simple_loss=0.2588, pruned_loss=0.05078, ctc_loss=0.1108, cr_loss=0.3831, over 6765426.15 frames. ], batch size: 102, lr: 2.42e-03, grad_scale: 16.0
], batch size: 102, lr: 2.42e-03, grad_scale: 16.0 2024-09-20 10:31:05,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=880908.0, ans=0.125 2024-09-20 10:31:05,150 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=880908.0, ans=0.125 2024-09-20 10:31:21,523 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=880954.6666666666, ans=0.0 2024-09-20 10:31:58,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=881048.0, ans=0.125 2024-09-20 10:32:09,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=881094.6666666666, ans=0.025 2024-09-20 10:32:19,775 INFO [train.py:1198] (1/2) Epoch 49, batch 2750, loss[loss=0.1819, simple_loss=0.2381, pruned_loss=0.04523, ctc_loss=0.1015, cr_loss=0.3744, over 34656.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2578, pruned_loss=0.05041, ctc_loss=0.11, cr_loss=0.3811, over 6763183.82 frames. ], batch size: 88, lr: 2.42e-03, grad_scale: 16.0 2024-09-20 10:32:26,697 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=881141.3333333334, ans=0.1 2024-09-20 10:32:30,517 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=22.5 2024-09-20 10:32:43,125 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.229e+02 2.672e+02 3.107e+02 3.787e+02 6.236e+02, threshold=6.214e+02, percent-clipped=1.0 2024-09-20 10:32:45,054 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=881188.0, ans=0.125 2024-09-20 10:33:44,287 INFO [train.py:1198] (1/2) Epoch 49, batch 2800, loss[loss=0.2111, simple_loss=0.2695, pruned_loss=0.05565, ctc_loss=0.1253, cr_loss=0.4061, over 23601.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2579, pruned_loss=0.05065, ctc_loss=0.1105, cr_loss=0.3821, over 6741487.03 frames. ], batch size: 244, lr: 2.42e-03, grad_scale: 32.0 2024-09-20 10:33:46,457 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=881374.6666666666, ans=0.125 2024-09-20 10:34:11,001 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=881421.3333333334, ans=0.0 2024-09-20 10:34:12,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=881421.3333333334, ans=0.025 2024-09-20 10:34:17,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=881468.0, ans=0.125 2024-09-20 10:34:30,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=881468.0, ans=0.125 2024-09-20 10:34:39,579 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.56 vs. limit=22.5 2024-09-20 10:35:06,696 INFO [train.py:1198] (1/2) Epoch 49, batch 2850, loss[loss=0.1939, simple_loss=0.2532, pruned_loss=0.04896, ctc_loss=0.106, cr_loss=0.3896, over 34484.00 frames. 
], tot_loss[loss=0.1988, simple_loss=0.2584, pruned_loss=0.05086, ctc_loss=0.111, cr_loss=0.3835, over 6725796.66 frames. ], batch size: 90, lr: 2.42e-03, grad_scale: 32.0 2024-09-20 10:35:22,461 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.97 vs. limit=22.5 2024-09-20 10:35:30,159 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.254e+02 2.602e+02 2.980e+02 3.618e+02 6.669e+02, threshold=5.960e+02, percent-clipped=2.0 2024-09-20 10:35:30,499 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=881654.6666666666, ans=0.5 2024-09-20 10:35:38,813 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:35:51,124 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.67 vs. limit=15.0 2024-09-20 10:35:55,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=881701.3333333334, ans=0.125 2024-09-20 10:36:00,289 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=881748.0, ans=0.2 2024-09-20 10:36:03,752 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=881748.0, ans=0.125 2024-09-20 10:36:05,136 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=881748.0, ans=0.05 2024-09-20 10:36:31,461 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=881841.3333333334, ans=0.125 2024-09-20 10:36:32,833 INFO [train.py:1198] (1/2) Epoch 49, batch 2900, loss[loss=0.1876, simple_loss=0.2464, pruned_loss=0.04706, ctc_loss=0.1018, cr_loss=0.359, over 34523.00 frames. ], tot_loss[loss=0.1999, simple_loss=0.2596, pruned_loss=0.05125, ctc_loss=0.1116, cr_loss=0.3856, over 6756182.72 frames. ], batch size: 94, lr: 2.42e-03, grad_scale: 16.0 2024-09-20 10:37:12,806 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=881934.6666666666, ans=0.125 2024-09-20 10:37:47,424 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=882028.0, ans=0.125 2024-09-20 10:37:55,349 INFO [train.py:1198] (1/2) Epoch 49, batch 2950, loss[loss=0.1922, simple_loss=0.2444, pruned_loss=0.05117, ctc_loss=0.1116, cr_loss=0.3847, over 34628.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.258, pruned_loss=0.05067, ctc_loss=0.1105, cr_loss=0.3831, over 6750135.50 frames. 
], batch size: 88, lr: 2.42e-03, grad_scale: 16.0 2024-09-20 10:38:18,372 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.671e+02 3.328e+02 4.021e+02 6.046e+02, threshold=6.656e+02, percent-clipped=1.0 2024-09-20 10:38:30,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=882168.0, ans=0.025 2024-09-20 10:38:33,707 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=882168.0, ans=0.125 2024-09-20 10:38:48,471 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=882214.6666666666, ans=0.0 2024-09-20 10:38:54,161 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.85 vs. limit=15.0 2024-09-20 10:38:58,175 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=882214.6666666666, ans=0.0 2024-09-20 10:39:01,802 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.43 vs. limit=15.0 2024-09-20 10:39:05,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.93 vs. limit=22.5 2024-09-20 10:39:09,582 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=882261.3333333334, ans=0.125 2024-09-20 10:39:19,569 INFO [train.py:1198] (1/2) Epoch 49, batch 3000, loss[loss=0.196, simple_loss=0.258, pruned_loss=0.04857, ctc_loss=0.1072, cr_loss=0.3835, over 34557.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.2578, pruned_loss=0.0506, ctc_loss=0.1103, cr_loss=0.3831, over 6751146.06 frames. ], batch size: 94, lr: 2.42e-03, grad_scale: 16.0 2024-09-20 10:39:19,569 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 10:39:36,361 INFO [train.py:1230] (1/2) Epoch 49, validation: loss=0.1478, simple_loss=0.2412, pruned_loss=0.02334, ctc_loss=0.03813, cr_loss=2.39e-14, over 944034.00 frames. 2024-09-20 10:39:36,361 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 10:39:54,137 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.76 vs. limit=6.0 2024-09-20 10:40:16,021 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=882401.3333333334, ans=0.125 2024-09-20 10:40:59,627 INFO [train.py:1198] (1/2) Epoch 49, batch 3050, loss[loss=0.191, simple_loss=0.2473, pruned_loss=0.04895, ctc_loss=0.1089, cr_loss=0.3757, over 34575.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2584, pruned_loss=0.05084, ctc_loss=0.1108, cr_loss=0.3841, over 6744181.17 frames. 
], batch size: 89, lr: 2.42e-03, grad_scale: 16.0 2024-09-20 10:41:09,603 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=882541.3333333334, ans=0.1 2024-09-20 10:41:20,740 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=882588.0, ans=0.125 2024-09-20 10:41:20,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=882588.0, ans=0.125 2024-09-20 10:41:21,983 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 1.995e+02 2.676e+02 2.959e+02 3.631e+02 6.422e+02, threshold=5.917e+02, percent-clipped=0.0 2024-09-20 10:41:31,280 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=882634.6666666666, ans=6.0 2024-09-20 10:42:12,756 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.91 vs. limit=22.5 2024-09-20 10:42:18,618 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=882774.6666666666, ans=0.125 2024-09-20 10:42:19,935 INFO [train.py:1198] (1/2) Epoch 49, batch 3100, loss[loss=0.2086, simple_loss=0.2725, pruned_loss=0.05269, ctc_loss=0.1161, cr_loss=0.4008, over 34187.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.2578, pruned_loss=0.05058, ctc_loss=0.1103, cr_loss=0.3826, over 6743222.70 frames. ], batch size: 117, lr: 2.42e-03, grad_scale: 16.0 2024-09-20 10:42:23,518 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=882774.6666666666, ans=0.0 2024-09-20 10:42:25,035 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=882774.6666666666, ans=0.125 2024-09-20 10:42:32,953 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=882774.6666666666, ans=0.125 2024-09-20 10:42:36,856 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=768, metric=2.84 vs. limit=15.0 2024-09-20 10:42:52,883 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=22.5 2024-09-20 10:42:55,739 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=882868.0, ans=0.025 2024-09-20 10:43:17,052 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.ff2_skip_rate, batch_count=882914.6666666666, ans=0.0 2024-09-20 10:43:39,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=883008.0, ans=0.125 2024-09-20 10:43:41,100 INFO [train.py:1198] (1/2) Epoch 49, batch 3150, loss[loss=0.2005, simple_loss=0.2617, pruned_loss=0.05077, ctc_loss=0.1121, cr_loss=0.3826, over 33857.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2581, pruned_loss=0.05073, ctc_loss=0.1106, cr_loss=0.383, over 6749369.96 frames. 
], batch size: 122, lr: 2.42e-03, grad_scale: 16.0 2024-09-20 10:43:52,522 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883008.0, ans=0.1 2024-09-20 10:43:58,135 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=768, metric=14.64 vs. limit=22.5 2024-09-20 10:44:03,787 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.204e+02 2.555e+02 3.067e+02 3.866e+02 7.623e+02, threshold=6.133e+02, percent-clipped=3.0 2024-09-20 10:44:28,382 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=883148.0, ans=0.025 2024-09-20 10:44:49,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer2.prob, batch_count=883194.6666666666, ans=0.125 2024-09-20 10:45:03,631 INFO [train.py:1198] (1/2) Epoch 49, batch 3200, loss[loss=0.2007, simple_loss=0.2574, pruned_loss=0.05284, ctc_loss=0.1127, cr_loss=0.3956, over 34526.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2575, pruned_loss=0.05052, ctc_loss=0.1101, cr_loss=0.3821, over 6763772.19 frames. ], batch size: 94, lr: 2.41e-03, grad_scale: 32.0 2024-09-20 10:45:17,360 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=4.91 vs. limit=10.0 2024-09-20 10:45:28,535 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.96 vs. limit=22.5 2024-09-20 10:45:58,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.23 vs. limit=10.0 2024-09-20 10:46:25,846 INFO [train.py:1198] (1/2) Epoch 49, batch 3250, loss[loss=0.2158, simple_loss=0.2764, pruned_loss=0.05689, ctc_loss=0.1218, cr_loss=0.4275, over 34656.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.2579, pruned_loss=0.05064, ctc_loss=0.1103, cr_loss=0.3826, over 6772174.96 frames. 
], batch size: 98, lr: 2.41e-03, grad_scale: 32.0 2024-09-20 10:46:30,965 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=883474.6666666666, ans=0.0 2024-09-20 10:46:48,398 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.255e+02 2.516e+02 2.874e+02 3.413e+02 5.942e+02, threshold=5.749e+02, percent-clipped=0.0 2024-09-20 10:47:04,571 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:47:11,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=883568.0, ans=0.125 2024-09-20 10:47:16,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.balancer2.prob, batch_count=883614.6666666666, ans=0.125 2024-09-20 10:47:22,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=883614.6666666666, ans=0.0 2024-09-20 10:47:22,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=883614.6666666666, ans=0.07 2024-09-20 10:47:34,248 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2024-09-20 10:47:46,115 INFO [train.py:1198] (1/2) Epoch 49, batch 3300, loss[loss=0.2124, simple_loss=0.2749, pruned_loss=0.05524, ctc_loss=0.119, cr_loss=0.388, over 33038.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.2569, pruned_loss=0.05033, ctc_loss=0.1096, cr_loss=0.3806, over 6771441.66 frames. ], batch size: 130, lr: 2.41e-03, grad_scale: 32.0 2024-09-20 10:48:04,408 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_skip_rate, batch_count=883754.6666666666, ans=0.0 2024-09-20 10:48:06,222 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. limit=15.0 2024-09-20 10:48:10,537 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=883754.6666666666, ans=0.125 2024-09-20 10:48:32,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=768, metric=15.30 vs. limit=22.5 2024-09-20 10:48:56,425 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=883894.6666666666, ans=0.1 2024-09-20 10:49:07,291 INFO [train.py:1198] (1/2) Epoch 49, batch 3350, loss[loss=0.2003, simple_loss=0.2615, pruned_loss=0.05075, ctc_loss=0.1114, cr_loss=0.3821, over 33819.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.2578, pruned_loss=0.05059, ctc_loss=0.1102, cr_loss=0.3816, over 6745336.47 frames. 
], batch size: 122, lr: 2.41e-03, grad_scale: 32.0 2024-09-20 10:49:12,780 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:49:30,078 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.166e+02 2.538e+02 2.925e+02 3.549e+02 6.977e+02, threshold=5.851e+02, percent-clipped=1.0 2024-09-20 10:49:36,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=883988.0, ans=0.2 2024-09-20 10:49:51,553 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=884034.6666666666, ans=0.125 2024-09-20 10:50:12,062 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=884128.0, ans=0.125 2024-09-20 10:50:26,497 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=884128.0, ans=0.125 2024-09-20 10:50:29,391 INFO [train.py:1198] (1/2) Epoch 49, batch 3400, loss[loss=0.1777, simple_loss=0.2354, pruned_loss=0.04318, ctc_loss=0.09818, cr_loss=0.3513, over 34224.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.258, pruned_loss=0.05077, ctc_loss=0.1105, cr_loss=0.3822, over 6734455.80 frames. ], batch size: 78, lr: 2.41e-03, grad_scale: 32.0 2024-09-20 10:50:47,265 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=884221.3333333334, ans=0.1 2024-09-20 10:51:07,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=884268.0, ans=0.125 2024-09-20 10:51:34,875 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=884361.3333333334, ans=0.125 2024-09-20 10:51:39,546 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=884361.3333333334, ans=0.04949747468305833 2024-09-20 10:51:39,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=884361.3333333334, ans=0.05 2024-09-20 10:51:43,571 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.55 vs. limit=12.0 2024-09-20 10:51:44,641 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0 2024-09-20 10:51:50,316 INFO [train.py:1198] (1/2) Epoch 49, batch 3450, loss[loss=0.2055, simple_loss=0.2678, pruned_loss=0.05203, ctc_loss=0.1145, cr_loss=0.4053, over 32959.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2584, pruned_loss=0.05081, ctc_loss=0.1107, cr_loss=0.3826, over 6746139.70 frames. 
], batch size: 130, lr: 2.41e-03, grad_scale: 32.0 2024-09-20 10:52:10,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=884454.6666666666, ans=0.125 2024-09-20 10:52:12,815 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.158e+02 2.540e+02 2.800e+02 3.282e+02 7.571e+02, threshold=5.599e+02, percent-clipped=3.0 2024-09-20 10:52:14,673 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=884454.6666666666, ans=0.0 2024-09-20 10:52:20,899 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=884501.3333333334, ans=0.0 2024-09-20 10:52:36,998 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=884548.0, ans=0.0 2024-09-20 10:52:39,125 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.38 vs. limit=15.0 2024-09-20 10:52:46,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=884548.0, ans=0.125 2024-09-20 10:52:46,710 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=884548.0, ans=0.125 2024-09-20 10:52:49,941 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=884548.0, ans=0.1 2024-09-20 10:53:10,436 INFO [train.py:1198] (1/2) Epoch 49, batch 3500, loss[loss=0.1769, simple_loss=0.2342, pruned_loss=0.04327, ctc_loss=0.097, cr_loss=0.3408, over 34482.00 frames. ], tot_loss[loss=0.1984, simple_loss=0.258, pruned_loss=0.05071, ctc_loss=0.1105, cr_loss=0.382, over 6748889.98 frames. ], batch size: 85, lr: 2.41e-03, grad_scale: 32.0 2024-09-20 10:53:17,273 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=884641.3333333334, ans=0.04949747468305833 2024-09-20 10:53:33,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=884688.0, ans=0.125 2024-09-20 10:53:33,053 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=884688.0, ans=0.0 2024-09-20 10:54:18,275 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=884828.0, ans=0.125 2024-09-20 10:54:27,515 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=884828.0, ans=0.125 2024-09-20 10:54:30,413 INFO [train.py:1198] (1/2) Epoch 49, batch 3550, loss[loss=0.2086, simple_loss=0.2717, pruned_loss=0.05353, ctc_loss=0.1144, cr_loss=0.3911, over 34353.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2584, pruned_loss=0.05076, ctc_loss=0.1106, cr_loss=0.3825, over 6758207.17 frames. 
], batch size: 103, lr: 2.41e-03, grad_scale: 32.0 2024-09-20 10:54:44,232 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=884874.6666666666, ans=0.125 2024-09-20 10:54:49,247 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=884921.3333333334, ans=0.125 2024-09-20 10:54:53,681 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.675e+02 3.200e+02 4.384e+02 7.017e+02, threshold=6.400e+02, percent-clipped=9.0 2024-09-20 10:54:54,097 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=884921.3333333334, ans=0.125 2024-09-20 10:55:02,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=884968.0, ans=0.125 2024-09-20 10:55:14,064 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=884968.0, ans=0.125 2024-09-20 10:55:27,293 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=15.0 2024-09-20 10:55:39,721 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=885061.3333333334, ans=0.2 2024-09-20 10:55:42,960 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=885061.3333333334, ans=0.125 2024-09-20 10:55:52,454 INFO [train.py:1198] (1/2) Epoch 49, batch 3600, loss[loss=0.2004, simple_loss=0.2575, pruned_loss=0.05248, ctc_loss=0.1126, cr_loss=0.394, over 34476.00 frames. ], tot_loss[loss=0.199, simple_loss=0.2587, pruned_loss=0.0509, ctc_loss=0.1109, cr_loss=0.3836, over 6768016.91 frames. ], batch size: 90, lr: 2.41e-03, grad_scale: 32.0 2024-09-20 10:56:13,318 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=885154.6666666666, ans=0.125 2024-09-20 10:56:15,271 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.19 vs. limit=15.0 2024-09-20 10:56:25,781 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885201.3333333334, ans=0.1 2024-09-20 10:56:36,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=885201.3333333334, ans=0.125 2024-09-20 10:56:41,014 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.53 vs. limit=5.0 2024-09-20 10:56:55,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=885294.6666666666, ans=0.1 2024-09-20 10:56:55,943 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=885294.6666666666, ans=0.125 2024-09-20 10:57:11,760 INFO [train.py:1198] (1/2) Epoch 49, batch 3650, loss[loss=0.2155, simple_loss=0.273, pruned_loss=0.05829, ctc_loss=0.124, cr_loss=0.4144, over 34454.00 frames. 
], tot_loss[loss=0.1981, simple_loss=0.2578, pruned_loss=0.05058, ctc_loss=0.1103, cr_loss=0.3817, over 6770414.22 frames. ], batch size: 110, lr: 2.41e-03, grad_scale: 16.0 2024-09-20 10:57:35,734 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.607e+02 3.020e+02 4.015e+02 7.403e+02, threshold=6.040e+02, percent-clipped=4.0 2024-09-20 10:57:52,153 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 10:58:06,316 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=885481.3333333334, ans=0.125 2024-09-20 10:58:19,501 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.02 vs. limit=15.0 2024-09-20 10:58:32,892 INFO [train.py:1198] (1/2) Epoch 49, batch 3700, loss[loss=0.2067, simple_loss=0.2692, pruned_loss=0.05295, ctc_loss=0.1141, cr_loss=0.3896, over 34570.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.2579, pruned_loss=0.05049, ctc_loss=0.1103, cr_loss=0.3819, over 6784996.46 frames. ], batch size: 102, lr: 2.41e-03, grad_scale: 16.0 2024-09-20 10:58:36,799 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.00 vs. limit=22.5 2024-09-20 10:58:41,443 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=885574.6666666666, ans=0.125 2024-09-20 10:58:46,662 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.52 vs. limit=10.0 2024-09-20 10:58:55,859 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=885621.3333333334, ans=0.0 2024-09-20 10:59:07,982 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.05 vs. limit=22.5 2024-09-20 10:59:40,050 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=885761.3333333334, ans=0.125 2024-09-20 10:59:54,256 INFO [train.py:1198] (1/2) Epoch 49, batch 3750, loss[loss=0.2171, simple_loss=0.2742, pruned_loss=0.05929, ctc_loss=0.1245, cr_loss=0.4116, over 34346.00 frames. ], tot_loss[loss=0.201, simple_loss=0.2609, pruned_loss=0.05154, ctc_loss=0.1123, cr_loss=0.3873, over 6786188.66 frames. ], batch size: 113, lr: 2.41e-03, grad_scale: 16.0 2024-09-20 11:00:00,163 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.79 vs. limit=15.0 2024-09-20 11:00:03,714 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.91 vs. limit=15.0 2024-09-20 11:00:04,594 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=885808.0, ans=0.2 2024-09-20 11:00:15,821 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass.skip_rate, batch_count=885854.6666666666, ans=0.09899494936611666 2024-09-20 11:00:16,336 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=13.57 vs. 
limit=15.0 2024-09-20 11:00:18,604 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.201e+02 2.448e+02 2.609e+02 2.893e+02 5.053e+02, threshold=5.217e+02, percent-clipped=0.0 2024-09-20 11:00:23,805 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=885854.6666666666, ans=0.0 2024-09-20 11:01:09,606 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=885994.6666666666, ans=0.2 2024-09-20 11:01:14,691 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=886041.3333333334, ans=0.2 2024-09-20 11:01:15,944 INFO [train.py:1198] (1/2) Epoch 49, batch 3800, loss[loss=0.2097, simple_loss=0.2682, pruned_loss=0.05638, ctc_loss=0.1161, cr_loss=0.3818, over 29745.00 frames. ], tot_loss[loss=0.2035, simple_loss=0.2631, pruned_loss=0.05269, ctc_loss=0.1145, cr_loss=0.3922, over 6676164.92 frames. ], batch size: 175, lr: 2.41e-03, grad_scale: 16.0 2024-09-20 11:02:02,955 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=886134.6666666666, ans=0.1 2024-09-20 11:02:03,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=886134.6666666666, ans=0.0 2024-09-20 11:02:06,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=886181.3333333334, ans=0.2 2024-09-20 11:02:17,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=886181.3333333334, ans=0.2 2024-09-20 11:02:23,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.const_attention_rate, batch_count=886228.0, ans=0.025 2024-09-20 11:02:24,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.const_attention_rate, batch_count=886228.0, ans=0.025 2024-09-20 11:02:39,630 INFO [train.py:1198] (1/2) Epoch 49, batch 3850, loss[loss=0.2151, simple_loss=0.27, pruned_loss=0.05908, ctc_loss=0.1301, cr_loss=0.3974, over 23970.00 frames. ], tot_loss[loss=0.2063, simple_loss=0.2649, pruned_loss=0.05414, ctc_loss=0.1175, cr_loss=0.3963, over 6251534.83 frames. ], batch size: 244, lr: 2.41e-03, grad_scale: 16.0 2024-09-20 11:02:56,586 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.skip_rate, batch_count=886321.3333333334, ans=0.07 2024-09-20 11:03:04,270 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.364e+02 2.605e+02 2.789e+02 3.072e+02 1.185e+03, threshold=5.578e+02, percent-clipped=1.0 2024-09-20 11:04:05,086 INFO [train.py:1198] (1/2) Epoch 50, batch 0, loss[loss=0.1846, simple_loss=0.244, pruned_loss=0.04561, ctc_loss=0.1003, cr_loss=0.3498, over 34472.00 frames. ], tot_loss[loss=0.1846, simple_loss=0.244, pruned_loss=0.04561, ctc_loss=0.1003, cr_loss=0.3498, over 34472.00 frames. ], batch size: 85, lr: 2.39e-03, grad_scale: 32.0 2024-09-20 11:04:05,086 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 11:04:21,879 INFO [train.py:1230] (1/2) Epoch 50, validation: loss=0.1482, simple_loss=0.2419, pruned_loss=0.02339, ctc_loss=0.03827, cr_loss=2.398e-14, over 944034.00 frames. 
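[editor's note] Each train.py:1198 record above prints four loss components (simple_loss, pruned_loss, ctc_loss, cr_loss) next to a combined loss, plus a tot_loss reported "over N frames"; on the validation records the cr_loss term collapses to ~1e-14, i.e. the consistency-regularization pass is effectively inactive when no augmented second view is computed. A minimal sketch of how such numbers could be produced, assuming hypothetical scale weights (simple_scale, ctc_scale, cr_scale) and a frame-weighted running average; none of these names or default values are taken from the actual train.py:

```python
import torch


def total_loss(simple_loss: torch.Tensor,
               pruned_loss: torch.Tensor,
               ctc_loss: torch.Tensor,
               cr_loss: torch.Tensor,
               simple_scale: float = 0.5,
               ctc_scale: float = 0.1,
               cr_scale: float = 0.02) -> torch.Tensor:
    # Hypothetical combination: the log prints the four components,
    # not the weights actually used to merge them into `loss`.
    return (simple_scale * simple_loss + pruned_loss
            + ctc_scale * ctc_loss + cr_scale * cr_loss)


class RunningLoss:
    # Frame-weighted running average -- one plausible reading of the
    # tot_loss[... over N frames] entries, where N grows with the
    # number of frames seen so far.
    def __init__(self) -> None:
        self.frames = 0.0
        self.weighted_sum = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> None:
        self.frames += batch_frames
        self.weighted_sum += batch_loss * batch_frames

    @property
    def value(self) -> float:
        return self.weighted_sum / max(self.frames, 1.0)
```

Weighting the running average by frames rather than by batches keeps long utterances from being under-counted, which is consistent with the per-record "over N frames" bookkeeping in the log. [end of editor's note]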
2024-09-20 11:04:21,879 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 11:04:51,674 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=886442.6666666666, ans=0.125 2024-09-20 11:04:54,849 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=886489.3333333334, ans=0.125 2024-09-20 11:04:59,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.const_attention_rate, batch_count=886489.3333333334, ans=0.025 2024-09-20 11:05:45,715 INFO [train.py:1198] (1/2) Epoch 50, batch 50, loss[loss=0.1711, simple_loss=0.2306, pruned_loss=0.04039, ctc_loss=0.08898, cr_loss=0.324, over 34447.00 frames. ], tot_loss[loss=0.2006, simple_loss=0.2595, pruned_loss=0.05185, ctc_loss=0.1125, cr_loss=0.3898, over 1480864.15 frames. ], batch size: 82, lr: 2.39e-03, grad_scale: 32.0 2024-09-20 11:05:45,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=886629.3333333334, ans=0.0 2024-09-20 11:06:09,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=886676.0, ans=0.125 2024-09-20 11:06:16,730 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.29 vs. limit=15.0 2024-09-20 11:06:17,778 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=886722.6666666666, ans=0.125 2024-09-20 11:06:29,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=886722.6666666666, ans=0.1 2024-09-20 11:06:29,338 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=886722.6666666666, ans=0.125 2024-09-20 11:06:29,514 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=886722.6666666666, ans=0.0 2024-09-20 11:06:52,506 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.205e+02 2.594e+02 2.877e+02 3.514e+02 6.002e+02, threshold=5.753e+02, percent-clipped=3.0 2024-09-20 11:06:58,210 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=5.69 vs. limit=10.0 2024-09-20 11:07:04,262 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=886816.0, ans=0.125 2024-09-20 11:07:05,929 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=886816.0, ans=0.125 2024-09-20 11:07:10,504 INFO [train.py:1198] (1/2) Epoch 50, batch 100, loss[loss=0.1897, simple_loss=0.25, pruned_loss=0.04713, ctc_loss=0.1044, cr_loss=0.3575, over 34576.00 frames. ], tot_loss[loss=0.2019, simple_loss=0.2612, pruned_loss=0.05218, ctc_loss=0.1134, cr_loss=0.3898, over 2629809.20 frames. ], batch size: 89, lr: 2.39e-03, grad_scale: 32.0 2024-09-20 11:07:38,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.43 vs. 
limit=22.5 2024-09-20 11:07:46,535 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=886956.0, ans=0.1 2024-09-20 11:08:01,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=887002.6666666666, ans=0.2 2024-09-20 11:08:15,713 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=887049.3333333334, ans=0.2 2024-09-20 11:08:31,584 INFO [train.py:1198] (1/2) Epoch 50, batch 150, loss[loss=0.1723, simple_loss=0.2326, pruned_loss=0.04044, ctc_loss=0.08848, cr_loss=0.3347, over 34461.00 frames. ], tot_loss[loss=0.1994, simple_loss=0.259, pruned_loss=0.05109, ctc_loss=0.1114, cr_loss=0.3851, over 3557553.70 frames. ], batch size: 82, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:08:33,635 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=887096.0, ans=0.1 2024-09-20 11:08:38,400 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=887096.0, ans=0.0 2024-09-20 11:08:53,261 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=887142.6666666666, ans=0.125 2024-09-20 11:08:54,877 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=887142.6666666666, ans=0.1 2024-09-20 11:09:13,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=887189.3333333334, ans=0.0 2024-09-20 11:09:22,830 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.balancer1.prob, batch_count=887236.0, ans=0.125 2024-09-20 11:09:24,246 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=887236.0, ans=0.125 2024-09-20 11:09:27,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=887236.0, ans=0.0 2024-09-20 11:09:30,896 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=887236.0, ans=0.125 2024-09-20 11:09:36,879 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.086e+02 2.622e+02 3.123e+02 4.280e+02 6.642e+02, threshold=6.247e+02, percent-clipped=3.0 2024-09-20 11:09:45,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=887282.6666666666, ans=0.125 2024-09-20 11:09:51,808 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:09:54,678 INFO [train.py:1198] (1/2) Epoch 50, batch 200, loss[loss=0.2041, simple_loss=0.2649, pruned_loss=0.0526, ctc_loss=0.1152, cr_loss=0.3764, over 31807.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2574, pruned_loss=0.05043, ctc_loss=0.1101, cr_loss=0.382, over 4271334.77 frames. 
], batch size: 145, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:09:58,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=887329.3333333334, ans=0.0 2024-09-20 11:09:59,837 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=887329.3333333334, ans=0.125 2024-09-20 11:10:02,118 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.30 vs. limit=6.0 2024-09-20 11:10:12,867 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=887376.0, ans=0.125 2024-09-20 11:10:14,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=887376.0, ans=0.125 2024-09-20 11:10:18,652 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.32 vs. limit=22.5 2024-09-20 11:10:31,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=6.34 vs. limit=15.0 2024-09-20 11:10:42,497 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.18 vs. limit=15.0 2024-09-20 11:11:18,947 INFO [train.py:1198] (1/2) Epoch 50, batch 250, loss[loss=0.2196, simple_loss=0.2781, pruned_loss=0.05947, ctc_loss=0.1267, cr_loss=0.4203, over 34261.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2576, pruned_loss=0.05044, ctc_loss=0.1101, cr_loss=0.3818, over 4834050.34 frames. ], batch size: 117, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:11:19,380 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=887562.6666666666, ans=0.0 2024-09-20 11:11:24,207 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=887562.6666666666, ans=0.05 2024-09-20 11:11:29,950 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.07 vs. limit=22.5 2024-09-20 11:12:22,923 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.66 vs. limit=15.0 2024-09-20 11:12:23,522 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.181e+02 2.541e+02 2.790e+02 3.818e+02 7.407e+02, threshold=5.581e+02, percent-clipped=8.0 2024-09-20 11:12:25,604 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=887749.3333333334, ans=0.0 2024-09-20 11:12:33,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.14 vs. limit=15.0 2024-09-20 11:12:41,585 INFO [train.py:1198] (1/2) Epoch 50, batch 300, loss[loss=0.2238, simple_loss=0.2804, pruned_loss=0.06215, ctc_loss=0.128, cr_loss=0.4319, over 34355.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2573, pruned_loss=0.05046, ctc_loss=0.1101, cr_loss=0.3824, over 5262940.33 frames. 
], batch size: 107, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:12:57,434 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.76 vs. limit=15.0 2024-09-20 11:12:57,509 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.04 vs. limit=10.0 2024-09-20 11:14:05,721 INFO [train.py:1198] (1/2) Epoch 50, batch 350, loss[loss=0.1792, simple_loss=0.2397, pruned_loss=0.04251, ctc_loss=0.09857, cr_loss=0.3497, over 34264.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2581, pruned_loss=0.05073, ctc_loss=0.1107, cr_loss=0.3828, over 5596855.46 frames. ], batch size: 83, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:14:14,165 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.const_attention_rate, batch_count=888029.3333333334, ans=0.025 2024-09-20 11:14:27,102 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2024-09-20 11:14:29,778 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.98 vs. limit=15.0 2024-09-20 11:14:46,428 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.92 vs. limit=15.0 2024-09-20 11:14:47,832 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.76 vs. limit=15.0 2024-09-20 11:14:50,700 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=888122.6666666666, ans=0.025 2024-09-20 11:15:11,556 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.153e+02 2.590e+02 3.074e+02 3.928e+02 6.504e+02, threshold=6.147e+02, percent-clipped=3.0 2024-09-20 11:15:29,380 INFO [train.py:1198] (1/2) Epoch 50, batch 400, loss[loss=0.2038, simple_loss=0.263, pruned_loss=0.05328, ctc_loss=0.113, cr_loss=0.3835, over 34398.00 frames. ], tot_loss[loss=0.1977, simple_loss=0.2576, pruned_loss=0.05031, ctc_loss=0.1098, cr_loss=0.381, over 5863472.49 frames. ], batch size: 95, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:15:29,720 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=888262.6666666666, ans=0.025 2024-09-20 11:15:36,306 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=888262.6666666666, ans=0.2 2024-09-20 11:15:41,088 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=888262.6666666666, ans=0.125 2024-09-20 11:16:04,234 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=888356.0, ans=0.125 2024-09-20 11:16:42,570 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=888449.3333333334, ans=0.0 2024-09-20 11:16:43,260 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.82 vs. 
limit=12.0 2024-09-20 11:16:53,716 INFO [train.py:1198] (1/2) Epoch 50, batch 450, loss[loss=0.2045, simple_loss=0.2662, pruned_loss=0.05217, ctc_loss=0.1138, cr_loss=0.3909, over 34691.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2576, pruned_loss=0.05043, ctc_loss=0.1101, cr_loss=0.3813, over 6052720.43 frames. ], batch size: 97, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:17:21,694 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0 2024-09-20 11:17:33,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=888589.3333333334, ans=0.07 2024-09-20 11:17:35,925 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2024-09-20 11:17:47,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=888636.0, ans=0.0 2024-09-20 11:17:53,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=888636.0, ans=0.125 2024-09-20 11:17:53,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.const_attention_rate, batch_count=888636.0, ans=0.025 2024-09-20 11:17:58,513 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.146e+02 2.465e+02 2.792e+02 3.526e+02 5.420e+02, threshold=5.585e+02, percent-clipped=0.0 2024-09-20 11:18:18,824 INFO [train.py:1198] (1/2) Epoch 50, batch 500, loss[loss=0.2253, simple_loss=0.2841, pruned_loss=0.06195, ctc_loss=0.1269, cr_loss=0.4282, over 34464.00 frames. ], tot_loss[loss=0.1971, simple_loss=0.2569, pruned_loss=0.05014, ctc_loss=0.1095, cr_loss=0.3801, over 6219870.74 frames. ], batch size: 110, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:18:43,958 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=888776.0, ans=0.125 2024-09-20 11:18:45,631 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=888776.0, ans=10.0 2024-09-20 11:19:02,255 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=888822.6666666666, ans=0.125 2024-09-20 11:19:33,358 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=888916.0, ans=0.125 2024-09-20 11:19:41,110 INFO [train.py:1198] (1/2) Epoch 50, batch 550, loss[loss=0.1951, simple_loss=0.2627, pruned_loss=0.04584, ctc_loss=0.1057, cr_loss=0.3692, over 33858.00 frames. ], tot_loss[loss=0.1972, simple_loss=0.2569, pruned_loss=0.05018, ctc_loss=0.1096, cr_loss=0.3803, over 6329655.28 frames. ], batch size: 122, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:19:51,373 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=888962.6666666666, ans=0.0 2024-09-20 11:19:52,980 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=888962.6666666666, ans=0.125 2024-09-20 11:20:08,153 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.54 vs. 
limit=15.0 2024-09-20 11:20:14,216 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=889056.0, ans=0.125 2024-09-20 11:20:37,311 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=889102.6666666666, ans=0.125 2024-09-20 11:20:48,520 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.217e+02 2.560e+02 2.854e+02 3.480e+02 7.475e+02, threshold=5.708e+02, percent-clipped=1.0 2024-09-20 11:20:58,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=889149.3333333334, ans=0.125 2024-09-20 11:21:02,145 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:21:03,932 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=889196.0, ans=0.0 2024-09-20 11:21:05,146 INFO [train.py:1198] (1/2) Epoch 50, batch 600, loss[loss=0.2216, simple_loss=0.28, pruned_loss=0.06024, ctc_loss=0.1263, cr_loss=0.4369, over 34173.00 frames. ], tot_loss[loss=0.1975, simple_loss=0.2572, pruned_loss=0.05029, ctc_loss=0.1098, cr_loss=0.3809, over 6430756.87 frames. ], batch size: 117, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:21:12,594 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=4.51 vs. limit=12.0 2024-09-20 11:21:15,886 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.93 vs. limit=15.0 2024-09-20 11:21:21,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=889242.6666666666, ans=0.0 2024-09-20 11:21:22,663 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=8.85 vs. limit=22.5 2024-09-20 11:21:57,389 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=889336.0, ans=0.125 2024-09-20 11:22:06,722 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.37 vs. limit=15.0 2024-09-20 11:22:22,576 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=889382.6666666666, ans=0.0 2024-09-20 11:22:28,576 INFO [train.py:1198] (1/2) Epoch 50, batch 650, loss[loss=0.1955, simple_loss=0.2589, pruned_loss=0.04766, ctc_loss=0.105, cr_loss=0.3931, over 34516.00 frames. ], tot_loss[loss=0.1967, simple_loss=0.2565, pruned_loss=0.04994, ctc_loss=0.109, cr_loss=0.379, over 6521218.31 frames. ], batch size: 94, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:23:05,486 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.const_attention_rate, batch_count=889522.6666666666, ans=0.025 2024-09-20 11:23:13,815 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:23:32,433 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.77 vs. 
limit=22.5 2024-09-20 11:23:34,467 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.129e+02 2.558e+02 3.042e+02 3.936e+02 8.162e+02, threshold=6.084e+02, percent-clipped=4.0 2024-09-20 11:23:38,098 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=889616.0, ans=0.1 2024-09-20 11:23:50,719 INFO [train.py:1198] (1/2) Epoch 50, batch 700, loss[loss=0.1947, simple_loss=0.2527, pruned_loss=0.04983, ctc_loss=0.1097, cr_loss=0.3777, over 34587.00 frames. ], tot_loss[loss=0.1971, simple_loss=0.257, pruned_loss=0.05006, ctc_loss=0.1095, cr_loss=0.3804, over 6577883.52 frames. ], batch size: 89, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:24:04,400 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:24:12,431 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:24:18,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=889709.3333333334, ans=0.125 2024-09-20 11:24:55,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=889802.6666666666, ans=0.0 2024-09-20 11:24:57,149 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=889849.3333333334, ans=0.1 2024-09-20 11:25:03,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=889849.3333333334, ans=0.125 2024-09-20 11:25:04,506 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=5.31 vs. limit=15.0 2024-09-20 11:25:14,931 INFO [train.py:1198] (1/2) Epoch 50, batch 750, loss[loss=0.2078, simple_loss=0.2668, pruned_loss=0.05428, ctc_loss=0.1175, cr_loss=0.4166, over 34437.00 frames. ], tot_loss[loss=0.1972, simple_loss=0.257, pruned_loss=0.05016, ctc_loss=0.1097, cr_loss=0.3809, over 6623709.31 frames. ], batch size: 95, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:25:17,299 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.37 vs. limit=22.5 2024-09-20 11:25:18,375 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=889896.0, ans=0.0 2024-09-20 11:25:41,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.76 vs. 
limit=15.0 2024-09-20 11:25:42,548 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=889942.6666666666, ans=0.125 2024-09-20 11:25:58,090 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=889989.3333333334, ans=0.2 2024-09-20 11:26:01,324 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=889989.3333333334, ans=0.0 2024-09-20 11:26:04,703 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=890036.0, ans=0.125 2024-09-20 11:26:22,563 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.192e+02 2.560e+02 2.860e+02 3.424e+02 6.053e+02, threshold=5.720e+02, percent-clipped=0.0 2024-09-20 11:26:38,936 INFO [train.py:1198] (1/2) Epoch 50, batch 800, loss[loss=0.1758, simple_loss=0.2336, pruned_loss=0.04227, ctc_loss=0.09696, cr_loss=0.3505, over 34478.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.257, pruned_loss=0.0502, ctc_loss=0.1098, cr_loss=0.3808, over 6659438.02 frames. ], batch size: 85, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:26:44,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=890129.3333333334, ans=0.2 2024-09-20 11:26:50,638 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=890129.3333333334, ans=0.1 2024-09-20 11:27:16,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=890222.6666666666, ans=0.0 2024-09-20 11:27:33,847 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.52 vs. limit=15.0 2024-09-20 11:27:44,841 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=890316.0, ans=0.0 2024-09-20 11:27:47,283 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2024-09-20 11:27:47,343 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.71 vs. limit=12.0 2024-09-20 11:28:02,580 INFO [train.py:1198] (1/2) Epoch 50, batch 850, loss[loss=0.2019, simple_loss=0.2677, pruned_loss=0.04943, ctc_loss=0.1101, cr_loss=0.3836, over 34362.00 frames. ], tot_loss[loss=0.1971, simple_loss=0.257, pruned_loss=0.05003, ctc_loss=0.1094, cr_loss=0.38, over 6691058.05 frames. 
], batch size: 103, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:28:06,041 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=890362.6666666666, ans=0.05 2024-09-20 11:28:19,440 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=890409.3333333334, ans=0.1 2024-09-20 11:28:20,915 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=890409.3333333334, ans=0.2 2024-09-20 11:28:37,249 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=890456.0, ans=0.1 2024-09-20 11:28:40,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=890456.0, ans=0.125 2024-09-20 11:28:53,758 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=890502.6666666666, ans=0.0 2024-09-20 11:29:08,336 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.172e+02 2.742e+02 3.196e+02 4.023e+02 6.423e+02, threshold=6.392e+02, percent-clipped=4.0 2024-09-20 11:29:12,819 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2024-09-20 11:29:24,688 INFO [train.py:1198] (1/2) Epoch 50, batch 900, loss[loss=0.1804, simple_loss=0.2378, pruned_loss=0.04483, ctc_loss=0.09791, cr_loss=0.3441, over 34473.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.2572, pruned_loss=0.05012, ctc_loss=0.1096, cr_loss=0.3804, over 6696038.41 frames. ], batch size: 85, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:29:53,355 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=890642.6666666666, ans=0.2 2024-09-20 11:30:05,786 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=12.0 2024-09-20 11:30:06,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=890689.3333333334, ans=0.1 2024-09-20 11:30:09,993 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=890689.3333333334, ans=0.1 2024-09-20 11:30:23,589 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.61 vs. limit=15.0 2024-09-20 11:30:31,629 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=890782.6666666666, ans=0.125 2024-09-20 11:30:49,133 INFO [train.py:1198] (1/2) Epoch 50, batch 950, loss[loss=0.1858, simple_loss=0.243, pruned_loss=0.04704, ctc_loss=0.1021, cr_loss=0.3532, over 34670.00 frames. ], tot_loss[loss=0.1975, simple_loss=0.2573, pruned_loss=0.05027, ctc_loss=0.1098, cr_loss=0.3812, over 6700080.28 frames. 
], batch size: 87, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:30:55,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=890829.3333333334, ans=0.0 2024-09-20 11:31:15,879 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=890876.0, ans=0.0 2024-09-20 11:31:39,630 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0 2024-09-20 11:31:52,244 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:31:53,903 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:31:56,811 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.251e+02 2.685e+02 3.327e+02 3.982e+02 7.116e+02, threshold=6.653e+02, percent-clipped=1.0 2024-09-20 11:32:13,079 INFO [train.py:1198] (1/2) Epoch 50, batch 1000, loss[loss=0.1872, simple_loss=0.2467, pruned_loss=0.04665, ctc_loss=0.1009, cr_loss=0.3532, over 34479.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.258, pruned_loss=0.05058, ctc_loss=0.1104, cr_loss=0.3826, over 6694143.62 frames. ], batch size: 90, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:32:13,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=891062.6666666666, ans=0.125 2024-09-20 11:32:15,498 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=22.5 2024-09-20 11:32:25,440 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.87 vs. limit=15.0 2024-09-20 11:32:43,556 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.74 vs. limit=15.0 2024-09-20 11:32:54,810 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.bypass.scale_min, batch_count=891156.0, ans=0.2 2024-09-20 11:33:22,378 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=891249.3333333334, ans=0.07 2024-09-20 11:33:37,160 INFO [train.py:1198] (1/2) Epoch 50, batch 1050, loss[loss=0.2164, simple_loss=0.2749, pruned_loss=0.05811, ctc_loss=0.1237, cr_loss=0.4206, over 34560.00 frames. ], tot_loss[loss=0.1977, simple_loss=0.2573, pruned_loss=0.05042, ctc_loss=0.11, cr_loss=0.3816, over 6702528.30 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:33:40,861 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=891296.0, ans=0.125 2024-09-20 11:33:52,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=891342.6666666666, ans=0.1 2024-09-20 11:33:57,913 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=891342.6666666666, ans=0.0 2024-09-20 11:33:59,965 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.78 vs. 
limit=22.5 2024-09-20 11:34:14,474 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=891389.3333333334, ans=0.125 2024-09-20 11:34:26,448 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.11 vs. limit=10.0 2024-09-20 11:34:27,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=891436.0, ans=0.125 2024-09-20 11:34:43,449 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.186e+02 2.569e+02 2.907e+02 3.334e+02 6.540e+02, threshold=5.815e+02, percent-clipped=0.0 2024-09-20 11:34:48,763 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=891482.6666666666, ans=0.2 2024-09-20 11:34:57,239 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=891482.6666666666, ans=0.2 2024-09-20 11:35:00,221 INFO [train.py:1198] (1/2) Epoch 50, batch 1100, loss[loss=0.1997, simple_loss=0.2574, pruned_loss=0.05194, ctc_loss=0.1134, cr_loss=0.3849, over 34736.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.257, pruned_loss=0.05025, ctc_loss=0.1097, cr_loss=0.3805, over 6715620.45 frames. ], batch size: 92, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:35:16,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=768, metric=6.39 vs. limit=12.0 2024-09-20 11:35:25,167 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=891576.0, ans=0.125 2024-09-20 11:35:33,995 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=7.82 vs. limit=15.0 2024-09-20 11:35:51,551 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=891669.3333333334, ans=0.0 2024-09-20 11:35:59,756 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=891669.3333333334, ans=0.2 2024-09-20 11:36:09,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=891716.0, ans=0.125 2024-09-20 11:36:24,454 INFO [train.py:1198] (1/2) Epoch 50, batch 1150, loss[loss=0.186, simple_loss=0.2485, pruned_loss=0.04458, ctc_loss=0.1003, cr_loss=0.3568, over 34352.00 frames. ], tot_loss[loss=0.1975, simple_loss=0.257, pruned_loss=0.05039, ctc_loss=0.1099, cr_loss=0.3811, over 6714320.88 frames. ], batch size: 91, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:36:29,768 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:36:54,651 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=15.0 2024-09-20 11:37:32,542 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.219e+02 2.500e+02 2.834e+02 3.291e+02 5.068e+02, threshold=5.669e+02, percent-clipped=0.0 2024-09-20 11:37:48,837 INFO [train.py:1198] (1/2) Epoch 50, batch 1200, loss[loss=0.2029, simple_loss=0.2698, pruned_loss=0.04908, ctc_loss=0.1103, cr_loss=0.3956, over 34564.00 frames. 
], tot_loss[loss=0.1983, simple_loss=0.2579, pruned_loss=0.05062, ctc_loss=0.1104, cr_loss=0.3825, over 6708220.98 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:39:11,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.const_attention_rate, batch_count=892229.3333333334, ans=0.025 2024-09-20 11:39:12,899 INFO [train.py:1198] (1/2) Epoch 50, batch 1250, loss[loss=0.2146, simple_loss=0.276, pruned_loss=0.05617, ctc_loss=0.123, cr_loss=0.4075, over 34361.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2583, pruned_loss=0.05081, ctc_loss=0.1107, cr_loss=0.3828, over 6741339.36 frames. ], batch size: 107, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:39:23,862 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0 2024-09-20 11:39:32,204 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=892276.0, ans=0.1 2024-09-20 11:39:37,990 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.38 vs. limit=22.5 2024-09-20 11:39:41,970 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=892276.0, ans=0.125 2024-09-20 11:40:21,106 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.258e+02 2.655e+02 3.141e+02 3.635e+02 7.102e+02, threshold=6.283e+02, percent-clipped=2.0 2024-09-20 11:40:21,496 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.max_abs, batch_count=892416.0, ans=10.0 2024-09-20 11:40:30,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=14.96 vs. limit=15.0 2024-09-20 11:40:35,696 INFO [train.py:1198] (1/2) Epoch 50, batch 1300, loss[loss=0.2008, simple_loss=0.2649, pruned_loss=0.04948, ctc_loss=0.1116, cr_loss=0.3865, over 32927.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.258, pruned_loss=0.05079, ctc_loss=0.1107, cr_loss=0.3829, over 6744657.54 frames. ], batch size: 130, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:40:40,952 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=892462.6666666666, ans=0.125 2024-09-20 11:40:44,185 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=892462.6666666666, ans=0.0 2024-09-20 11:40:59,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=892509.3333333334, ans=0.125 2024-09-20 11:41:04,177 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:41:13,368 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=5.34 vs. limit=15.0 2024-09-20 11:41:15,996 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=892556.0, ans=0.2 2024-09-20 11:41:24,773 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=768, metric=4.87 vs. 
limit=12.0 2024-09-20 11:41:47,465 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=892649.3333333334, ans=0.04949747468305833 2024-09-20 11:42:00,133 INFO [train.py:1198] (1/2) Epoch 50, batch 1350, loss[loss=0.1935, simple_loss=0.2549, pruned_loss=0.04759, ctc_loss=0.1064, cr_loss=0.3915, over 34569.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2574, pruned_loss=0.05054, ctc_loss=0.1102, cr_loss=0.3817, over 6764286.90 frames. ], batch size: 94, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:42:42,919 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=892789.3333333334, ans=0.0 2024-09-20 11:43:04,605 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=892836.0, ans=0.09899494936611666 2024-09-20 11:43:09,239 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.189e+02 2.704e+02 3.335e+02 4.605e+02 7.373e+02, threshold=6.670e+02, percent-clipped=4.0 2024-09-20 11:43:17,735 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=892882.6666666666, ans=0.125 2024-09-20 11:43:24,059 INFO [train.py:1198] (1/2) Epoch 50, batch 1400, loss[loss=0.1729, simple_loss=0.2271, pruned_loss=0.04317, ctc_loss=0.09225, cr_loss=0.3462, over 34303.00 frames. ], tot_loss[loss=0.1977, simple_loss=0.2574, pruned_loss=0.05043, ctc_loss=0.11, cr_loss=0.381, over 6776884.04 frames. ], batch size: 80, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:43:24,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=892929.3333333334, ans=0.2 2024-09-20 11:43:52,502 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=768, metric=4.73 vs. limit=15.0 2024-09-20 11:44:10,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=893022.6666666666, ans=0.0 2024-09-20 11:44:20,191 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=893069.3333333334, ans=0.125 2024-09-20 11:44:43,250 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=893116.0, ans=0.125 2024-09-20 11:44:46,211 INFO [train.py:1198] (1/2) Epoch 50, batch 1450, loss[loss=0.2154, simple_loss=0.2761, pruned_loss=0.05726, ctc_loss=0.1183, cr_loss=0.4164, over 34468.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2577, pruned_loss=0.05036, ctc_loss=0.1098, cr_loss=0.3807, over 6772838.05 frames. ], batch size: 110, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:45:05,596 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=893209.3333333334, ans=0.1 2024-09-20 11:45:29,963 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=893256.0, ans=0.0 2024-09-20 11:45:38,397 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.58 vs. limit=15.0 2024-09-20 11:45:49,564 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=7.40 vs. 
limit=12.0 2024-09-20 11:45:50,918 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=893302.6666666666, ans=0.125 2024-09-20 11:45:55,518 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.202e+02 2.540e+02 2.811e+02 3.429e+02 5.661e+02, threshold=5.621e+02, percent-clipped=0.0 2024-09-20 11:46:10,563 INFO [train.py:1198] (1/2) Epoch 50, batch 1500, loss[loss=0.2143, simple_loss=0.2759, pruned_loss=0.056, ctc_loss=0.1212, cr_loss=0.4147, over 34460.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.258, pruned_loss=0.0503, ctc_loss=0.1097, cr_loss=0.3806, over 6773213.64 frames. ], batch size: 100, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:46:10,884 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=893396.0, ans=0.2 2024-09-20 11:46:14,223 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=893396.0, ans=0.1 2024-09-20 11:46:29,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=893442.6666666666, ans=0.2 2024-09-20 11:46:49,560 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.87 vs. limit=6.0 2024-09-20 11:46:59,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=893489.3333333334, ans=0.125 2024-09-20 11:47:08,177 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.36 vs. limit=15.0 2024-09-20 11:47:10,964 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=893536.0, ans=0.0 2024-09-20 11:47:34,987 INFO [train.py:1198] (1/2) Epoch 50, batch 1550, loss[loss=0.2178, simple_loss=0.2761, pruned_loss=0.0592, ctc_loss=0.1236, cr_loss=0.4119, over 34429.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2579, pruned_loss=0.0504, ctc_loss=0.1099, cr_loss=0.381, over 6746392.05 frames. ], batch size: 105, lr: 2.38e-03, grad_scale: 16.0 2024-09-20 11:47:41,920 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.const_attention_rate, batch_count=893629.3333333334, ans=0.025 2024-09-20 11:48:05,524 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=3.85 vs. limit=6.0 2024-09-20 11:48:16,412 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=893722.6666666666, ans=0.125 2024-09-20 11:48:27,127 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=22.5 2024-09-20 11:48:34,672 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff3_skip_rate, batch_count=893769.3333333334, ans=0.0 2024-09-20 11:48:43,917 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=768, metric=7.27 vs. 
limit=15.0 2024-09-20 11:48:44,591 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.250e+02 2.660e+02 3.165e+02 3.626e+02 6.070e+02, threshold=6.331e+02, percent-clipped=1.0 2024-09-20 11:48:47,405 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=15.0 2024-09-20 11:48:56,645 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=893816.0, ans=0.1 2024-09-20 11:48:59,366 INFO [train.py:1198] (1/2) Epoch 50, batch 1600, loss[loss=0.2, simple_loss=0.2629, pruned_loss=0.04994, ctc_loss=0.1082, cr_loss=0.3909, over 34580.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2576, pruned_loss=0.05042, ctc_loss=0.1099, cr_loss=0.3809, over 6724573.02 frames. ], batch size: 99, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:49:01,270 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=893862.6666666666, ans=0.125 2024-09-20 11:49:15,106 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=768, metric=3.63 vs. limit=15.0 2024-09-20 11:49:17,817 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer2.prob, batch_count=893909.3333333334, ans=0.125 2024-09-20 11:49:22,482 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=893909.3333333334, ans=0.1 2024-09-20 11:49:37,428 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=893956.0, ans=0.2 2024-09-20 11:49:45,550 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=893956.0, ans=0.125 2024-09-20 11:49:50,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=894002.6666666666, ans=0.125 2024-09-20 11:50:02,266 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=894002.6666666666, ans=0.0 2024-09-20 11:50:03,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=894002.6666666666, ans=0.0 2024-09-20 11:50:13,544 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=894049.3333333334, ans=0.125 2024-09-20 11:50:18,552 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.const_attention_rate, batch_count=894049.3333333334, ans=0.025 2024-09-20 11:50:22,996 INFO [train.py:1198] (1/2) Epoch 50, batch 1650, loss[loss=0.2125, simple_loss=0.2756, pruned_loss=0.05466, ctc_loss=0.12, cr_loss=0.4036, over 34399.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2576, pruned_loss=0.05043, ctc_loss=0.11, cr_loss=0.3811, over 6718862.49 frames. 
], batch size: 103, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:50:25,006 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=894096.0, ans=0.1 2024-09-20 11:50:29,969 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=894096.0, ans=0.05 2024-09-20 11:51:12,653 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=894236.0, ans=0.2 2024-09-20 11:51:16,138 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=894236.0, ans=0.125 2024-09-20 11:51:22,694 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=894236.0, ans=0.0 2024-09-20 11:51:30,472 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.242e+02 2.550e+02 3.047e+02 3.863e+02 7.871e+02, threshold=6.094e+02, percent-clipped=2.0 2024-09-20 11:51:35,661 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.const_attention_rate, batch_count=894282.6666666666, ans=0.025 2024-09-20 11:51:42,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=894282.6666666666, ans=0.125 2024-09-20 11:51:45,042 INFO [train.py:1198] (1/2) Epoch 50, batch 1700, loss[loss=0.1625, simple_loss=0.2213, pruned_loss=0.03685, ctc_loss=0.08517, cr_loss=0.3241, over 34286.00 frames. ], tot_loss[loss=0.1975, simple_loss=0.2575, pruned_loss=0.05016, ctc_loss=0.1096, cr_loss=0.3806, over 6744530.31 frames. ], batch size: 80, lr: 2.38e-03, grad_scale: 32.0 2024-09-20 11:52:14,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=894376.0, ans=0.07 2024-09-20 11:53:06,304 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=894516.0, ans=0.125 2024-09-20 11:53:08,733 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0 2024-09-20 11:53:09,278 INFO [train.py:1198] (1/2) Epoch 50, batch 1750, loss[loss=0.1767, simple_loss=0.231, pruned_loss=0.04452, ctc_loss=0.0977, cr_loss=0.3454, over 34171.00 frames. ], tot_loss[loss=0.1972, simple_loss=0.2571, pruned_loss=0.05009, ctc_loss=0.1094, cr_loss=0.3801, over 6755190.56 frames. ], batch size: 78, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 11:53:16,302 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=894562.6666666666, ans=0.0 2024-09-20 11:53:29,354 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer1.prob, batch_count=894609.3333333334, ans=0.125 2024-09-20 11:53:43,141 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=4.87 vs. limit=15.0 2024-09-20 11:54:03,783 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=894702.6666666666, ans=0.1 2024-09-20 11:54:15,735 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=5.27 vs. 
limit=12.0 2024-09-20 11:54:18,314 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.041e+02 2.689e+02 3.148e+02 4.054e+02 7.945e+02, threshold=6.295e+02, percent-clipped=2.0 2024-09-20 11:54:32,866 INFO [train.py:1198] (1/2) Epoch 50, batch 1800, loss[loss=0.2146, simple_loss=0.2767, pruned_loss=0.05572, ctc_loss=0.1214, cr_loss=0.4192, over 34693.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.2572, pruned_loss=0.05014, ctc_loss=0.1095, cr_loss=0.3807, over 6758257.95 frames. ], batch size: 97, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 11:54:34,905 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=894796.0, ans=10.0 2024-09-20 11:54:41,898 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=22.5 2024-09-20 11:54:57,955 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:54:59,360 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=894842.6666666666, ans=0.1 2024-09-20 11:55:03,455 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=3.59 vs. limit=12.0 2024-09-20 11:55:12,676 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=894889.3333333334, ans=0.125 2024-09-20 11:55:29,061 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=894936.0, ans=0.0 2024-09-20 11:55:29,142 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.hidden_balancer.prob, batch_count=894936.0, ans=0.125 2024-09-20 11:55:30,659 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=894936.0, ans=0.2 2024-09-20 11:55:34,346 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=22.5 2024-09-20 11:55:41,988 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=894982.6666666666, ans=0.2 2024-09-20 11:55:47,508 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.32 vs. limit=15.0 2024-09-20 11:55:53,970 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 11:55:55,143 INFO [train.py:1198] (1/2) Epoch 50, batch 1850, loss[loss=0.2131, simple_loss=0.2746, pruned_loss=0.05573, ctc_loss=0.1193, cr_loss=0.4062, over 34477.00 frames. ], tot_loss[loss=0.1974, simple_loss=0.2572, pruned_loss=0.05023, ctc_loss=0.1097, cr_loss=0.3814, over 6764973.37 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 11:55:57,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.28 vs. limit=15.0 2024-09-20 11:56:05,810 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.21 vs. 
limit=15.0 2024-09-20 11:56:34,274 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=895122.6666666666, ans=0.1 2024-09-20 11:56:46,068 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=895169.3333333334, ans=0.2 2024-09-20 11:56:51,164 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=8.11 vs. limit=15.0 2024-09-20 11:57:06,815 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.292e+02 2.736e+02 3.316e+02 4.465e+02 8.257e+02, threshold=6.631e+02, percent-clipped=2.0 2024-09-20 11:57:19,714 INFO [train.py:1198] (1/2) Epoch 50, batch 1900, loss[loss=0.2007, simple_loss=0.2631, pruned_loss=0.05077, ctc_loss=0.1076, cr_loss=0.3779, over 34410.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.2581, pruned_loss=0.05047, ctc_loss=0.1101, cr_loss=0.3819, over 6773851.49 frames. ], batch size: 103, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 11:57:36,565 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=895309.3333333334, ans=0.125 2024-09-20 11:57:46,283 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=895309.3333333334, ans=0.125 2024-09-20 11:58:09,791 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=895402.6666666666, ans=0.0 2024-09-20 11:58:09,931 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=895402.6666666666, ans=0.125 2024-09-20 11:58:23,110 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=895402.6666666666, ans=0.05 2024-09-20 11:58:32,901 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=895449.3333333334, ans=0.125 2024-09-20 11:58:39,779 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=2.03 vs. limit=15.0 2024-09-20 11:58:43,925 INFO [train.py:1198] (1/2) Epoch 50, batch 1950, loss[loss=0.1834, simple_loss=0.2422, pruned_loss=0.04501, ctc_loss=0.1007, cr_loss=0.3592, over 34352.00 frames. ], tot_loss[loss=0.1992, simple_loss=0.2593, pruned_loss=0.05082, ctc_loss=0.1108, cr_loss=0.3838, over 6790295.81 frames. ], batch size: 91, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 11:58:46,003 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=895496.0, ans=0.0 2024-09-20 11:59:03,024 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.89 vs. 
limit=15.0 2024-09-20 11:59:07,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=895542.6666666666, ans=0.2 2024-09-20 11:59:23,802 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.self_attn_weights.pos_emb_skip_rate, batch_count=895589.3333333334, ans=0.0 2024-09-20 11:59:28,804 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=895589.3333333334, ans=0.0 2024-09-20 11:59:53,055 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.212e+02 2.549e+02 2.734e+02 3.078e+02 4.801e+02, threshold=5.468e+02, percent-clipped=0.0 2024-09-20 11:59:59,916 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=895682.6666666666, ans=0.1 2024-09-20 12:00:08,480 INFO [train.py:1198] (1/2) Epoch 50, batch 2000, loss[loss=0.1738, simple_loss=0.2293, pruned_loss=0.04289, ctc_loss=0.09385, cr_loss=0.3423, over 34165.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2592, pruned_loss=0.05077, ctc_loss=0.1107, cr_loss=0.3833, over 6766770.99 frames. ], batch size: 78, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:00:33,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=895776.0, ans=0.125 2024-09-20 12:00:35,343 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=895776.0, ans=0.0 2024-09-20 12:00:40,443 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:00:43,776 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=895822.6666666666, ans=0.0 2024-09-20 12:00:45,334 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=895822.6666666666, ans=0.125 2024-09-20 12:01:13,327 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=895869.3333333334, ans=0.0 2024-09-20 12:01:26,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=895916.0, ans=0.125 2024-09-20 12:01:32,819 INFO [train.py:1198] (1/2) Epoch 50, batch 2050, loss[loss=0.1784, simple_loss=0.2354, pruned_loss=0.0442, ctc_loss=0.09641, cr_loss=0.3445, over 34467.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2584, pruned_loss=0.05048, ctc_loss=0.1102, cr_loss=0.382, over 6757530.07 frames. ], batch size: 82, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:01:40,429 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.75 vs. 
limit=10.0 2024-09-20 12:01:57,692 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=896009.3333333334, ans=0.125 2024-09-20 12:02:17,419 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=896056.0, ans=0.125 2024-09-20 12:02:18,975 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=896056.0, ans=0.2 2024-09-20 12:02:23,675 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=896056.0, ans=0.0 2024-09-20 12:02:40,072 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:02:40,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=896102.6666666666, ans=0.2 2024-09-20 12:02:42,304 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=768, metric=6.95 vs. limit=15.0 2024-09-20 12:02:43,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=896149.3333333334, ans=0.125 2024-09-20 12:02:45,002 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=896149.3333333334, ans=0.125 2024-09-20 12:02:48,009 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.193e+02 2.618e+02 3.347e+02 4.160e+02 8.118e+02, threshold=6.693e+02, percent-clipped=8.0 2024-09-20 12:02:55,493 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.conv_module1.whiten, num_groups=1, num_channels=768, metric=7.16 vs. limit=15.0 2024-09-20 12:03:01,219 INFO [train.py:1198] (1/2) Epoch 50, batch 2100, loss[loss=0.2025, simple_loss=0.265, pruned_loss=0.05088, ctc_loss=0.1139, cr_loss=0.3881, over 34529.00 frames. ], tot_loss[loss=0.198, simple_loss=0.258, pruned_loss=0.05036, ctc_loss=0.11, cr_loss=0.3818, over 6770181.86 frames. ], batch size: 94, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:03:03,193 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_proj.dropout_p, batch_count=896196.0, ans=0.1 2024-09-20 12:03:27,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=896242.6666666666, ans=0.125 2024-09-20 12:03:27,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=896242.6666666666, ans=0.1 2024-09-20 12:03:47,182 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=896289.3333333334, ans=0.07 2024-09-20 12:03:52,163 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=896336.0, ans=0.125 2024-09-20 12:04:25,192 INFO [train.py:1198] (1/2) Epoch 50, batch 2150, loss[loss=0.1849, simple_loss=0.248, pruned_loss=0.04401, ctc_loss=0.09956, cr_loss=0.3448, over 34353.00 frames. ], tot_loss[loss=0.1975, simple_loss=0.2574, pruned_loss=0.05019, ctc_loss=0.1096, cr_loss=0.3806, over 6787241.24 frames. 
], batch size: 91, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:04:25,477 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=896429.3333333334, ans=10.0 2024-09-20 12:04:34,073 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=896429.3333333334, ans=0.1 2024-09-20 12:04:35,639 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.attention_skip_rate, batch_count=896429.3333333334, ans=0.0 2024-09-20 12:04:49,243 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=896476.0, ans=22.5 2024-09-20 12:05:36,433 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.243e+02 2.548e+02 2.903e+02 3.607e+02 5.889e+02, threshold=5.807e+02, percent-clipped=0.0 2024-09-20 12:05:49,551 INFO [train.py:1198] (1/2) Epoch 50, batch 2200, loss[loss=0.2119, simple_loss=0.277, pruned_loss=0.05382, ctc_loss=0.1162, cr_loss=0.3988, over 34467.00 frames. ], tot_loss[loss=0.1974, simple_loss=0.2573, pruned_loss=0.05021, ctc_loss=0.1096, cr_loss=0.3804, over 6784486.30 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:05:53,988 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=7.23 vs. limit=15.0 2024-09-20 12:05:58,333 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=896662.6666666666, ans=0.2 2024-09-20 12:06:14,587 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=896709.3333333334, ans=0.2 2024-09-20 12:06:33,150 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.85 vs. limit=10.0 2024-09-20 12:06:33,231 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=8.84 vs. limit=15.0 2024-09-20 12:07:06,946 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=896849.3333333334, ans=0.125 2024-09-20 12:07:11,693 INFO [train.py:1198] (1/2) Epoch 50, batch 2250, loss[loss=0.2022, simple_loss=0.2616, pruned_loss=0.05251, ctc_loss=0.11, cr_loss=0.3922, over 34407.00 frames. ], tot_loss[loss=0.1973, simple_loss=0.2571, pruned_loss=0.05016, ctc_loss=0.1095, cr_loss=0.3803, over 6780940.45 frames. ], batch size: 95, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:07:42,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=896942.6666666666, ans=0.125 2024-09-20 12:08:01,491 INFO [scaling.py:1024] (1/2) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.49 vs. 
limit=8.0 2024-09-20 12:08:22,793 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.173e+02 2.568e+02 2.998e+02 3.623e+02 5.714e+02, threshold=5.995e+02, percent-clipped=0.0 2024-09-20 12:08:23,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.const_attention_rate, batch_count=897082.6666666666, ans=0.025 2024-09-20 12:08:26,374 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=897082.6666666666, ans=0.1 2024-09-20 12:08:36,057 INFO [train.py:1198] (1/2) Epoch 50, batch 2300, loss[loss=0.1803, simple_loss=0.2429, pruned_loss=0.04273, ctc_loss=0.09406, cr_loss=0.3337, over 34270.00 frames. ], tot_loss[loss=0.1962, simple_loss=0.2561, pruned_loss=0.04973, ctc_loss=0.1087, cr_loss=0.3784, over 6766303.75 frames. ], batch size: 83, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:09:01,084 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:09:02,891 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=897176.0, ans=0.125 2024-09-20 12:09:12,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=897222.6666666666, ans=0.125 2024-09-20 12:09:14,467 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module1.balancer1.prob, batch_count=897222.6666666666, ans=0.125 2024-09-20 12:09:32,530 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=897269.3333333334, ans=0.125 2024-09-20 12:09:35,885 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=897269.3333333334, ans=0.0 2024-09-20 12:09:52,422 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=897316.0, ans=0.0 2024-09-20 12:10:00,154 INFO [train.py:1198] (1/2) Epoch 50, batch 2350, loss[loss=0.2014, simple_loss=0.2639, pruned_loss=0.05075, ctc_loss=0.1097, cr_loss=0.3862, over 34706.00 frames. ], tot_loss[loss=0.1969, simple_loss=0.2566, pruned_loss=0.05008, ctc_loss=0.1093, cr_loss=0.3794, over 6772186.50 frames. 
], batch size: 97, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:10:15,218 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.906e-01 2024-09-20 12:10:18,287 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=897409.3333333334, ans=0.5 2024-09-20 12:10:28,252 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=897409.3333333334, ans=0.125 2024-09-20 12:10:48,017 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_skip_rate, batch_count=897502.6666666666, ans=0.0 2024-09-20 12:11:08,957 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.134e+02 2.501e+02 2.748e+02 3.516e+02 5.827e+02, threshold=5.497e+02, percent-clipped=0.0 2024-09-20 12:11:10,848 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=897549.3333333334, ans=0.125 2024-09-20 12:11:10,907 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=897549.3333333334, ans=0.125 2024-09-20 12:11:21,105 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=897596.0, ans=0.125 2024-09-20 12:11:22,305 INFO [train.py:1198] (1/2) Epoch 50, batch 2400, loss[loss=0.1817, simple_loss=0.2394, pruned_loss=0.04459, ctc_loss=0.1006, cr_loss=0.3668, over 34565.00 frames. ], tot_loss[loss=0.197, simple_loss=0.2568, pruned_loss=0.05005, ctc_loss=0.1092, cr_loss=0.3794, over 6776339.51 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:11:53,000 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=897642.6666666666, ans=0.125 2024-09-20 12:11:54,730 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.skip_rate, batch_count=897642.6666666666, ans=0.07 2024-09-20 12:12:07,794 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=897689.3333333334, ans=0.125 2024-09-20 12:12:19,259 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.4.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:12:27,858 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=7.82 vs. limit=15.0 2024-09-20 12:12:40,764 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=897782.6666666666, ans=0.0 2024-09-20 12:12:45,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=897782.6666666666, ans=0.125 2024-09-20 12:12:48,714 INFO [train.py:1198] (1/2) Epoch 50, batch 2450, loss[loss=0.1978, simple_loss=0.2579, pruned_loss=0.05049, ctc_loss=0.1097, cr_loss=0.3685, over 34430.00 frames. ], tot_loss[loss=0.1981, simple_loss=0.2579, pruned_loss=0.05052, ctc_loss=0.1101, cr_loss=0.3814, over 6751538.39 frames. ], batch size: 95, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:12:49,849 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.69 vs. 
limit=12.0 2024-09-20 12:12:57,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=897829.3333333334, ans=0.0 2024-09-20 12:13:04,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=897876.0, ans=0.0 2024-09-20 12:13:05,866 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.ff3_skip_rate, batch_count=897876.0, ans=0.0 2024-09-20 12:13:12,386 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:13:28,814 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=897922.6666666666, ans=0.125 2024-09-20 12:13:29,321 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=6.33 vs. limit=15.0 2024-09-20 12:13:45,782 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=9.56 vs. limit=15.0 2024-09-20 12:13:50,118 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=897969.3333333334, ans=0.0 2024-09-20 12:13:55,009 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=898016.0, ans=0.125 2024-09-20 12:13:57,933 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.183e+02 2.643e+02 3.005e+02 3.817e+02 6.826e+02, threshold=6.011e+02, percent-clipped=6.0 2024-09-20 12:14:04,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=898016.0, ans=0.0 2024-09-20 12:14:11,128 INFO [train.py:1198] (1/2) Epoch 50, batch 2500, loss[loss=0.2022, simple_loss=0.265, pruned_loss=0.0509, ctc_loss=0.1117, cr_loss=0.3843, over 34451.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.258, pruned_loss=0.05067, ctc_loss=0.1104, cr_loss=0.382, over 6762536.22 frames. ], batch size: 100, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:14:13,486 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2024-09-20 12:14:19,827 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=898062.6666666666, ans=0.0 2024-09-20 12:14:41,282 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=898109.3333333334, ans=0.2 2024-09-20 12:14:48,414 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.63 vs. 
limit=15.0 2024-09-20 12:14:52,755 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=898156.0, ans=0.125 2024-09-20 12:15:09,174 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass.scale_min, batch_count=898202.6666666666, ans=0.2 2024-09-20 12:15:17,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=898249.3333333334, ans=0.0 2024-09-20 12:15:31,210 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=898249.3333333334, ans=0.125 2024-09-20 12:15:35,751 INFO [train.py:1198] (1/2) Epoch 50, batch 2550, loss[loss=0.1855, simple_loss=0.2418, pruned_loss=0.04722, ctc_loss=0.1033, cr_loss=0.3499, over 34115.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.258, pruned_loss=0.05061, ctc_loss=0.1103, cr_loss=0.3821, over 6766543.49 frames. ], batch size: 78, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:15:55,463 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=898342.6666666666, ans=0.125 2024-09-20 12:16:26,702 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:16:40,197 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.const_attention_rate, batch_count=898436.0, ans=0.025 2024-09-20 12:16:46,474 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.128e+02 2.524e+02 2.910e+02 3.440e+02 1.081e+03, threshold=5.819e+02, percent-clipped=2.0 2024-09-20 12:16:59,518 INFO [train.py:1198] (1/2) Epoch 50, batch 2600, loss[loss=0.2049, simple_loss=0.2592, pruned_loss=0.05545, ctc_loss=0.1191, cr_loss=0.4003, over 34356.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2583, pruned_loss=0.0508, ctc_loss=0.1106, cr_loss=0.383, over 6762722.20 frames. ], batch size: 91, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 12:16:59,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=898529.3333333334, ans=0.0 2024-09-20 12:17:11,290 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=898529.3333333334, ans=0.0 2024-09-20 12:17:11,295 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=898529.3333333334, ans=0.2 2024-09-20 12:17:27,429 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.balancer2.prob, batch_count=898576.0, ans=0.125 2024-09-20 12:18:00,241 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=898669.3333333334, ans=0.0 2024-09-20 12:18:09,969 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.2.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:18:20,894 INFO [train.py:1198] (1/2) Epoch 50, batch 2650, loss[loss=0.2085, simple_loss=0.2678, pruned_loss=0.05505, ctc_loss=0.1174, cr_loss=0.3908, over 34184.00 frames. ], tot_loss[loss=0.1988, simple_loss=0.2587, pruned_loss=0.05075, ctc_loss=0.1106, cr_loss=0.3835, over 6769007.71 frames. 
], batch size: 117, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 12:18:27,736 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:18:34,772 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=898762.6666666666, ans=0.0 2024-09-20 12:18:41,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_module2.balancer2.prob, batch_count=898809.3333333334, ans=0.125 2024-09-20 12:18:52,757 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=898856.0, ans=0.1 2024-09-20 12:18:54,376 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=898856.0, ans=0.125 2024-09-20 12:19:29,600 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=898949.3333333334, ans=0.0 2024-09-20 12:19:34,215 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.118e+02 2.608e+02 2.915e+02 3.598e+02 6.737e+02, threshold=5.830e+02, percent-clipped=1.0 2024-09-20 12:19:39,459 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=898949.3333333334, ans=0.125 2024-09-20 12:19:41,103 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.bypass_mid.scale_min, batch_count=898949.3333333334, ans=0.2 2024-09-20 12:19:45,675 INFO [train.py:1198] (1/2) Epoch 50, batch 2700, loss[loss=0.1954, simple_loss=0.2581, pruned_loss=0.04805, ctc_loss=0.1071, cr_loss=0.3786, over 34659.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.259, pruned_loss=0.05087, ctc_loss=0.1108, cr_loss=0.3842, over 6764023.23 frames. ], batch size: 102, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 12:20:18,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=899089.3333333334, ans=0.0 2024-09-20 12:20:21,183 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.72 vs. limit=15.0 2024-09-20 12:20:47,323 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=899136.0, ans=0.2 2024-09-20 12:20:56,940 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=899182.6666666666, ans=0.1 2024-09-20 12:21:08,671 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:21:09,865 INFO [train.py:1198] (1/2) Epoch 50, batch 2750, loss[loss=0.188, simple_loss=0.2493, pruned_loss=0.04622, ctc_loss=0.101, cr_loss=0.3501, over 34614.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2578, pruned_loss=0.05039, ctc_loss=0.11, cr_loss=0.3821, over 6760994.67 frames. 
], batch size: 88, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 12:21:33,480 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899276.0, ans=0.1 2024-09-20 12:21:49,654 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899322.6666666666, ans=0.1 2024-09-20 12:21:54,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.attention_skip_rate, batch_count=899322.6666666666, ans=0.0 2024-09-20 12:22:01,393 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=899369.3333333334, ans=0.125 2024-09-20 12:22:06,448 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.feed_forward3.hidden_balancer.prob, batch_count=899369.3333333334, ans=0.125 2024-09-20 12:22:21,221 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.178e+02 2.790e+02 3.292e+02 4.031e+02 7.314e+02, threshold=6.585e+02, percent-clipped=5.0 2024-09-20 12:22:23,465 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:22:25,065 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899416.0, ans=0.1 2024-09-20 12:22:32,857 INFO [train.py:1198] (1/2) Epoch 50, batch 2800, loss[loss=0.2186, simple_loss=0.2748, pruned_loss=0.0599, ctc_loss=0.1302, cr_loss=0.4145, over 23217.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.258, pruned_loss=0.05059, ctc_loss=0.1104, cr_loss=0.3827, over 6739775.59 frames. ], batch size: 244, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:22:42,977 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=899462.6666666666, ans=0.125 2024-09-20 12:23:30,083 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:23:39,949 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:23:42,096 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=4.14 vs. limit=15.0 2024-09-20 12:23:48,227 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=899649.3333333334, ans=0.125 2024-09-20 12:23:57,889 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=899696.0, ans=0.0 2024-09-20 12:23:59,263 INFO [train.py:1198] (1/2) Epoch 50, batch 2850, loss[loss=0.1964, simple_loss=0.251, pruned_loss=0.05203, ctc_loss=0.1112, cr_loss=0.3879, over 34482.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.2588, pruned_loss=0.05092, ctc_loss=0.1111, cr_loss=0.3848, over 6723358.34 frames. 
], batch size: 90, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:23:59,575 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=899696.0, ans=0.125 2024-09-20 12:24:11,192 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=899696.0, ans=0.0 2024-09-20 12:24:12,180 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.06 vs. limit=15.0 2024-09-20 12:24:34,392 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=899789.3333333334, ans=0.1 2024-09-20 12:24:44,547 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=899789.3333333334, ans=0.2 2024-09-20 12:24:51,899 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.17 vs. limit=10.0 2024-09-20 12:24:59,077 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=899836.0, ans=0.0 2024-09-20 12:25:00,792 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=899836.0, ans=0.05 2024-09-20 12:25:04,656 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=6.79 vs. limit=10.0 2024-09-20 12:25:10,197 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.190e+02 2.610e+02 2.926e+02 3.674e+02 7.941e+02, threshold=5.851e+02, percent-clipped=1.0 2024-09-20 12:25:13,002 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=5.12 vs. limit=15.0 2024-09-20 12:25:21,719 INFO [train.py:1198] (1/2) Epoch 50, batch 2900, loss[loss=0.1861, simple_loss=0.2485, pruned_loss=0.04511, ctc_loss=0.0993, cr_loss=0.3396, over 34533.00 frames. ], tot_loss[loss=0.1996, simple_loss=0.2596, pruned_loss=0.05102, ctc_loss=0.1113, cr_loss=0.3853, over 6754255.21 frames. 
], batch size: 94, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:25:30,353 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=899929.3333333334, ans=0.0 2024-09-20 12:25:53,502 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=900022.6666666666, ans=0.125 2024-09-20 12:25:56,685 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=900022.6666666666, ans=0.025 2024-09-20 12:26:03,155 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=900022.6666666666, ans=0.0 2024-09-20 12:26:16,401 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module1.balancer2.prob, batch_count=900069.3333333334, ans=0.125 2024-09-20 12:26:32,883 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=900116.0, ans=0.025 2024-09-20 12:26:46,250 INFO [train.py:1198] (1/2) Epoch 50, batch 2950, loss[loss=0.1916, simple_loss=0.2497, pruned_loss=0.04821, ctc_loss=0.1073, cr_loss=0.3905, over 34639.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2578, pruned_loss=0.05031, ctc_loss=0.1099, cr_loss=0.3817, over 6749444.55 frames. ], batch size: 88, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:27:01,405 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.bypass_mid.scale_min, batch_count=900209.3333333334, ans=0.2 2024-09-20 12:27:03,066 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=900209.3333333334, ans=0.125 2024-09-20 12:27:21,157 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=900256.0, ans=0.0 2024-09-20 12:27:39,226 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=900302.6666666666, ans=0.2 2024-09-20 12:27:49,117 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=900302.6666666666, ans=0.125 2024-09-20 12:27:49,120 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=900302.6666666666, ans=0.125 2024-09-20 12:28:00,391 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.277e+02 2.617e+02 3.091e+02 3.976e+02 7.089e+02, threshold=6.181e+02, percent-clipped=1.0 2024-09-20 12:28:10,470 INFO [train.py:1198] (1/2) Epoch 50, batch 3000, loss[loss=0.196, simple_loss=0.2553, pruned_loss=0.04959, ctc_loss=0.1111, cr_loss=0.3829, over 34552.00 frames. ], tot_loss[loss=0.1975, simple_loss=0.2577, pruned_loss=0.05012, ctc_loss=0.1096, cr_loss=0.3808, over 6750951.89 frames. ], batch size: 94, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 12:28:10,470 INFO [train.py:1221] (1/2) Computing validation loss 2024-09-20 12:28:15,108 INFO [zipformer.py:1858] (1/2) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([2.5919, 3.1325, 3.0146, 2.6594], device='cuda:1') 2024-09-20 12:28:27,330 INFO [train.py:1230] (1/2) Epoch 50, validation: loss=0.1484, simple_loss=0.2415, pruned_loss=0.02376, ctc_loss=0.03858, cr_loss=2.393e-14, over 944034.00 frames. 
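The validation pass above also logs a per-head attention-entropy diagnostic (the attn_weights_entropy tensor from zipformer.py, one value per attention head). As a rough illustration only, the sketch below shows one way such an entropy could be computed from normalized attention weights; it is a hypothetical reconstruction, not the actual zipformer.py code, and the function name and tensor shapes are assumptions.

    import torch

    def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
        # attn: (num_heads, query_len, key_len); each row is a distribution over keys.
        entropy = -(attn * (attn + eps).log()).sum(dim=-1)   # (num_heads, query_len)
        return entropy.mean(dim=-1)                          # mean entropy (nats) per head

    # 4 heads attending uniformly over 10 keys -> entropy = ln(10) ~= 2.30 per head,
    # the same order of magnitude as the 4-value tensor logged above.
    attn = torch.full((4, 5, 10), 0.1)
    print(attn_weights_entropy(attn))

Low entropy means a head concentrates on few keys; entropy near ln(key_len) means it attends almost uniformly, which is why this is a useful per-head health check to print during validation.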
2024-09-20 12:28:27,330 INFO [train.py:1231] (1/2) Maximum memory allocated so far is 53607MB 2024-09-20 12:28:27,777 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=900396.0, ans=0.125 2024-09-20 12:28:41,204 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2024-09-20 12:29:10,364 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2024-09-20 12:29:11,708 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=900489.3333333334, ans=0.0 2024-09-20 12:29:31,843 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=768, metric=13.90 vs. limit=22.5 2024-09-20 12:29:33,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=900582.6666666666, ans=0.125 2024-09-20 12:29:48,739 INFO [train.py:1198] (1/2) Epoch 50, batch 3050, loss[loss=0.1854, simple_loss=0.2437, pruned_loss=0.04625, ctc_loss=0.1015, cr_loss=0.357, over 34561.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.2582, pruned_loss=0.05041, ctc_loss=0.1101, cr_loss=0.382, over 6743386.72 frames. ], batch size: 89, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 12:29:49,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=900629.3333333334, ans=0.2 2024-09-20 12:29:54,004 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=900629.3333333334, ans=0.0 2024-09-20 12:30:00,624 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=900629.3333333334, ans=0.0 2024-09-20 12:30:25,203 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.ff2_skip_rate, batch_count=900722.6666666666, ans=0.0 2024-09-20 12:30:37,939 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=900769.3333333334, ans=0.125 2024-09-20 12:30:59,894 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.195e+02 2.618e+02 3.247e+02 4.261e+02 7.141e+02, threshold=6.495e+02, percent-clipped=5.0 2024-09-20 12:31:00,838 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.49 vs. limit=15.0 2024-09-20 12:31:09,570 INFO [train.py:1198] (1/2) Epoch 50, batch 3100, loss[loss=0.2147, simple_loss=0.2745, pruned_loss=0.05715, ctc_loss=0.122, cr_loss=0.4081, over 34225.00 frames. ], tot_loss[loss=0.1985, simple_loss=0.2584, pruned_loss=0.0506, ctc_loss=0.1105, cr_loss=0.3829, over 6743741.12 frames. ], batch size: 117, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 12:31:10,889 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. 
limit=6.0 2024-09-20 12:31:19,816 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=900862.6666666666, ans=0.2 2024-09-20 12:31:29,851 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=900909.3333333334, ans=0.125 2024-09-20 12:31:36,418 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=900909.3333333334, ans=0.125 2024-09-20 12:31:41,296 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=900909.3333333334, ans=0.1 2024-09-20 12:31:48,236 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=3.37 vs. limit=15.0 2024-09-20 12:31:49,403 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=900956.0, ans=0.0 2024-09-20 12:31:52,699 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=900956.0, ans=10.0 2024-09-20 12:32:34,223 INFO [train.py:1198] (1/2) Epoch 50, batch 3150, loss[loss=0.208, simple_loss=0.273, pruned_loss=0.05245, ctc_loss=0.1128, cr_loss=0.3888, over 33811.00 frames. ], tot_loss[loss=0.1987, simple_loss=0.2586, pruned_loss=0.05068, ctc_loss=0.1106, cr_loss=0.3837, over 6749090.07 frames. ], batch size: 122, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 12:33:08,780 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=901189.3333333334, ans=0.2 2024-09-20 12:33:29,500 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=901236.0, ans=0.125 2024-09-20 12:33:45,068 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.237e+02 2.626e+02 3.095e+02 3.821e+02 7.128e+02, threshold=6.191e+02, percent-clipped=3.0 2024-09-20 12:33:54,744 INFO [train.py:1198] (1/2) Epoch 50, batch 3200, loss[loss=0.191, simple_loss=0.2499, pruned_loss=0.04833, ctc_loss=0.1027, cr_loss=0.372, over 34543.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2578, pruned_loss=0.05033, ctc_loss=0.1099, cr_loss=0.3817, over 6762973.62 frames. ], batch size: 94, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:34:07,160 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.41 vs. 
limit=15.0 2024-09-20 12:34:16,036 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=901376.0, ans=0.0 2024-09-20 12:34:19,420 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=901376.0, ans=0.125 2024-09-20 12:34:30,156 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=901422.6666666666, ans=15.0 2024-09-20 12:34:40,573 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=901422.6666666666, ans=0.0 2024-09-20 12:35:06,268 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=901516.0, ans=0.1 2024-09-20 12:35:15,619 INFO [train.py:1198] (1/2) Epoch 50, batch 3250, loss[loss=0.2099, simple_loss=0.2655, pruned_loss=0.05662, ctc_loss=0.1223, cr_loss=0.4181, over 34655.00 frames. ], tot_loss[loss=0.1983, simple_loss=0.2583, pruned_loss=0.05048, ctc_loss=0.1101, cr_loss=0.382, over 6772170.50 frames. ], batch size: 98, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:35:15,880 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=901562.6666666666, ans=0.125 2024-09-20 12:35:23,921 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=901562.6666666666, ans=0.125 2024-09-20 12:35:25,519 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=901562.6666666666, ans=0.1 2024-09-20 12:35:30,852 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=901609.3333333334, ans=0.125 2024-09-20 12:35:59,449 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=901656.0, ans=0.0 2024-09-20 12:35:59,872 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2024-09-20 12:36:25,411 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=901749.3333333334, ans=0.0 2024-09-20 12:36:26,749 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.260e+02 2.562e+02 2.921e+02 3.505e+02 7.361e+02, threshold=5.842e+02, percent-clipped=2.0 2024-09-20 12:36:36,526 INFO [train.py:1198] (1/2) Epoch 50, batch 3300, loss[loss=0.2063, simple_loss=0.2722, pruned_loss=0.05137, ctc_loss=0.1113, cr_loss=0.3852, over 32967.00 frames. ], tot_loss[loss=0.197, simple_loss=0.2569, pruned_loss=0.05001, ctc_loss=0.1092, cr_loss=0.3796, over 6770408.19 frames. ], batch size: 130, lr: 2.37e-03, grad_scale: 32.0 2024-09-20 12:36:50,726 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=6.83 vs. 
limit=15.0 2024-09-20 12:37:15,686 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=901889.3333333334, ans=0.2 2024-09-20 12:37:23,595 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=901889.3333333334, ans=0.125 2024-09-20 12:37:34,456 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=901936.0, ans=0.125 2024-09-20 12:37:44,180 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=901982.6666666666, ans=0.1 2024-09-20 12:37:55,367 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=901982.6666666666, ans=0.0 2024-09-20 12:37:59,893 INFO [train.py:1198] (1/2) Epoch 50, batch 3350, loss[loss=0.2097, simple_loss=0.2714, pruned_loss=0.05437, ctc_loss=0.1164, cr_loss=0.4019, over 33875.00 frames. ], tot_loss[loss=0.1979, simple_loss=0.2577, pruned_loss=0.05041, ctc_loss=0.1098, cr_loss=0.3808, over 6744996.93 frames. ], batch size: 122, lr: 2.37e-03, grad_scale: 16.0 2024-09-20 12:38:00,264 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=902029.3333333334, ans=0.1 2024-09-20 12:38:04,984 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.const_attention_rate, batch_count=902029.3333333334, ans=0.025 2024-09-20 12:38:22,926 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=902076.0, ans=0.0 2024-09-20 12:38:34,126 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=902122.6666666666, ans=0.125 2024-09-20 12:38:42,345 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=902122.6666666666, ans=0.0 2024-09-20 12:39:12,302 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.175e+02 2.540e+02 2.887e+02 3.292e+02 5.055e+02, threshold=5.774e+02, percent-clipped=0.0 2024-09-20 12:39:15,812 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.attention_skip_rate, batch_count=902216.0, ans=0.0 2024-09-20 12:39:20,337 INFO [train.py:1198] (1/2) Epoch 50, batch 3400, loss[loss=0.1745, simple_loss=0.2291, pruned_loss=0.04396, ctc_loss=0.09389, cr_loss=0.3301, over 34184.00 frames. ], tot_loss[loss=0.1978, simple_loss=0.2576, pruned_loss=0.05036, ctc_loss=0.1097, cr_loss=0.3807, over 6736160.77 frames. ], batch size: 78, lr: 2.36e-03, grad_scale: 16.0 2024-09-20 12:39:23,910 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=902262.6666666666, ans=0.1 2024-09-20 12:39:28,795 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=902262.6666666666, ans=0.04949747468305833 2024-09-20 12:39:57,609 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=902356.0, ans=0.125 2024-09-20 12:40:17,494 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=7.25 vs. 
limit=15.0 2024-09-20 12:40:23,805 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2024-09-20 12:40:40,910 INFO [train.py:1198] (1/2) Epoch 50, batch 3450, loss[loss=0.2071, simple_loss=0.2711, pruned_loss=0.05174, ctc_loss=0.1165, cr_loss=0.4105, over 33101.00 frames. ], tot_loss[loss=0.1982, simple_loss=0.258, pruned_loss=0.05052, ctc_loss=0.1101, cr_loss=0.3815, over 6746864.06 frames. ], batch size: 130, lr: 2.36e-03, grad_scale: 16.0 2024-09-20 12:40:57,588 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=902542.6666666666, ans=0.125 2024-09-20 12:41:00,760 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=902542.6666666666, ans=0.125 2024-09-20 12:41:26,337 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=902589.3333333334, ans=0.125 2024-09-20 12:41:44,057 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=902682.6666666666, ans=0.125 2024-09-20 12:41:44,124 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=902682.6666666666, ans=0.0 2024-09-20 12:41:53,274 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.267e+02 2.716e+02 3.190e+02 3.871e+02 6.525e+02, threshold=6.380e+02, percent-clipped=4.0 2024-09-20 12:42:03,582 INFO [train.py:1198] (1/2) Epoch 50, batch 3500, loss[loss=0.1666, simple_loss=0.2278, pruned_loss=0.03786, ctc_loss=0.08511, cr_loss=0.3187, over 34455.00 frames. ], tot_loss[loss=0.1976, simple_loss=0.2575, pruned_loss=0.05031, ctc_loss=0.1097, cr_loss=0.381, over 6748335.19 frames. ], batch size: 85, lr: 2.36e-03, grad_scale: 16.0 2024-09-20 12:42:13,520 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=902729.3333333334, ans=0.2 2024-09-20 12:42:26,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=902776.0, ans=0.0 2024-09-20 12:42:31,153 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=902776.0, ans=0.0 2024-09-20 12:43:08,013 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=902916.0, ans=0.1 2024-09-20 12:43:18,803 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=902916.0, ans=0.0 2024-09-20 12:43:18,989 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=902916.0, ans=0.1 2024-09-20 12:43:23,371 INFO [train.py:1198] (1/2) Epoch 50, batch 3550, loss[loss=0.2119, simple_loss=0.2729, pruned_loss=0.05551, ctc_loss=0.1176, cr_loss=0.4066, over 34382.00 frames. ], tot_loss[loss=0.198, simple_loss=0.2578, pruned_loss=0.05044, ctc_loss=0.11, cr_loss=0.3819, over 6757549.24 frames. 
], batch size: 103, lr: 2.36e-03, grad_scale: 16.0 2024-09-20 12:43:23,625 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=902962.6666666666, ans=0.125 2024-09-20 12:43:54,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=903056.0, ans=0.0 2024-09-20 12:44:04,230 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.4.feed_forward1.out_whiten, num_groups=1, num_channels=768, metric=5.49 vs. limit=15.0 2024-09-20 12:44:05,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=903056.0, ans=0.1 2024-09-20 12:44:06,900 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=903056.0, ans=0.125 2024-09-20 12:44:20,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=9.60 vs. limit=15.0 2024-09-20 12:44:22,997 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=903102.6666666666, ans=0.0 2024-09-20 12:44:24,610 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=903102.6666666666, ans=0.125 2024-09-20 12:44:26,152 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=903149.3333333334, ans=0.2 2024-09-20 12:44:31,038 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903149.3333333334, ans=0.1 2024-09-20 12:44:35,381 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.194e+02 2.621e+02 2.882e+02 3.503e+02 9.171e+02, threshold=5.764e+02, percent-clipped=2.0 2024-09-20 12:44:40,539 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=903149.3333333334, ans=0.5 2024-09-20 12:44:43,495 INFO [train.py:1198] (1/2) Epoch 50, batch 3600, loss[loss=0.1885, simple_loss=0.2436, pruned_loss=0.04845, ctc_loss=0.1075, cr_loss=0.376, over 34485.00 frames. ], tot_loss[loss=0.198, simple_loss=0.2578, pruned_loss=0.05046, ctc_loss=0.1101, cr_loss=0.3828, over 6766335.15 frames. ], batch size: 90, lr: 2.36e-03, grad_scale: 32.0 2024-09-20 12:44:50,342 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=903196.0, ans=0.0 2024-09-20 12:44:54,052 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=5.85 vs. 
limit=12.0 2024-09-20 12:45:12,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=903242.6666666666, ans=0.0 2024-09-20 12:45:34,834 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=903336.0, ans=0.125 2024-09-20 12:45:47,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.3.conv_skip_rate, batch_count=903382.6666666666, ans=0.0 2024-09-20 12:45:50,291 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.const_attention_rate, batch_count=903382.6666666666, ans=0.025 2024-09-20 12:46:05,201 INFO [train.py:1198] (1/2) Epoch 50, batch 3650, loss[loss=0.2224, simple_loss=0.2751, pruned_loss=0.0631, ctc_loss=0.1302, cr_loss=0.437, over 34443.00 frames. ], tot_loss[loss=0.1971, simple_loss=0.257, pruned_loss=0.05008, ctc_loss=0.1094, cr_loss=0.381, over 6769081.65 frames. ], batch size: 110, lr: 2.36e-03, grad_scale: 32.0 2024-09-20 12:46:10,626 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2024-09-20 12:46:29,835 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=903476.0, ans=0.0 2024-09-20 12:46:44,285 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=903522.6666666666, ans=0.0 2024-09-20 12:46:45,974 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=903522.6666666666, ans=0.0 2024-09-20 12:46:51,301 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=768, metric=9.27 vs. limit=15.0 2024-09-20 12:46:55,426 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.4.conv_module2.balancer1.prob, batch_count=903569.3333333334, ans=0.125 2024-09-20 12:47:12,184 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=6.65 vs. limit=15.0 2024-09-20 12:47:17,576 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.209e+02 2.538e+02 2.908e+02 3.851e+02 7.231e+02, threshold=5.817e+02, percent-clipped=5.0 2024-09-20 12:47:18,395 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=6.96 vs. limit=15.0 2024-09-20 12:47:25,512 INFO [train.py:1198] (1/2) Epoch 50, batch 3700, loss[loss=0.2072, simple_loss=0.2708, pruned_loss=0.05224, ctc_loss=0.1134, cr_loss=0.4092, over 34621.00 frames. ], tot_loss[loss=0.1974, simple_loss=0.2575, pruned_loss=0.05005, ctc_loss=0.1094, cr_loss=0.3817, over 6784284.23 frames. 
], batch size: 102, lr: 2.36e-03, grad_scale: 32.0 2024-09-20 12:47:30,638 INFO [scaling.py:1120] (1/2) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2024-09-20 12:47:37,051 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=903662.6666666666, ans=0.0 2024-09-20 12:47:37,101 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=903662.6666666666, ans=0.0 2024-09-20 12:47:40,145 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=903709.3333333334, ans=0.1 2024-09-20 12:47:42,432 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.16 vs. limit=10.0 2024-09-20 12:47:49,762 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=903709.3333333334, ans=0.1 2024-09-20 12:48:10,769 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=903756.0, ans=0.125 2024-09-20 12:48:14,247 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.16 vs. limit=22.5 2024-09-20 12:48:22,591 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=768, metric=4.98 vs. limit=15.0 2024-09-20 12:48:23,767 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=903802.6666666666, ans=0.1 2024-09-20 12:48:31,846 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=903849.3333333334, ans=0.125 2024-09-20 12:48:43,669 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=903849.3333333334, ans=0.125 2024-09-20 12:48:45,044 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=903896.0, ans=0.1 2024-09-20 12:48:46,370 INFO [train.py:1198] (1/2) Epoch 50, batch 3750, loss[loss=0.2181, simple_loss=0.279, pruned_loss=0.05787, ctc_loss=0.124, cr_loss=0.4201, over 34374.00 frames. ], tot_loss[loss=0.2005, simple_loss=0.2606, pruned_loss=0.05126, ctc_loss=0.1118, cr_loss=0.3875, over 6786123.30 frames. ], batch size: 113, lr: 2.36e-03, grad_scale: 32.0 2024-09-20 12:48:58,421 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=903896.0, ans=0.125 2024-09-20 12:49:00,202 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=903896.0, ans=0.125 2024-09-20 12:49:01,784 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=903942.6666666666, ans=0.0 2024-09-20 12:49:10,278 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=576, metric=5.08 vs. 
limit=10.0 2024-09-20 12:49:48,189 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=904036.0, ans=0.125 2024-09-20 12:49:59,085 WARNING [optim.py:487] (1/2) Clipping_scale=2.0, grad-norm quartiles 2.290e+02 2.483e+02 2.635e+02 2.922e+02 4.835e+02, threshold=5.270e+02, percent-clipped=0.0 2024-09-20 12:50:07,232 INFO [train.py:1198] (1/2) Epoch 50, batch 3800, loss[loss=0.2208, simple_loss=0.2737, pruned_loss=0.06192, ctc_loss=0.1336, cr_loss=0.4335, over 30256.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.263, pruned_loss=0.05251, ctc_loss=0.1141, cr_loss=0.3923, over 6676609.61 frames. ], batch size: 175, lr: 2.36e-03, grad_scale: 32.0 2024-09-20 12:50:09,924 INFO [scaling.py:1024] (1/2) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2024-09-20 12:50:36,026 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=904176.0, ans=0.0 2024-09-20 12:50:54,935 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=904222.6666666666, ans=12.0 2024-09-20 12:50:56,455 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=904269.3333333334, ans=0.2 2024-09-20 12:51:09,446 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=904269.3333333334, ans=0.0 2024-09-20 12:51:12,734 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=904316.0, ans=0.0 2024-09-20 12:51:30,478 INFO [train.py:1198] (1/2) Epoch 50, batch 3850, loss[loss=0.2144, simple_loss=0.2728, pruned_loss=0.05741, ctc_loss=0.1254, cr_loss=0.4043, over 23271.00 frames. ], tot_loss[loss=0.2057, simple_loss=0.2646, pruned_loss=0.05378, ctc_loss=0.1169, cr_loss=0.3957, over 6249734.01 frames. ], batch size: 244, lr: 2.36e-03, grad_scale: 16.0 2024-09-20 12:51:32,495 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=904362.6666666666, ans=0.2 2024-09-20 12:51:44,147 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=904362.6666666666, ans=0.125 2024-09-20 12:52:06,076 INFO [scaling.py:214] (1/2) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.const_attention_rate, batch_count=904456.0, ans=0.025 2024-09-20 12:52:13,512 INFO [train.py:1496] (1/2) Done!
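The WARNING lines from optim.py throughout this run report grad-norm quartiles, a clipping threshold, and the fraction of recently clipped batches (e.g. Clipping_scale=2.0, grad-norm quartiles ..., threshold=..., percent-clipped=...). As a hedged sketch of the general idea only, not the actual icefall optimizer code (clip_gradients and recent_norms are hypothetical names), such a threshold could be derived from the history of observed gradient norms like this:

    import torch

    def clip_gradients(params, recent_norms, clipping_scale=2.0):
        # recent_norms: non-empty list of global grad norms from earlier steps.
        grads = [p.grad for p in params if p.grad is not None]
        total_norm = torch.norm(torch.stack([g.detach().norm() for g in grads]))
        history = torch.tensor(recent_norms)
        quartiles = torch.quantile(history, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * quartiles[2]   # clipping_scale x median norm
        if total_norm > threshold:                  # counted toward percent-clipped
            for g in grads:
                g.mul_(threshold / total_norm)
        recent_norms.append(float(total_norm))
        return quartiles, threshold

Reporting the quartiles alongside the threshold, as these log lines do, makes it easy to see whether clipping is hitting only outlier batches or the bulk of the gradient-norm distribution.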